You might not even like rsync. Yeah it’s old. Yeah it’s slow. But if you’re working with Linux you’re going to need to know it.
In this video I walk through my favorite everyday flags for rsync.
Support the channel:
https://patreon.com/VeronicaExplains
https://ko-fi.com/VeronicaExplains
https://thestopbits.bandcamp.com/
Here’s a companion blog post, where I cover a bit more detail: https://vkc.sh/everyday-rsync
Also, @BreadOnPenguins made an awesome rsync video and you should check it out: https://www.youtube.com/watch?v=eifQI5uD6VQ
Lastly, I left out all of the ssh setup stuff because I made a video about that and the blog post goes into a smidge more detail. If you want to see a video covering the basics of using SSH, I made one a few years ago and it’s still pretty good: https://www.youtube.com/watch?v=3FKsdbjzBcc
Chapters:
1:18 Invoking rsync
4:05 The --delete flag for rsync
5:30 Compression flag: -z
6:02 Using tmux and rsync together
6:30 but Veronica… why not use (insert shiny object here)
I still prefer tar for quick and dirty same box copies.
tar cf - * | (cd /target && tar xfp -)
I’ll never not upvote Veronica Explains. Excellent creator and excellent info on everything I’ve seen.
Rsnapshot. It uses rsync, but provides snapshot management and multiple backup versioning.
Yes, but a few hours writing my own scripts will save me from several minutes of reading its documentation…
It took me like 10 min to set up rsnapshot (installing, and writing systemd unit/timer files) on my servers.
I’m sure I could script something similar in under 10 (hours).
Yah, I really like this approach. Same reason I set up Timeshift and Mint Backup on all the user machines in my house. For others rsync + cron is aces.
Veeam for image/block based backups of Windows, Linux and VMs.
syncthing for syncing smaller files across devices.
Thank you very much.
I use syncthing.
Is rsync better?
Syncthing works pretty well for me and my stable of Ubuntu, pi, Mac, and Windows
I’m not super familiar with Syncthing, but judging by the name I’d say Syncthing is not at all meant for backups.
Syncthing is technically meant to synchronize data across different devices in real time (which I do with my phone), but I also use it to transfer data weekly via wi-fi to my old 2013 laptop with a 500GB HDD and Linux Mint. I only boot it to transfer data, and even then I pause the transfers to this device when it’s done transferring stuff. That way I can have larger data backups that wouldn’t fit on my phone, since LocalSend is unreliable for large amounts of data while Syncthing can resume the transfer if anything goes wrong. On top of that, Syncthing also works on Windows and Android out of the box.
its for a different purpose. I wouldn’t use syncthing the way I use rsync
I used to use rsnapshot, which is a thin wrapper around rsync to make it incremental, but moved to restic and never looked back. Much easier and encrypted by default.
It’s slow?!?
Compared to something multi threaded, yes. But there are obviously a number of bottlenecks that might diminish the gains of a multi threaded program.
With xargs everything is multithreaded.
That part threw me off. Last time I used it, I did incremental backups of a 500 gig disk once a week or so, and it took 20 seconds max.
I think there are better alternatives for backup, like kopia and restic. Even seafile. You want protection against ransomware, storage compression, encryption, versioning, sync upon write and block deduplication.
comparing seafile to rsync reminds me of the old “Space Pen” folk tale.
This exactly. I’d use rsync to sync a directory to a location to then be backed up by kopia, but I wouldn’t use rsync exclusively for backups.
rsync for backups? I guess it depends on what kind of backup
for redundant backups of my data and configs that I still have a live copy of, I use restic, it compresses extremely well
I have used rsync to permanently move something to another drive though
Yeah it’s slow
What’s slow about rsync? If you have a reasonably fast CPU and are merely syncing differences, it’s pretty quick.
It’s single thread, one file at a time.
That would only matter if it’s lots of small files, right? And after the initial sync, you’d have very few files, no?
Rsync is designed for incremental syncs, which is exactly what you want in a backup solution. If your multithreaded alternative doesn’t do a diff, rsync will win on larger data sets that don’t have rapid changes.
For a home setup that seems fine. But I can understand why you wouldn’t want this for a whole enterprise.
I would generally argue that rsync is not a backup solution. But it is one of the best transfer/archiving solutions.
Yes, it is INCREDIBLY powerful and is often 90% of what people actually want/need. But to be an actual backup solution you still need infrastructure around that. Bare minimum is a crontab. But if you are actually backing something up (not just copying it to a local directory) then you need some logging/retry logic on top of that.
At which point you are building your own borg, as it were. Which, to be clear, is a great thing to do. But… backups are incredibly important and it is very much important to understand what a backup actually needs to be.
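For what it’s worth, the “logging/retry logic on top” part can start out very small. Here is a minimal sketch of the idea, not a recommendation over borg or friends. The function name, the log path, the retry count and the delay are all made up for illustration; only the rsync flags (-a, --partial) are real.

```shell
#!/bin/sh
# Hypothetical wrapper: run rsync, append everything to a log, retry on failure.
# Usage: backup_with_retry SRC DEST LOGFILE [TRIES] [DELAY_SECONDS]
backup_with_retry() {
    src=$1; dest=$2; log=$3
    tries=${4:-3}     # attempts before giving up (assumed default)
    delay=${5:-60}    # seconds to wait between attempts (assumed default)
    n=1
    while [ "$n" -le "$tries" ]; do
        # --partial keeps half-transferred files so a retry can pick them up
        if rsync -a --partial "$src/" "$dest/" >>"$log" 2>&1; then
            printf '%s ok on attempt %s\n' "$(date)" "$n" >>"$log"
            return 0
        fi
        printf '%s attempt %s of %s failed\n' "$(date)" "$n" "$tries" >>"$log"
        [ "$n" -lt "$tries" ] && sleep "$delay"
        n=$((n + 1))
    done
    return 1
}

# Example crontab line (script path is hypothetical) that would call the wrapper nightly:
# 30 2 * * * /usr/local/bin/nightly-backup.sh
```

The non-zero return on final failure is what lets cron (or whatever drives it) notice and notify you, which is the piece a bare rsync line in a crontab tends to be missing.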
Borg gang represent!
I would generally argue that rsync is not a backup solution.
Yeah, if you want to use rsync specifically for backups, you’re probably better-off using something like rdiff-backup, which makes use of rsync to generate backups and store them efficiently, and drive it from something like backupninja, which will run the task periodically and notify you if it fails.
rsync: one-way synchronization
unison: bidirectional synchronization
git: synchronization of text files with good interactive merging.
rdiff-backup: rsync-based backups. I used to use this and moved to restic, as the backupninja target for rdiff-backup has kind of fallen into disrepair.
That doesn’t mean “don’t use rsync”. I mean, rsync’s a fine tool. It’s just…not really a backup program on its own.
Beware rdiff-backup. It certainly does turn rsync (not a backup program) into a backup program.
However, I used rdiff-backup in the past and it can be a bit problematic. If I remember correctly, every “snapshot” you keep in rdiff-backup uses as many inodes as the thing you are backing up. (Because every “file” in the snapshot is either a file or a hard link to an identical version of that file in another snapshot.) So this can be a problem if you store many snapshots of many files.
But it does make rsync a backup solution; a snapshot or a redundant copy is very useful, but it’s not a backup.
(OTOH, rsync is still wonderful for large transfers.)
Because every “file” in the snapshot is either a file or a hard link to an identical version of that file in another snapshot.) So this can be a problem if you store many snapshots of many files.
I think that you may be thinking of rsnapshot rather than rdiff-backup which has that behavior; both use rsync.
But I’m not sure why you’d be concerned about this behavior.
Are you worried about inode exhaustion on the destination filesystem?
Huh, I think you’re right.
Before discovering ZFS, my previous backup solution was rdiff-backup. I have memories of it being problematic for me, but I may be wrong in my remembering of why it caused problems.
+1 for rdiff-backup. Been using it for 20 years or so, and I love it.
Having a synced copy elsewhere is not an adequate backup and snapshots are pretty important. I recently had RAM go bad and my most recent backups had corrupt data, but having previous snapshots saved the day.
Don’t understand the downvotes. This is the type of lesson people have learned from losing data and no sense in learning it the hard way yourself.
I use rsync and a pruning script in crontab on my NFS mounts. I’ve tested it numerous times breaking containers and restoring them from backup. It works great for me at home because I don’t need anything older than 4 monthly, 4 weekly, and 7 daily backups.
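In case anyone wonders what a pruning script like that can look like, here is a rough sketch under some loud assumptions: snapshots are plain directories named by date (e.g. 2024-05-01) so that a lexical sort is also a chronological one, and the function name and layout are invented for the example. It is not the script described above, just the simplest “keep the newest N” version of the idea.

```shell
#!/bin/sh
# Hypothetical helper: keep only the newest KEEP snapshot directories in ROOT.
# Assumes directory names sort chronologically (e.g. YYYY-MM-DD).
prune_snapshots() {
    root=$1; keep=$2
    total=$(find "$root" -mindepth 1 -maxdepth 1 -type d | wc -l)
    excess=$((total - keep))
    [ "$excess" -gt 0 ] || return 0
    # Oldest names sort first, so the first $excess entries are the ones to drop.
    find "$root" -mindepth 1 -maxdepth 1 -type d | sort | head -n "$excess" |
    while IFS= read -r old; do
        rm -rf -- "$old"
    done
}
```

With separate daily, weekly and monthly directories (again, an assumed layout), a 7/4/4 policy would be three calls: `prune_snapshots /backups/daily 7`, `prune_snapshots /backups/weekly 4`, `prune_snapshots /backups/monthly 4`.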
However, in my job I prefer something like bacula. The extra features and granularity of restore options makes a world of difference when someone calls because they deleted prod files.
I don’t know if there’s a term for them, but Bacula (and I think AMANDA might fall into this camp, but I haven’t looked at it in ages) are oriented more towards…“institutional” backup. Like, there’s a dedicated backup server, maybe dedicated offline media like tapes, the backup server needs to drive the backup, etc).
There are some things that rsnapshot, rdiff-backup, duplicity, and so forth won’t do.
- At least some of them (rdiff-backup, for one) won’t dedup files with different names. If a file is unchanged, it won’t use extra storage, but it won’t identify identical files at different locations. This usually isn’t all that important for a single host, other than maybe if you rename files, but if you’re backing up many different hosts, as in an institutional setting, they likely have files in common. They aren’t intended to back up multiple hosts to a single, shared repository.
- Pull-only. I think that it might be possible to run some of the above three in “pull” mode, where the backup server connects and gets the backup, but where the hosts don’t have the ability to write to the backup server. This may be desirable if you’re concerned about a host being compromised, but not the backup server, since it means that an attacker can’t go dick with your backups. Think of those cybercriminals who encrypt data at a company and wipe other copies and then demand a ransom for an unlock key. But the “institutional” backup systems are going to be aimed at having the backup server drive all this, and have the backup server have access to log into the individual hosts and pull the backups over.
- Dedup for non-identical files. Note that restic can do this. While files might not be identical, they might share some common elements, and one might want to try to take advantage of that in backup storage.
- rdiff-backup and rsnapshot don’t do encryption (though duplicity does). If one intends to use storage not under one’s physical control (e.g. “cloud backup”), this might be a concern.
- No “full” backups. Some backup programs follow a scheme where one periodically does a backup that stores a full copy of the data, and then stores “incremental” backups from the last full backup. All of rsnapshot, rdiff-backup, and duplicity are always-incremental, and are aimed at storing their backups on a single destination filesystem. A split between “full” and “incremental” is probably something you want if you’re using, say, tape storage and having backups that span multiple tapes, since it controls how many pieces of media you have to dig up to perform a restore.
- I don’t know how Bacula or AMANDA handle it, if at all, but if you have a DBMS like PostgreSQL or MySQL or the like, it may be constantly receiving writes. This means that you can’t get an atomic snapshot of the database, which is critical if you want to be reliably backing up the storage. I don’t know what the convention is here, but I’d guess either using filesystem-level atomic snapshot support (e.g. btrfs) or requiring the backup system to be aware of the DBMS and instructing it to suspend modification while it does the backup. rsnapshot, rdiff-backup, and duplicity aren’t going to do anything like that.
I’d agree that using the more-heavyweight, “institutional” backup programs can make sense for some use cases, like if you’re backing up many workstations or something.
I was planning to use rsync to ship several TB of stuff from my old NAS to my new one soon. Since we’re already talking about rsync, I guess I may as well ask if this is right way to go?
It depends
rsync is fine, but to clarify a little further…
If you think you’ll stop the transfer and want it to resume (and some data might have changed), then yep, rsync is best.
But, if you’re just doing a 1-off bulk transfer in a single run, then you could use other tools like xcopy / scp or - if you’ve mounted the remote NAS at a local mount point - just plain old cp
The reason for that is that rsync has to work out what’s at the other end for each file, so it’s doing some back & forwards communications each time which as someone else pointed out can load the CPU and reduce throughput.
(From memory, I think Raspberry Pi don’t handle large transfers over scp well… I seem to recall a buffer gets saturated and the throughput drops off after a minute or so)
Also, on a local network, there’s probably no point in using encryption or compression options - esp. for photos / videos / music… you’re just loading the CPU again to work out that it can’t compress any further.
It’s just a one-off transfer, I’m not planning to stop the transfer, and it’s my media library, so nothing should change, but I figured something resumable is a good idea for a transfer that’s going to take 12+ hours, in case there’s an unplanned stop.
One thing I forgot to mention: rsync has an option to preserve file timestamps, so if that’s important for your files, then that might also be useful… without checking, the other commands probably have that feature, but I don’t recall at the moment.
rsync -Prvt <source> <destination>
might be something to try, leave for a minute, stop and retry … that’ll prove it’s all working.
Oh… and make sure you get the source and destination paths correct with a trailing / (or not), otherwise you’ll get all your files copied to an extra subfolder (or not)
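To make the trailing-slash thing concrete, here is a small throwaway sketch that can be run before pointing rsync at several TB. Everything in it (the function name, the temp directory layout, file.txt) is made up for the demo; the only real behaviour being shown is how rsync treats a source path with and without the trailing /.

```shell
#!/bin/sh
# Throwaway demo of rsync's trailing-slash rule, using made-up temp directories.
# Prints the temp directory so the results can be inspected afterwards.
demo_trailing_slash() {
    work=$(mktemp -d) || return 1
    mkdir -p "$work/src" "$work/with_slash" "$work/without_slash"
    echo hello > "$work/src/file.txt"

    # Trailing slash on the source: copies the *contents* of src.
    # Result: with_slash/file.txt
    rsync -a "$work/src/" "$work/with_slash/"

    # No trailing slash: copies the directory itself.
    # Result: without_slash/src/file.txt (the "extra subfolder")
    rsync -a "$work/src" "$work/without_slash/"

    echo "$work"
}
```

Calling `demo_trailing_slash` and then listing the printed directory shows both layouts side by side.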
I couldn’t tell you if it’s the right way but I used it on my Rpi4 to sync 4tb of stuff from my Plex drive to a backup and set a script up to have it check/mirror daily. Took a day and a half to copy and now it syncs in minutes tops when there’s new data
yes, it’s the right way to go.
rsync over ssh is the best, and works as long as rsync is installed on both systems.
On low end CPUs you can max out the CPU before maxing out network. If you want to get fancy, you can use rsync over an unencrypted remote shell like rsh, but I would only do this if the computers were directly connected to each other by one Ethernet cable.
I’ve personally used rsync for backups for about…15 years or so? It’s worked out great. An awesome video going over all the basics and what you can do with it.
And I generally enjoy Veronica’s presentation. Knowledgeable and simple.
Her video https://tinkerbetter.tube/w/ffhBwuXDg7ZuPPFcqR93Bd made me learn a new way of looking at data. There were some tricks I haven’t done before. She has such good videos.
Veronica is fantastic. Love her video editing, it reminds me more of the early days of YouTube.
Yep, I found her through YouTube. Her and Action Retro’s content is always great, with some Adrian Black on the side.
It works fine if all you need is transfer; my issue with it is that it’s just not efficient. If you want a “time travel” feature, your only option is to duplicate data. Differential backups, compression, and encryption for off-site ones is where other tools shine.
If you want a “time travel” feature, your only option is to duplicate data.
Not true. Look at the --link-dest flag. Encryption, sure, rsync can’t do that, but incremental backups work fine and compression is better handled at the filesystem level anyway IMO.
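Roughly, the --link-dest approach looks like the sketch below. The function name, the snapshot naming and the “latest” symlink are conventions invented for this example, not anything rsync requires; the real part is that unchanged files in the new snapshot become hard links to the copies in the previous one, so each snapshot looks like a full tree but only changed files take new space.

```shell
#!/bin/sh
# Hypothetical helper: snapshot SRC into DEST_ROOT/NAME, hard-linking unchanged
# files against the previous snapshot. "latest" is a symlink kept by this sketch.
# Usage: snapshot SRC DEST_ROOT NAME
snapshot() {
    src=$1; dest_root=$2; name=$3
    mkdir -p "$dest_root" || return 1
    # Make the path absolute: a relative --link-dest is resolved against the
    # destination directory, which is easy to get wrong.
    dest_root=$(cd "$dest_root" && pwd) || return 1
    if [ -d "$dest_root/latest" ]; then
        rsync -a --delete --link-dest="$dest_root/latest" "$src/" "$dest_root/$name/"
    else
        # First run: nothing to link against yet, so this is a plain full copy.
        rsync -a --delete "$src/" "$dest_root/$name/"
    fi && ln -sfn "$name" "$dest_root/latest"
}
```

Called as, say, `snapshot /home/me /mnt/backup "$(date +%Y-%m-%d)"` (paths hypothetical), each dated directory can be browsed directly, which is the “time travel” part.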
Isn’t that creating hardlinks between source and dest? Hard links only work on the same drive. And I’m not sure how that gives you “time travel”, as in, browsing snapshots or file states at the different times you ran rsync.
Edit: ah the hard link is between dest and the link-dest argument, makes more sense.
I wouldn’t bundle fs and backup compression in the same bucket, because they have vastly different reqs. Backup compression doesn’t need to be optimized for fast decompression.
I have it add a backup suffix based on the date. It moves changed and deleted files to another directory adding the date to the filename.
It can also do hard-link copies so that you can have multiple full directory trees to avoid all that duplication.
No file deltas or compression, but it does mean that you can access the backups directly.
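A sketch of the dated-suffix setup, as far as it can be guessed from the description: the function name and the “attic” directory are made up, and the exact flags used above aren’t stated, so treat --backup, --backup-dir and --suffix here as one plausible way to get that behaviour rather than the actual command.

```shell
#!/bin/sh
# Hypothetical helper: mirror SRC to DEST; files that would be overwritten or
# deleted are moved into ATTIC with today's date appended to the filename.
# ATTIC should be an absolute path outside DEST.
# Usage: mirror_with_history SRC DEST ATTIC
mirror_with_history() {
    src=$1; dest=$2; attic=$3
    stamp=$(date +%Y-%m-%d)
    rsync -a --delete \
        --backup --backup-dir="$attic" --suffix=".$stamp" \
        "$src/" "$dest/"
}
```

After a run where a file changed, the old version would sit in the attic as something like notes.txt.2024-05-01, readable with ordinary tools.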
Thanks! I was not aware of these options, along with what other poster mentioned about --link-dest. These do turn rsync into a backup program, which is something the root article should explain!
(Both are limited in some aspects to other backup software, but they might still be a simpler but effective solution. And sometimes simple is best!)
Agree. It’s neat for file transfers and simple one-shot backups, but if you’re looking for a proper backup solution, other tools/services have advanced virtually every aspect of backups, so it pretty much always makes sense to use one of those instead.
I use rsync for many of the reasons covered in the video. It’s widely available and has a long history. To me that feels important because it’s had time to become stable and reliable. Using Linux is a hobby for me so my needs are quite low. It’s nice to have a tool that just works.
I use it for all my backups and moving my backups to off network locations as well as file/folder transfers on my own network.
I even made my own tool (https://codeberg.org/taters/rTransfer) to simplify all my rsync commands into readable files because rsync commands can get quite long and overwhelming. It’s especially useful chaining multiple rsync commands together to run under a single command.
I’ve tried other backup and syncing programs and I’ve had bad experiences with all of them. Other backup programs have failed to restore my system. Syncing programs constantly stop working and I got tired of always troubleshooting. Rsync when set up properly has given me a lot less headaches.
Grsync is great. Having a GUI can be helpful