Synchronizing folders with rsync
En Español  

In this post I cover the basics of rsync, in preparation for a subsequent post that will cover backups and it's use in conjunction with cronjobs to automatize the backup process. From the copying and synchronization of local files and folders, to it's use for transfer information among computers. Its use as a daemon when SSH is unavailable was moved to it's own section.

Topics
The basics of rsync
Copying local files and folders
Dealing with whitespace and rare characters
Update the contents of a folder
Synchronizing two folders with rsync
Compressing the files while transferring them
Transferring files between two remote systems
Excluding files and directories
Running rsync as a daemon (moved to it's own section)
Some additional rsync parameters
Footnotes

The basics of rsync

rsync is a very versatile copying and backup tool that is included by default in almost every Linux distribution. It can be used as an advanced copying tool, allowing us to copy files both locally and remotely. It can also be used as a backup tool. It supports the creation of incremental backups.

rsync counts with a famous delta-transfer algorithm that allows us to transfer new files as well as recent changes to existent files, while ignoring unchanged files. Additionally to this, the behavior of rsync can be throughly customized, helping us to automatize backups, it can also be run as a daemon to turn the computer into a host and allow rsync clients connect to it.

Besides the copying of local files and folders, rsync allow us to copy over SSH (Secure Shell), RSH (Remote Shell) and it can be run as a daemon in a computer and allow other computers to connect to it, when rsync is run as a daemon it listens to the port TCP 873.

When we use rsync as a daemon or when we use RSH, the data that is send between computers travels unencrypted, so, if you are transferring files between two computers in the same local network, this is useful, but this shouldn't be used to transfer files over insecure networks, such as the Internet. For this purpose SSH is the way to go.

This is the main reason why I favor the use of SSH for my transfers, besides, since SSH is secure, many servers have the SSH daemon available. But the use of rsync as a daemon for transfers over fast connections, as is usually the case in a local network, is useful. I don't have the RSH daemon running in my computers so you may find me a bit biased about SSH in the examples. The examples covering the transfer of files between two computers use SSH as the medium of transport, but in a separate post I cover the use of rsync as a daemon.

Copying local files and folders

To copy the contents of one local folder into another, replacing the files in the destination folder, we use:

rsync -rtv source_folder/ destination_folder/

In the source_folder notice that I added a slash at the end, doing this prevents a new folder from being created, if we don't add the slash, a new folder named as the source folder will be created in the destination folder. So, if you want to copy the contents of a folder called Pictures into an existent folder which is also called Pictures but in a different location, you need to add the trailing slash, otherwise, a folder called Pictures is created inside the Pictures folder that we specify as destination.

rsync -rtv source/ destination/
A graphical representation of the results of rsync with a trailing slash in the source folder.

rsync -rtv source destination/
A graphical representation of the results of rsync without a trailing slash in the source folder.

The parameter -r means recursive, this is, it will copy the contents of the source folder, as well as the contents of every folder inside it.

The parameter -t makes rsync preserve the modification times of the files that it copies from the source folder.

The parameter -v means verbose, this parameter will print information about the execution of the command, such as the files that are successfully transferred, so we can use this as a way to keep track of the progress of rsync.

This are the parameters that I frequently use, as I am usually backing up personal files and this doesn't contain things such as symlinks, but another very useful parameter to use rsync with is the parameter -a.

rsync -av source/ destination/

The parameter -a also makes the copy recursive and preserve the modification times, but additionally it copies the symlinks that it encounters as symlinks, preserve the permissions, preserve the owner and group information, and preserve device and special files. This is useful if you are copying the entire home folder of a user, or if you are copying system folders somewhere else.

Dealing with whitespace and rare characters

We can escape spaces and rare characters just as in bash, by the use of \ before any whitespace and rare character. Additionally, we can use single quotes to enclose the string:

rsync -rtv so\{ur\ ce/ dest\ ina\{tion/
rsync -rtv 'so{ur ce/' 'dest ina{tion/'

Update the contents of a folder

In order to save bandwidth and time, we can avoid copying the files that we already have in the destination folder that have not been modified in the source folder. To do this, we can add the parameter -u to rsync, this will synchronize the destination folder with the source folder, this is where the delta-transfer algorithm enters. To synchronize two folders like this we use:

rsync -rtvu source_folder/ destination_folder/

By default, rsync will take into consideration the date of modification of the file and the size of the file to decide whether the file or part of it needs to be transferred or if the file can be left alone, but we can instead use a hash to decide whether the file is different or not. To do this we need to use the -c parameter, which will perform a checksum in the files to be transferred. This will skip any file where the checksum coincides.

rsync -rtvuc source_folder/ destination_folder/

Synchronizing two folders with rsync

To keep two folders in synchrony, not only do we need to add the new files in the source folder to the destination folder, as in the past topics, we also need to remove the files that are deleted in the source folder from the destination folder. rsync allow us to do this with the parameter --delete, this used in conjunction with the previously explained parameter -u that updates modified files allow us to keep two directories in synchrony while saving bandwidth.

rsync -rtvu --delete source_folder/ destination_folder/

The deletion process can take place in different moments of the transfer by adding some additional parameters:

  • rsync can look for missing files and delete them before it does the transfer process, this is the default behavior and can be set with --delete-before
  • rsync can look for missing files after the transfer is completed, with the parameter --delete-after
  • rsync can delete the files done during the transfer, when a file is found to be missing, it is deleted at that moment, we enable this behavior with --delete-during
  • rsync can do the transfer and find the missing files during this process, but instead of delete the files during this process, waits until it is finished and delete the files it found afterwards, this can be accomplished with --delete-delay

e.g.:

rsync -rtvu --delete-delay source_folder/ destination_folder/

Compressing the files while transferring them

To save some bandwidth, and usually it can save some time as well, we can compress the information being transfer, to accomplish this we need to add the parameter -z to rsync.

rsync -rtvz source_folder/ destination_folder/

Note, however, that if we are transferring a large number of small files over a fast connection, rsync may be slower with the parameter -z than without it, as it will take longer to compress every file before transfer it than just transferring over the files. Use this parameter if you have a a connection with limited speed between two computers, or if you need to save bandwidth.

Transferring files between two remote systems

rsync can copy files and synchronize a local folder with a remote folder in a system running the SSH daemon, the RSH daemon, or the rsync daemon. The examples here use SSH for the file transfers, but the same principles apply if you want to do this with rsync as a daemon in the host computer, read Running rsync as a daemon when ssh is not available further below for more information about this.

To transfer files between the local computer and a remote computer, we need to specify the address of the remote system, it may be a domain name, an IP address, or a the name of a server that we have already saved in our SSH config file (information about how to do this can be found in Defining SSH servers), followed by a colon, and the folder we want to use for the transfer. Note that rsync can not transfer files between two remote systems, only a local folder or a remote folder can be used in conjunction with a local folder. To do this we use:

Local folder to remote folder, using a domain, an IP address and a server defined in the SSH configuration file:
rsync -rtvz source_folder/ user@domain:/path/to/destination_folder/
rsync -rtvz source_folder/ user@xxx.xxx.xxx.xxx:/path/to/destination_folder/
rsync -rtvz source_folder/ server_name:/path/to/destination_folder/

Remote folder to local folder, using a domain, an IP address and a server defined in the SSH configuration file:
rsync -rtvz user@domain:/path/to/source_folder/ destination_folder/
rsync -rtvz user@xxx.xxx.xxx.xxx:/path/to/source_folder/ destination_folder/
rsync -rtvz server_name:/path/to/source_folder/ destination_folder/

Excluding files and directories

There are many cases in which we need to exclude certain files and directories from rsync, a common case is when we synchronize a local project with a remote repository or even with the live site, in this case we may want to exclude some development directories and some hidden files from being transfered over to the live site. Excluding files can be done with --exclude followed by the directory or the file that we want to exclude. The source folder or the destination folder can be a local folder or a remote folder as explained in the previous section.

rsync -rtv --exclude 'directory' source_folder/ destination_folder/
rsync -rtv --exclude 'file.txt' source_folder/ destination_folder/
rsync -rtv --exclude 'path/to/directory' source_folder/ destination_folder/
rsync -rtv --exclude 'path/to/file.txt' source_folder/ destination_folder/

The paths are relative to the folder from which we are calling rsync unless it starts with a slash, in which case the path would be absolute.

Another way to do this is to create a file with the list of both files and directories to exclude from rsync, as well as patterns (all files that would match the pattern would be excluded, *.txt would exclude any file with that extension), one per line, and call this file with --exclude-from followed by the file that we want to use for the exclusion of files. First, we create and edit this file in our favorite text editor, in this example I use gedit, but you may use kate, Vim, nano, or any other text editor:

touch excluded.txt
gedit excluded.txt

In this file we can add the following:

directory
relative/path/to/directory
file.txt
relative/path/to/file.txt
/home/juan/directory
/home/juan/file.txt
*.swp

And then we call rsync:

rsync -rvz --exclude-from 'excluded.txt' source_folder/ destination_folder/

In addition to delete files that have been removed from the source folder, as explained in Synchronizing two folders with rsync, rsync can delete files that are excluded from the transfer, we do this with the parameter --delete-excluded, e.g.:

rsync -rtv --exclude-from 'excluded.txt' --delete-excluded source/ destination/

This command would make rsync recursive, preserve the modification times from the source folder, increase verbosity, exclude all the files that match the patterns in the file excluded.txt, and delete all of this files that match the patternif they exist in the destination folder.

Running rsync as a daemon when ssh is not available

This was moved to it's own section, Running rsync as a daemon.

Some additional rsync parameters

-t Preserves the modification times of the files that are being transferred.
-q Suppress any non-error message, this is the contrary to -v which increases the verbosity.
-d Transfer a directory without recursing, this is, only the files that are in the folder are transferred.
-l Copy the symlinks as symlinks.
-L Copy the file that a symlink is pointing to whenever it finds a symlink.
-W Copy whole files. When we are using the delta-transfer algorithm we only copy the part of the archive that was updated, sometimes this is not desired.
--progress Shows the progress of the files that are being transferred.
-h Shows the information that rsync provides us in a human readable format, the amounts are given in K's, M's, G's and so on.

Footnotes

The amount of options that rsync provide us is immense, we can define exactly which files we want to transfer, what specific files we want to compress, what files we want to delete in the destination folder if this files exists, and we can deal with system files as well, for more information we can use man rsync and man rsyncd.conf

I leave the information concerning backups out of this post, as this will be covered, together with the automation of the backups, in an incoming post.

It is possible to run rsync on Windows with the use of cygwin, however I don't have a Windows box available at the moment (nor do I plan to aquire one in the foreseeable future), so even thought I have done it I can't post about this. If you run rsync as a service in Windows tho, you need to add the line "strict mode = false" in rsyncd.conf under the modules area, this will prevent rsync from checking the permissions in the secrets file and thus failing because they are not properly set (as they don't work the same as in Linux).

This post may be updated if there is something to correct or to add a little more information if I see it necessary.