Tag Archives: rsync

rsync create directory tree on remote host

I ran into an issue where I want to use rsync to copy a folder to a remote host into a destination directory that doesn’t yet exist. I was frustrated to find that rsync doesn’t appear to be able to create a remote directory tree. It would keep erroring out with this message:

rsync: mkdir "/opt/splunk/var/run/searchpeers" failed: No such file or directory (2)

I discovered this workaround which allowed me to finally accomplish what I wanted in one line: create the remote directory structure first, then synchronize into it. This is done with the --rsync-path option. You can specify the mkdir -p command beforehand, then add the rsync command after double ampersand (&&)

My specific use case was to copy a Splunk search peer bundle from one indexer to another. This was my working one liner:

rsync -aP --rsync-path="sudo mkdir -p /opt/splunk/var/run/searchpeers && sudo rsync" /opt/splunk/var/run/searchpeers splunk-idx2.jeppson.org:/opt/splunk/var/run/searchpeers

Success.

rsync to remote directory containing spaces

I ran into an issue where if I ran rsync to a remote server with a path that contained spaces it would simply terminate at the first space instead of copying the whole path. I learned you need to do some quote tricks to get this to work properly (thanks to this site for helping me.)

The trick is to enclose the entire remote path, server included, in SINGLE quotes. In addition to the single quotes you still need to escape every space with a backslash ( \ )

rsync -aP <source_folder> 'REMOTE_SERVER:/full\ path\ with\ escaped\ spaces/folder/'

Fix Apache Permission Denied errors

The other day I ran the rsync command to migrate files from an old webserver to a new one. What I didn’t notice right away was that the rsync changed the permissions of the folder I was copying into.

The problem presented itself with a very lovely 403 forbidden error message when trying to access any website that server hosted. Checking the logs (/var/log/apache2/error.log on my Debian system) revealed this curious message:

[error] [client 192.168.22.22] (13)Permission denied: access to / denied

This made it look like apache was denying access for some reason. I verified apache config and confirmed it shouldn’t be denying anything. After some head scratching I came across this site which explained that Apache throws that error when it encounters filesystem access denied error messages.

I was confused because /var/www, where the websites live, had the appropriate permissions. After some digging I found that the culprit in my case was not /var/www, but rather the /var directory underneath /var/www. For some reason the rsync changed /var to not have any execute permissions (necessary for folder access.)  A simple

chmod o+rx /var/

resolved my problem. Next time you get 403 it could be underlying filesystem issues and not apache at all.

Verify backup integrity with rsync, sed, cat, and tee

Recently it became apparent that a large data transfer I did might have had some errors. I wanted to find an easy way to compare the source and destination to make sure that they were identical. My solution: rsync, sed, cat and tee

I have used rsync quite a bit but did not know about the –checksum flag until recently. When you run rsync with –checksum, it takes much longer, but it effectively does something similar to what a ZFS scrub does – it runs a checksum of every source file and compares it with the checksum of each destination file.  If there is a mismatch, rsync will overwrite the destination file with the source file to correct it.

In my situation I performed a large data migration from my old mdadam-based RAID array to my brand new ZFS array. During the transfer the disks were acting very strange, and at one point one of the disks even popped out of the array. The culprit turned out to be a faulty SATA controller. I bought a cheap 4 port SATA controller from Amazon for my new ZFS array. Do not do this! Spring the cash out for a better controller. The cheap ones, this one at least, only caused headache. I removed it and used the on-board SATA ports on my motherboard and the issues went  away.

All of those shennanigans made me wonder if there was corrupt data on my new ZFS array. A ZFS scrub repaired 15.5G of data! While I’m sure that fixed a lot of the issues, I realized there probably was still some corruption. This is how I verified it

rsync -Pahn --checksum /path/to/source /path/to/destination | tee migration.txt

-P shows progress, -a means archive, -h is for human readable measurements, and -n means dry run (don’t actually copy anything)

Tee is a cool utility that allows you to redirect output of a command both to a file and to standard output. This is useful if you want to see the verification take place in real time but also want to analyze it later.

After the comparison (which took a while!) I wanted to see the discrepancies. the -P flag lists each directory rsync checks as well as which files it detected. You can use sed in conjunction with cat to weed out the unwanted lines (directory listings) so that only the files with discrepancies are left.

 cat pictures.txt | sed '/\/$/d' | tee pictures-truncated.txt

The sed regex simply looks for any line ending in a / (directory listing) and removes that line. What is left is the files in question. You can combine the entire thing into one line like so

rsync -Pahn --checksum /path/to/source /path/to/destination | sed '/\/$/d' | tee migration.txt

In my case I wanted to compare discrepencies with rsync and make decisions on if I wanted to actually fix the issues. If you are 100% sure the source is OK to remove the destination completely, you can simply run

rsync -Pah --checksum --delete /path/to/source /path/to/destination

Compare two latest ZFS snapshots for differences

In my previous post about ZFS snapshots I discussed how to get the latest snapshot name. I came across a need to get the name of the second to last snapshot and then compare that with the latest. A little CLI kung-fu is required for this but nothing too scary.

The command of the day is: zfs diff.

zfs diff storage/mythTV@auto-20141007.1248-2h storage/mythTV@auto-20141007.1323-2h

If you get an error using zfs diff, you aren’t running as root. You will need to delegate the diff ZFS permission to the account you’re using:

zfs allow backup diff storage

where backup is the account you want to grant permissions for and storage is the dataset you want to grant permissions to.

The next step is to grab the two latest snapshots using the following commands.

Obtain latest snapshot:

zfs list -t snapshot -o name -s creation -r storage/Documents | tail -1

Obtain the second to latest snapshot:

zfs list -t snapshot -o name -s creation -r storage/Documents | tail -2 | sort -r | tail -1

Putting it together in one line:

zfs diff `zfs list -t snapshot -o name -s creation -r storage/Documents | tail -2 | sort -r | tail -1` `zfs list -t snapshot -o name -s creation -r storage/Documents | tail -1`

While doing some testing I came across an unfortunate bug with the ZFS diff function. Sometimes it won’t show files that have been deleted! It indicates that the folder where the deleted files were in was modified but doesn’t specify any further. This bug appears to affect all ZFS implementations per here and here. As of this writing there has been no traction on this bug. The frustrating part is the bug is over two years old.

The workaround for this regrettable bug is to use rsync  with the -n parameter to compare snapshots. -n indicates to only do a dry run and not actually try to copy anything.

To use Rsync for comparison, you have to do a little more CLI-fu to massage the output from the zfs list command so it’s acceptable to rsync as well as include the full mountpoint of both snapshots. When working with rsync, don’t forget the trailing slash.

rsync -vahn --delete /mnt/storage/Documents/.zfs/snapshot/`zfs list -t snapshot -o name -s creation -r storage/Documents | tail -2 | sort -r | tail -1 | sed 's/.*@//g'`/ /mnt/storage/Documents/.zfs/snapshot/`zfs list -t snapshot -o name -s creation -r storage/Documents | tail -1 | sed 's/.*@//g'`/

Command breakdown:

Rsync arguments:
-v means verbose (lists files added/deleted)
-a means archive (preserve permissions)
-h means human readable numbers
-n means do a dry run only (no writing)
–delete will delete anything in the destination that’s not in the source (but not really since we’re doing -n – it will just print what it would delete on the screen)

Sed arguments
/s search and replace
/.*@ simple regex meaning anything up to and including the @ sign
/  What comes after this slash is what we would like to replace what was matched in the previous command. In this case, we choose nothing, and move directly to the last argument
/g tells sed to keep looking for other matches (not really necessary if we know there is only one in the stream)

All these backticks are pretty ugly, so for readability sake, save those commands into variables instead. The following is how you would do it in bash:

FIRST_SNAPSHOT="`zfs list -t snapshot -o name -s creation -r storage/Documents | tail -2 | sort -r | tail -1 | sed 's/.*@//g'/`"
SECOND_SNAPSHOT="`zfs list -t snapshot -o name -s creation -r storage/Documents | tail -1 | sed 's/.*@//g'/`"
rsync -vahn --delete /mnt/storage/Documents/.zfs/snapshot/$FIRST_SNAPSHOT /mnt/storage/Documents/.zfs/snapshot/$SECOND_SNAPSHOT

I think I’ll stop for now.

Configuring rsync between two machines

rsync is a powerful backup tool. I have used it over SSH before but never with its own internal daemon. Following this guide I configured the rsync daemon with a share and host based access control. I then configured an rsync task in freeNAS to sync pictures between itself and the rsync server via rsync, not SSH (for speed). In this example my server is running Debian Wheezy and the client is running FreeNAS.

  1. On the server, create /etc/rsyncd.conf and add the following:
    max connections = 1
    log file = /var/log/rsync.log
    timeout = 300
    [Pictures]
    comment = All our pictures
    path = /storage/Pictures
    read only = yes
    list = yes
    uid = nobody
    gid = nogroup
    #auth users = mongrel
    list = yes
    hosts allow = 127.0.0.0/8 192.168.0.0/16
    #secrets file = /etc/rsyncd.secrets

    Note the only access control here is via source IP address. You can also have username/password access controls which I commented out.

  2. (Still on the server) start the rsync daemon
    rsync --daemon
  3. Configure the client. I used the freeNAS GUI which generated the following cron job
    rsync -r -t -z --delete  192.168.54.10::Pictures '/mnt/storage/Pictures/'

    Putting that to the test in the command line with an additonal -P parameter to see progress, I saw that the command synchronized successfully. Excellent.

I tested transfer speeds using both the rsync daemon and ssh method. There was a noticeable (8 MB/s) difference in transfer speeds. The rsync way is definitely faster.