Sending ZFS file systems as snapshots to a remote system.
The last post on ZFS was epically long. This one will be a lot more concise. The usual disclaimer applies. Use at your own risk. This post applies to ZFS on Ubuntu 18.04 and similar.
One of the nice and useful features of ZFS is that you can send entire filesystems from one pool to another. ZFS doesn't care whether this happens by file and carrier pigeon or over a network connection. For this article we will just focus on the network option.
Why would you want to send a file system? Backup is one option, data migration another.
So how does the process work? You take a snapshot of the filesystem and send it to the new pool, where it becomes the basis of a new instance of this file system. After that you can send just the incremental delta between the first and the second snapshot, which keeps the second filesystem up to date far more efficiently.
Now let's do an example:
# Take the snapshot (dataset names have no leading slash)
zfs snapshot pool/filesystem@firstsnap
# List the snapshot
zfs list -t snapshot
Note that snapshots can be taken recursively, and they work for ZFS-hosted block devices (zvols) as well. That is useful if you want to give the VMs hosted on a hypervisor all the benefits of ZFS (bit rot protection, inline compression, etc.) without the guest actually needing to know anything about ZFS.
Let's install pv (Pipe Viewer) so we can see the progress of our filesystem send.
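As a rough sketch of the recursive and zvol cases, the two helper functions below wrap the relevant commands. The function names and the dataset names in the example calls (pool, pool/vm-disk0) are made up for illustration; only zfs snapshot -r and zfs list -t snapshot -r are actual ZFS commands.

```shell
# Snapshot a dataset and all of its descendants under one snapshot name.
# $1 = dataset, $2 = snapshot name
snapshot_recursive() {
    zfs snapshot -r "$1@$2"
}

# A zvol is snapshotted exactly like a filesystem; list the result afterwards.
# $1 = zvol dataset, $2 = snapshot name
snapshot_zvol() {
    zfs snapshot "$1@$2" && zfs list -t snapshot -r "$1"
}

# Example calls (hypothetical dataset names):
#   snapshot_recursive pool firstsnap
#   snapshot_zvol pool/vm-disk0 firstsnap
```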
sudo apt-get install pv
Let's send our first snapshot to the remote system.
zfs send pool/filesystem@firstsnap | pv | ssh user@myserver.domain.net zfs recv -v pool/filesystem
The -v flag is for verbose, but there are a host of other flags available. Read the man pages to earn your black belt ;) Note that your user at myserver.domain.net needs permission to create and receive ZFS datasets on the target pool, for example by running as root or through zfs allow delegation.
You should also see pv giving you a nice running data count and animation during the send process.
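Without a size hint, pv can only count bytes. If you want a percentage and an ETA, one option is to ask zfs send for a dry-run estimate first and hand it to pv -s. This is a minimal sketch and not a tested recipe: it assumes that zfs send -nP prints its estimate on stdout with a final line of the form "size<TAB><bytes>", so check the output on your version first. The function names are made up for this post.

```shell
# Extract the byte estimate from `zfs send -nP` output read on stdin.
# Assumption: the output contains a line "size<TAB><bytes>".
parse_send_size() {
    awk '$1 == "size" { print $2 }'
}

# Full send with a progress bar and ETA.
# $1 = snapshot, $2 = ssh target, $3 = destination dataset
send_with_progress() {
    sp_snap=$1 sp_remote=$2 sp_target=$3
    # -n = dry run (no stream is generated), -P = parsable output
    sp_size=$(zfs send -nP "$sp_snap" | parse_send_size)
    zfs send "$sp_snap" | pv -s "$sp_size" | ssh "$sp_remote" zfs recv -v "$sp_target"
}

# Example call:
#   send_with_progress pool/filesystem@firstsnap user@myserver.domain.net pool/filesystem
```

The estimate is only approximate, so the bar may finish slightly before or after 100%.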
So let's take another snapshot. Then send the incremental changes to our remote system.
zfs snapshot pool/filesystem@secondsnap
zfs send -i pool/filesystem@firstsnap pool/filesystem@secondsnap | pv | ssh user@myserver.domain.net zfs recv -v pool/filesystem
This should take considerably less time as you are only sending the delta between first and second snapshot. Note there are several options available for recv, including forcing rollbacks of snapshots on the receiving filesystem and specifying different destination pool names etc. Best to take a look at the man pages to see what meets your needs.
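If you repeat the incremental step regularly, it may be worth wrapping it in a small function. The sketch below is one way to do that. The function name and argument order are my own invention, and it assumes the previous snapshot already exists on both sides. It also passes -F to recv, which rolls the receiving filesystem back to its latest snapshot before applying the delta, so any changes made on the receiving side since then are discarded. Drop the flag if that is not what you want.

```shell
# Take a new snapshot and send only the delta since the previous one.
# $1 = dataset, $2 = previous snapshot name, $3 = new snapshot name, $4 = ssh target
send_increment() {
    si_dataset=$1 si_prev=$2 si_curr=$3 si_remote=$4
    zfs snapshot "$si_dataset@$si_curr" || return 1
    # -i sends the difference between the two snapshots;
    # recv -F forces a rollback on the receiver first (discards remote changes!)
    zfs send -i "$si_dataset@$si_prev" "$si_dataset@$si_curr" |
        pv |
        ssh "$si_remote" zfs recv -v -F "$si_dataset"
}

# Example call:
#   send_increment pool/filesystem firstsnap secondsnap user@myserver.domain.net
```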
So what's the benefit? "I've been using rsync for years and it works just fine...." Okay granddad. For me, one benefit is the ultra-fast live snapshot capability: you freeze the file system at a specific moment in time and then back that frozen state up to a remote location while normal operation continues. Unlike with rsync, you don't end up with a range of file states along the timeline as the sync process works its way through the directory tree.
Secondly, and this is probably the biggest benefit to me in terms of virtualization, I can migrate zvols containing native guest file systems between data centers with ease and speed, regardless of the middleware in use. If done right, with converging snapshots (repeated incremental sends that shrink the remaining delta), this allows for migrating state-coherent workloads between continents in seconds without the guest operating system being aware of it. (Particularly useful when a hurricane is converging on your primary site...)
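To give an idea of what "converging" could look like in practice, here is a hedged sketch: while the guest keeps running, you repeatedly snapshot the zvol and ship the increment, so each pass has less left to transfer. The function name, the sync0/sync1/... naming scheme, and the fixed number of passes are all assumptions made for illustration. It expects dataset@sync0 to exist on both sides already.

```shell
# Repeatedly send incremental snapshots of a zvol to shrink the remaining delta.
# $1 = zvol dataset, $2 = ssh target, $3 = number of passes
# Assumption: "$1@sync0" has already been sent to the remote side.
converge() {
    cv_dataset=$1 cv_remote=$2 cv_passes=$3
    cv_i=1
    while [ "$cv_i" -le "$cv_passes" ]; do
        cv_prev=$((cv_i - 1))
        zfs snapshot "$cv_dataset@sync$cv_i" || return 1
        zfs send -i "$cv_dataset@sync$cv_prev" "$cv_dataset@sync$cv_i" |
            ssh "$cv_remote" zfs recv -F "$cv_dataset" || return 1
        cv_i=$((cv_i + 1))
    done
}

# Example call (hypothetical zvol name):
#   converge pool/vm-disk0 user@myserver.domain.net 3
```

After the last pass you would pause or shut down the guest, run one final pass so the remote copy is state-coherent, and start the guest at the new site. How you pause the guest depends on your hypervisor and is left out here.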
Alright, this is really just a taster, so if this has piqued your interest, I strongly suggest reading the man pages. ZFS, both in terms of command syntax and documentation, is a master class in good usability IMHO.
So, till next time, good luck!