[OpenIndiana-discuss] Useful tidbit for ZFS backup via ZFS Send

Edward Ned Harvey (openindiana) openindiana at nedharvey.com
Sat Sep 29 17:15:53 UTC 2012


> From: Bryan N Iotti [mailto:ironsides.medvet at gmail.com]
> 
> - zfs send -R rpool@<DATE> | gzip > rpool.COMPLETE.<DATE>.gz
> 
> ... as per Oracle manual.
> 
> I was wondering why it was so slow, taking a couple of hours, then I
> paid attention to my CPU meter and understood that the normal gzip was
> running as a single thread.
> 
> I searched online for a multithreaded version and came across pigz
> (http://www.zlib.net/pigz/). I downloaded it, modified the Makefile to
> use the Solaris Studio 12.3 cc with the proper CFLAGS ("-fast", for now,
> no -m64). I then copied both pigz and unpigz to a directory in my PATH
> and modified the last command to:
> - zfs send -R rpool@<DATE> | /usr/local/bin/pigz > rpool.COMPLETE.<DATE>.gz

Before anything else, you shouldn't be storing your zfs send stream in a file, for two major reasons: a single bit of corruption destroys the whole stream, and your only restore option is a complete filesystem restore, with no per-file granularity.

Instead, you should zfs send | zfs receive every single time.  If you're in the unfortunate situation of receiving onto storage that isn't zfs, you can create a file container on the receiving side and build a zfs pool inside it, so you can zfs receive onto that destination.  (I don't think you even need lofiadm, the official loopback device tool - zfs can use a plain file directly, as if it were a device.  Check the man pages, but zpool create accepts a full path to a file as a vdev, and zpool import -d lets you point at the directory containing it later.)
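A rough sketch of that setup; the pool name, size, and paths here are made up for illustration:

    # Back a pool with a plain file on the destination storage
    mkfile 100g /backup/zfs-container
    zpool create backuppool /backup/zfs-container

    # Replicate with send/receive instead of archiving the stream
    zfs send -R rpool@<DATE> | zfs receive -Fdu backuppool

    # Later, re-import the file-backed pool by searching its directory
    zpool export backuppool
    zpool import -d /backup backuppool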

Now, a few words on compression and threading.

Gzip and pigz are based on zlib.  So if you compare performance at the default compression level against the --fast command-line argument, I think you'll find --fast is significantly faster while the compression is not significantly worse.  This is true for both the single-threaded gzip and the pigz implementations.  If you're going to continue using gzip and/or pigz, try using --fast in every situation, and I think your life will become slightly better.
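A quick way to check this yourself is to time a sample of the stream through each setting; the sample size and paths below are just illustrative:

    # Grab roughly 1 GB of the stream as a test sample
    zfs send -R rpool@<DATE> | dd of=/tmp/sample.zfs bs=1024k count=1024

    # Compare wall time and compressed size (wc -c prints output bytes)
    time gzip        < /tmp/sample.zfs | wc -c   # default level (-6)
    time gzip --fast < /tmp/sample.zfs | wc -c   # level 1
    time pigz --fast < /tmp/sample.zfs | wc -c   # level 1, all CPUs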

But there are lots of different compression algorithms.  
* lzop is based on lzo, which is extremely fast but not as powerful as zlib.  Unfortunately, no known parallel implementation.
* If you enable compression on a zfs filesystem, it uses lzjb by default, which is very similar in performance to lzo.  I do this for nearly every zfs filesystem anywhere; generally speaking, the filesystem is both faster and smaller with this enabled than with no compression (see the example after this list).
* zlib is often considered the "default" just because it's common and most people don't think much about it.  It's what zip files and gz files use.  Call it "medium" in terms of compression and speed characteristics.
* bzip2 is much slower but somewhat stronger than zlib. There's a parallel implementation called pbzip2.  IMHO, there's no situation where this is the best option, because it's slower than either zlib or lzma, but it doesn't compress as well as lzma.
* xz is based on lzma.  Last I knew, they were still working out the bugs of the parallel implementation, but maybe by now it's ready.  If you use the --fast option, it generally compresses almost as fast as gzip --fast, and it compresses much better than either gzip or bzip2.
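To illustrate the lzjb point above: turning it on is a single property, and zfs will report the ratio you actually get.  The dataset name here is just an example:

    # "on" maps to lzjb by default on current zfs
    zfs set compression=on rpool/export/home
    zfs get compression,compressratio rpool/export/home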

At some point, I got tired of waiting for parallel implementations of all these things.  And even when products like pigz got released, I was disappointed by the cross-platform compatibility, and in some cases, the threading model.  So I wrote threadzip, a parallel threaded compression tool in python.  http://code.google.com/p/threadzip  Who knows, it might even be useful for you.



