[OpenIndiana-discuss] Zpool 4k sectors - zfs send/recv

Tue May 24 23:14:28 UTC 2011

I'm about to expand my storage from 2 drives in a mirror to 4 drives (2 x mirrors concatenated). After this stage, 3 of my 4 drives will be Advanced Format drives, so I'm looking at creating the zpool with block-size 4096 (ashift=12).
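
For reference, ashift is the base-2 logarithm of the sector size the pool assumes. A minimal sketch of that relationship (the helper name `ashift_for` is made up for illustration and is not part of any ZFS tool):

```python
def ashift_for(sector_bytes):
    """Return log2 of the sector size, which is what ashift records."""
    # Sector sizes must be positive powers of two.
    if sector_bytes <= 0 or sector_bytes & (sector_bytes - 1):
        raise ValueError("sector size must be a positive power of two")
    return sector_bytes.bit_length() - 1


print(ashift_for(512))   # 9
print(ashift_for(4096))  # 12
```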

I will have the opportunity here to format the new drives using the 4096 byte block size, by doing this:

(starting with zpool which is a mirror on drive1 and drive2)
1. zpool create newpool block-size 4096 mirror drive3 drive4
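2. zfs snapshot -r zpool/blah@migrate; zfs send -R zpool/blah@migrate | zfs recv newpool/blah   (zfs send needs a snapshot; the name "migrate" is arbitrary)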
3. zpool destroy zpool
4. zpool add newpool mirror drive1 drive2

Any gotchas here I need to know about?

This is going to result in all existing data sitting on the drive3/4 mirror vdev and a pretty empty drive1/2 mirror vdev. Is there any way to get the data redistributed across the drives?

I'm running some tests with just one filesystem onto a test pool (file backing store) and I notice that the disk usage goes up significantly, by about 30%:

root@vault:~# zfs list zpool/projects/something
NAME                      USED   AVAIL  REFER  MOUNTPOINT
zpool/projects/something  49.8M  94.0G  28.8M  /projects/something
root@vault:~# zfs list ztest/projects/something
NAME                      USED   AVAIL  REFER  MOUNTPOINT
ztest/projects/something  65.3M  1.89G  38.0M  /ztest/projects/something
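
As a quick check of the "about 30%" figure, here is a small sketch that computes the growth from the USED and REFER columns in the two listings (the function name `growth_pct` is just illustrative):

```python
def growth_pct(before, after):
    """Percentage increase going from `before` to `after`."""
    return (after - before) / before * 100


# Values in MB, copied from the two `zfs list` outputs.
used_before, used_after = 49.8, 65.3
refer_before, refer_after = 28.8, 38.0

print(f"USED:  +{growth_pct(used_before, used_after):.1f}%")    # +31.1%
print(f"REFER: +{growth_pct(refer_before, refer_after):.1f}%")  # +31.9%
```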

The referenced amount (REFER) goes up from 28.8M to 38.0M, which matches what `du -h` reports. This file system has about 3000 files in it; not massive. I've compared the `du -h` output from the two filesystems above, and here are a few changes in file size:
1k	-> 8k
1k	-> 8k
3k	-> 17k
6k	-> 17k
7k	-> 25k
5k	-> 18k
17k	-> 57k
383k	-> 839k
7.3M	-> 7.5M
8k	-> 47k

etc..

As I expected, most of the difference is in the smaller files. I am surprised, though, at how much the files grow: I would have expected an increase of up to 4k per file (and perhaps another 4k for metadata). But 3k to 17k? 5k to 18k? 17k to 57k?
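
To make that expectation concrete, here is a rough sketch that pads each "before" size up to the next 4k boundary and compares it with what was observed. It assumes the `k` in the `du -h` output means KiB and that `du` has already rounded the figures, so the numbers are approximate; the 7.3M file is left out to keep everything in whole k. It only shows the size of the gap, not its cause.

```python
BLOCK = 4096  # assumed allocation unit in bytes


def round_up_to_block(size_bytes, block=BLOCK):
    """Smallest multiple of `block` that can hold `size_bytes`."""
    return -(-size_bytes // block) * block


# (before, after) sizes in KiB, taken from the du -h comparison above.
observed_kib = [(1, 8), (1, 8), (3, 17), (6, 17), (7, 25),
                (5, 18), (17, 57), (383, 839), (8, 47)]

for before, after in observed_kib:
    expected = round_up_to_block(before * 1024) // 1024
    print(f"{before:>4}k  padded to 4k: {expected:>4}k  "
          f"observed: {after:>4}k  unexplained: {after - expected:>4}k")
```

Every observed size comes out larger than simple 4k padding would predict, which is the part I can't account for.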

Is this normal? And, out of curiosity, why does it happen?

And if zfs is so much less efficient with larger block sizes, is the ashift=12 thing something that only database users, for example (or other users of very large files), should even consider?

Thanks for sharing your thoughts.

Matt