[OpenIndiana-discuss] Recommendations for fast storage

Wed Apr 17 12:38:20 UTC 2013

> From: Sašo Kiselkov [mailto:skiselkov.ml at gmail.com]
> 
> Raid-Z indeed does stripe data across all
> leaf vdevs (minus parity) and does so by splitting the logical block up
> into equally sized portions. 

Jay, there you have it.  You asked why use mirrors, and said you would use raidz2 or raidz3 unless the CPU overhead is too much.  I recommended using mirrors and avoiding raidzN, and here is the answer why.

If you have 16 disks arranged as 8x 2-way mirrors, versus 10 disks in raidz2 (data striped across 8 disks plus 2 disks' worth of parity), then the sequential write speed of each configuration is about the same: 8x the sustained write speed of a single device.  But if you have two or more parallel sequential read threads, the sequential read speed of the mirrors will be 16x while the raidz2 is only 8x.  The mirror configuration can do 8x random write while the raidz2 is only 1x.  And the mirrors can do 16x random read while the raidz2 is only 1x.

In the case you care about least, they're equal.  In the case you care about most, the mirror configuration is 16x faster.
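The rule-of-thumb multiples above can be tabulated with a short sketch.  It is only the simplified model used in this post, under stated assumptions: each disk counts as one unit of bandwidth and one unit of IOPS, the raidz2 pool is a single vdev, and CPU, caching and record-size effects are ignored.  The class names Mirrors and RaidZ are hypothetical, not part of any ZFS tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mirrors:
    vdevs: int  # number of mirror vdevs in the pool
    width: int  # disks per mirror (2 for 2-way mirrors)

    @property
    def disks(self) -> int:
        return self.vdevs * self.width

    def speeds(self) -> dict:
        """Multiples of a single disk, per the simplified model in the post."""
        return {
            "seq_write": self.vdevs,   # every side of a mirror writes the same data
            "seq_read": self.disks,    # parallel readers can use every disk
            "rand_write": self.vdevs,  # one IOPS unit per vdev
            "rand_read": self.disks,   # any side of a mirror can serve a read
        }


@dataclass(frozen=True)
class RaidZ:
    data: int    # data disks' worth of width in the single raidz vdev
    parity: int  # parity level (2 for raidz2)

    @property
    def disks(self) -> int:
        return self.data + self.parity

    def speeds(self) -> dict:
        return {
            "seq_write": self.data,  # streaming writes spread over the data width
            "seq_read": self.data,
            "rand_write": 1,  # each block is split across all disks: one vdev, one IOPS unit
            "rand_read": 1,
        }


if __name__ == "__main__":
    mir, rz = Mirrors(vdevs=8, width=2), RaidZ(data=8, parity=2)
    m, z = mir.speeds(), rz.speeds()
    print(f"8x 2-way mirrors ({mir.disks} disks) vs raidz2 ({rz.disks} disks)")
    for metric in m:
        print(f"{metric:<10} mirrors {m[metric]:>3}x  raidz2 {z[metric]:>3}x")
```

The random-I/O rows are where the layouts diverge: because raidz splits each logical block across every disk in the vdev, a single random read or write occupies the whole vdev, so the vdev behaves like one disk for IOPS.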

You also said raidz2 would offer more protection against failure, because you can survive any two disk failures (but no more.)  I would argue this is incorrect (I've done the probability analysis before), mostly because resilvering in the mirror configuration is 8x to 16x faster.  There's 1/8 as much data to resilver, and IOPS is limited by a single disk rather than the "worst" of several disks, which introduces another factor of up to 2x, raising the 8x to as much as 16x.  The smaller resilver window means a lower probability of "concurrent" failures on the critical vdev.  We're talking about 12 hours versus 1 week, the actual result on my machines in production.

Also, while it's possible to fault the mirror pool with only 2 failures, the probability is against that happening.  The first disk failure is equally likely (1/16) to hit any disk.  If a 2nd failure occurs concurrently, there's a 14/15 probability that it lands on a separate (safe) mirror.  A 3rd concurrent failure has a 12/14 chance of being safe, a 4th a 10/13 chance, and so on.  So the mirror configuration can probably withstand a higher number of failures, and the resilver window for each failure is also smaller.  When I looked at the total probability of pool failure, both came out somewhere around 10^-17.  In other words, we're splitting hairs, but as long as we are, we might as well point out that they're both about the same.
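The per-failure fractions above can be checked with a short sketch.  It assumes 2-way mirrors, that each new failure strikes a uniformly random surviving disk, and that nothing finishes resilvering in between; the function names are hypothetical.  It deliberately leaves out the resilver-window effect, which is the bigger part of the argument.

```python
from fractions import Fraction


def mirror_survival(vdevs: int, k: int) -> Fraction:
    """Probability that k concurrent failures leave every 2-way mirror with a
    working disk, assuming each failure hits a uniformly random surviving disk."""
    disks = 2 * vdevs
    if not 0 <= k <= disks:
        raise ValueError("k must be between 0 and the number of disks")
    p = Fraction(1)
    for i in range(k):
        # i disks have already failed, each in a different mirror, so their
        # i partners are the only fatal targets among the disks - i survivors.
        safe = disks - 2 * i
        if safe <= 0:
            return Fraction(0)
        p *= Fraction(safe, disks - i)
    return p


def raidz_survival(parity: int, k: int) -> Fraction:
    """A single raidz vdev survives any k <= parity failures and never more."""
    return Fraction(1) if k <= parity else Fraction(0)


if __name__ == "__main__":
    for k in range(1, 6):
        print(f"{k} concurrent failures: mirrors {mirror_survival(8, k)}, "
              f"raidz2 {raidz_survival(2, k)}")
```

With 8 mirrors this multiplies out the 14/15, 12/14 and 10/13 steps, giving cumulative survival of 14/15, 4/5 and 8/13 for 2, 3 and 4 concurrent failures, where raidz2 drops to zero at 3.  A full pool-failure estimate would also need per-disk failure rates and resilver times, which this sketch does not model.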