[OpenIndiana-discuss] Recommendations for fast storage

Richard Elling richard.elling at richardelling.com
Tue Apr 16 22:08:31 UTC 2013


clarification below...

On Apr 16, 2013, at 2:44 PM, Sašo Kiselkov <skiselkov.ml at gmail.com> wrote:

> On 04/16/2013 11:37 PM, Timothy Coalson wrote:
>> On Tue, Apr 16, 2013 at 4:29 PM, Sašo Kiselkov <skiselkov.ml at gmail.com>wrote:
>> 
>>> If you are IOPS constrained, then yes, raid-zn will be slower, simply
>>> because any read needs to hit all data drives in the stripe. This is
>>> even worse on writes if the raidz has bad geometry (number of data
>>> drives isn't a power of 2).
>>> 
>> 
>> Off topic slightly, but I have always wondered at this - what exactly
>> causes non-power-of-2 (plus number of parities) geometries to be slower,
>> and by how much?  I tested for this effect with some consumer drives,
>> comparing 8+2 and 10+2, and didn't see much of a penalty (though the only
>> random test I did was read; our workload is highly sequential, so it
>> wasn't important).

This makes sense, even for more random workloads.

> 
> Because a non-power-of-2 number of data drives causes a read-modify-write
> sequence on (almost) every write. HDDs are block devices and they can
> only ever write in increments of their sector size (512 bytes, or
> nowadays often 4096 bytes). Using your example above, if you divide a
> 128k block by 8, you get 8x 16k updates - all nicely aligned on 512-byte
> boundaries, so your drives can write them in one go. If you divide by
> 10, you get an ugly 12.8k per drive, which means that if your drives are
> of the 512-byte-sector variety, they write 25 full sectors and then, for
> the last partial-sector write, they first need to fetch that sector from
> the platter, modify it in memory, and then write it out again.

This is true for RAID-5/6, but it is not true for ZFS raidz. Though it has been
a few years, I ran a bunch of tests and found no correlation between the number
of disks in the set (within the boundaries described in the man page) and random
I/O performance for raidz. This is not the case for RAID-5/6, where pathologically
bad performance is easy to create if you know the number of disks and the stripe width.
 -- richard
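
The sector-alignment arithmetic in the quoted explanation can be sketched numerically. This is my illustration of the claim being debated, not ZFS internals; it just divides a 128k block across 8 vs. 10 data drives and checks whether each drive's share lands on a 512-byte sector boundary.

```python
# Sketch of the sector-alignment arithmetic from the quoted explanation.
# Assumption for illustration: each data drive stores block_size / n_data
# bytes, and a non-sector-aligned tail would need a read-modify-write.
from fractions import Fraction

SECTOR = 512          # bytes; "advanced format" drives use 4096
BLOCK = 128 * 1024    # one 128 KiB block

def per_drive_share(n_data):
    """Exact number of bytes each data drive stores for one full block."""
    return Fraction(BLOCK, n_data)

for n in (8, 10):
    share = per_drive_share(n)
    whole, partial = divmod(share, SECTOR)
    tail = "" if partial == 0 else f" + {float(partial):g} bytes (partial sector)"
    print(f"{n} data drives: {float(share):g} bytes/drive = {whole} full sectors{tail}")
```

With 8 data drives the per-drive share is an exact multiple of the sector size; with 10 it leaves a 307.2-byte tail, which is the partial-sector case described above.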

> 
> I said "almost" every write is affected, but this largely depends on
> your workload. If your writes are large async writes, then this RMW
> cycle only happens at the end of the transaction commit (simplifying a
> bit, but you get the idea), which is a pretty small overhead. However,
> if you are doing many small updates in different locations (e.g. writing
> the ZIL), this can significantly amplify the load.
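
To make the workload dependence concrete, here is a back-of-the-envelope sketch (my illustration, under the simplifying assumption that every write whose per-drive share is misaligned to the sector size costs one RMW cycle): it compares one large 128 KiB async commit against the same data arriving as 32 separate 4 KiB sync writes.

```python
# Back-of-the-envelope sketch of the workload dependence (illustrative
# assumption, not ZFS internals): each write whose per-drive share is not
# sector-aligned costs one read-modify-write (RMW) cycle.

SECTOR = 512
DATA_DRIVES = 10  # the 10+2 geometry discussed above

def rmw_count(write_sizes):
    """Count writes whose per-drive share is misaligned to the sector size."""
    return sum(1 for size in write_sizes
               if (size / DATA_DRIVES) % SECTOR != 0)

print(rmw_count([128 * 1024]))     # one large async commit  -> 1
print(rmw_count([4 * 1024] * 32))  # many small sync writes  -> 32
```

Same total data, 32x the misaligned writes - which is the amplification argument for small sync workloads such as the ZIL.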
> 
> Cheers,
> --
> Saso
> 
> _______________________________________________
> OpenIndiana-discuss mailing list
> OpenIndiana-discuss at openindiana.org
> http://openindiana.org/mailman/listinfo/openindiana-discuss

--

Richard.Elling at RichardElling.com
+1-760-896-4422




