[OpenIndiana-discuss] Recommendations for fast storage

Sašo Kiselkov skiselkov.ml at gmail.com
Tue Apr 16 21:44:39 UTC 2013


On 04/16/2013 11:37 PM, Timothy Coalson wrote:
> On Tue, Apr 16, 2013 at 4:29 PM, Sašo Kiselkov <skiselkov.ml at gmail.com> wrote:
> 
>> If you are IOPS constrained, then yes, raid-zn will be slower, simply
>> because any read needs to hit all data drives in the stripe. This is
>> even worse on writes if the raidz has bad geometry (number of data
>> drives isn't a power of 2).
>>
> 
> Off topic slightly, but I have always wondered at this - what exactly
> causes geometries where the data drive count (total minus parity) isn't
> a power of 2 to be slower, and by how much?  I tested for this effect
> with some consumer drives, comparing 8+2 and 10+2, and didn't see much
> of a penalty (though the only random test I did was read, and our
> workload is highly sequential, so it wasn't important).

Because a non-power-of-2 number of data drives causes a read-modify-write
sequence on (almost) every write. HDDs are block devices and can only
ever write in increments of their sector size (512 bytes, or nowadays
often 4096 bytes). Using your example above: divide a 128k block by 8
and you get 8x 16k updates, all nicely aligned on 512-byte boundaries,
so your drives can write each of them in one go. Divide it by 10 and you
get an ugly 12.8k per drive, which means that if your drives are of the
512-byte-sector variety, they write 25 full 512-byte sectors and then,
for the last partial-sector write, they first need to fetch the sector
from the platter, modify it in memory and write it out again.
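
To make the arithmetic concrete, here is a quick back-of-envelope sketch
(plain Python, nothing ZFS-specific; the record size, sector size and
drive counts are just the numbers from above):

# Back-of-envelope sketch: how a 128k record splits across the data
# drives of a raidz stripe, and whether each drive's chunk is a whole
# number of physical sectors.

RECORD_SIZE = 128 * 1024   # one 128k record, in bytes
SECTOR_SIZE = 512          # physical sector size; 4096 on 4Kn drives

for data_drives in (8, 10):
    chunk = RECORD_SIZE / data_drives        # bytes landing on each data drive
    full, tail = divmod(chunk, SECTOR_SIZE)  # whole sectors + leftover bytes
    note = "read-modify-write on the tail" if tail else "sector-aligned"
    print(f"{data_drives} data drives: {chunk:8.1f} B/drive = "
          f"{int(full)} full sectors + {tail:.1f} B -> {note}")

The 8-drive case comes out to 32 clean sectors per drive; the 10-drive
case to 25 sectors plus a ~307-byte tail, and that tail is what forces
the extra read.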

I said "almost" every write is affected, because this largely depends on
your workload. If your writes are large async writes, then this RMW
cycle only happens at the end of the transaction commit (simplifying a
bit, but you get the idea), so the overhead is comparatively small.
However, if you are doing many small updates in different locations
(e.g. writing the ZIL), this can significantly amplify the load.
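
To put a very rough number on that difference (my own simplification, in
the same spirit as the caveat above; real raidz allocation pads and lays
out small blocks differently than this pretends):

# Very rough comparison: a big sequential async stream pays the
# misaligned tail once around txg commit, while scattered small updates
# pay it on every single write.

SECTOR = 512
DATA_DRIVES = 10  # the misaligned geometry from the example above

def has_tail(write_size):
    """True if the per-drive chunk of this write isn't sector-aligned."""
    return (write_size / DATA_DRIVES) % SECTOR != 0

def rmw_cycles(num_writes, write_size, sequential):
    """Rough count of read-modify-write cycles for a batch of writes."""
    if not has_tail(write_size):
        return 0
    return 1 if sequential else num_writes

MiB = 1024 * 1024
print("one 1 MiB async stream :", rmw_cycles(1, MiB, sequential=True))      # -> 1
print("256 scattered 4k writes:", rmw_cycles(256, 4096, sequential=False))  # -> 256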

Cheers,
--
Saso


