[OpenIndiana-discuss] vdev reliability was: Recommendations for fast storage

Edward Ned Harvey (openindiana) openindiana at nedharvey.com
Fri Apr 19 00:37:15 UTC 2013


> From: Timothy Coalson [mailto:tsc5yc at mst.edu]
> 
> As for what I said about resilver speed, I had not accounted for the fact
> that data reads on a raid-z2 component device would be significantly
> shorter than for the same data on 2-way mirrors.  Depending on whether you
> are using enormous block sizes, or whether your data is allocated extremely
> linearly in the way scrub/resilver reads it, this could be the limiting
> factor on platter drives due to seek times, and make raid-z2 take much
> longer to resilver.  I fear I was thinking of raid-z2 in terms of raid6.

I'm not sure if you misunderstand something, or if I misunderstand what you're saying, but ...

Even if you are using enormous block sizes, they're actually just enormous *max* block sizes.  If you write a 1 byte file (very slowly, such that no write accumulation can occur), then ZFS only writes that 1 byte, into a single block.  So the enormous block sizes only come into play when you're writing large amounts of data ...  And when you're writing large amounts of data, you're likely to simply span multiple sequential blocks anyway.  So all in all, the block size is rarely very important.  There are some situations where it matters, but ...  all this is a tangent.

The real thing I'm addressing here is that you said scrub/resilver progresses extremely linearly.  This is, unfortunately, about as wrong as it can be.  In actuality, scrub/resilver proceeds in approximately temporal order, which, in the typical situation of a long-running server with frequent creation & destruction of snapshots, results in approximately random order on disk.
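
To illustrate why temporal order hurts on platter drives, here is a minimal Python sketch.  It is a toy model, not ZFS's allocator: it simply assumes that, after enough churn, a block's birth order tells you nothing about its on-disk position.  The names `head_travel`, `NUM_BLOCKS` and `DISK_SLOTS` are made up for the illustration.  It compares total head travel when the same blocks are visited in birth order versus in disk order.

```python
import random

def head_travel(offsets):
    """Total distance the head moves when visiting offsets in the given order."""
    return sum(abs(b - a) for a, b in zip(offsets, offsets[1:]))

# Hypothetical sizes, chosen only for illustration.
NUM_BLOCKS = 10_000      # blocks that need to be resilvered
DISK_SLOTS = 1_000_000   # possible on-disk positions

rng = random.Random(0)   # fixed seed so the run is repeatable

# Assumption: list index = birth (temporal) order, value = on-disk slot,
# and the two are uncorrelated after long-term snapshot churn.
offsets_in_birth_order = rng.sample(range(DISK_SLOTS), NUM_BLOCKS)

temporal = head_travel(offsets_in_birth_order)          # visit in birth order
linear = head_travel(sorted(offsets_in_birth_order))    # visit in disk order

print(f"temporal-order travel is {temporal / linear:.0f}x the disk-order travel")
```

Head travel is only a crude stand-in for seek time, but the gap between the two orderings is the point: in disk order the head sweeps the platter once, while in temporal order it crosses a large fraction of the disk for nearly every block.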

Here's the evidence I observed:  I had a ZFS server running in production for about 2 years, and a disk failed.  I had previously measured that each disk on this server sustains 1 Gbit/sec sequentially.  With 1T disks, linearly resilvering the entire disk, including empty space, should take about 2 hrs.  But ZFS doesn't resilver the whole disk; it only resilvers used space.  This would be great if your pool were mostly empty, or if the used data were laid out linearly on disk.  But it actually took 12 hours to resilver that disk.  I went to zfs-discuss and asked about it, and learned about the temporal ordering.  That explained how resilvering just the used portions could take several times longer than resilvering the whole disk.
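
As a rough check of the arithmetic above, here is a minimal Python sketch.  It assumes "1T" means 10^12 bytes and "1 Gbit/sec" means 10^9 bits/sec; the function and variable names are made up for the illustration.  Since the amount of used space on the failed disk isn't stated, the effective throughput it computes is only an upper bound (it would be reached only if the disk were completely full).

```python
def sequential_resilver_hours(capacity_bytes, throughput_bits_per_sec):
    """Hours needed to read/write a whole disk at a sustained sequential rate."""
    return capacity_bytes * 8 / throughput_bits_per_sec / 3600

DISK_BYTES = 10**12          # assumption: "1T" = 10^12 bytes
SEQ_BITS_PER_SEC = 10**9     # measured sequential rate, 1 Gbit/sec
OBSERVED_HOURS = 12          # how long the real resilver took

# Best case: resilver the whole disk linearly, empty space included.
full_disk_hours = sequential_resilver_hours(DISK_BYTES, SEQ_BITS_PER_SEC)

# How much slower the real resilver was than that best case.
slowdown = OBSERVED_HOURS / full_disk_hours

# Upper bound on effective throughput: even if every byte were in use,
# the disk moved no more than this many bits per second.
max_effective_bits_per_sec = DISK_BYTES * 8 / (OBSERVED_HOURS * 3600)

print(f"full-disk linear resilver: {full_disk_hours:.2f} hours")
print(f"observed slowdown:         {slowdown:.1f}x")
print(f"effective throughput:      <= {max_effective_bits_per_sec / 1e6:.0f} Mbit/sec")
```

So "about 2 hrs" is a little over 2.2 hours, and the 12-hour resilver was more than 5 times slower than rewriting the entire disk sequentially.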
