[oi-dev] SSD-based pools

Udo Grabowski (IMK) udo.grabowski at kit.edu
Fri Sep 26 05:11:17 UTC 2014


On 26/09/2014 00:46, Andrew M. Hettinger wrote:
> I'm presently running tests on a pool using 3x Samsung 850 SSDs on an LSI-9211-8i
> (IT) controller. I thought I'd try separating the intent log to see if lowering
> the write amplification on the pool drives would help, so I added another
> matching SSD for that, but under load I still seem to get extensive checksum
> errors. Does anyone have any ideas as to what would be causing this?
>
>     pool: test-array
>    state: DEGRADED
> status: One or more devices has experienced an unrecoverable error.  An
>           attempt was made to correct the error.  Applications are unaffected.
> action: Determine if the device needs to be replaced, and clear the errors
>           using 'zpool clear' or replace the device with 'zpool replace'.
>      see: http://illumos.org/msg/ZFS-8000-9P
>     scan: scrub repaired 0 in 0h0m with 0 errors on Wed Sep 24 18:20:56 2014
> config:
>
>           NAME                       STATE     READ WRITE CKSUM
>           test-array                 DEGRADED     0     0     0
>             mirror-0                 DEGRADED     0     0     0
>               c0t50025388700060D4d0  DEGRADED     0     0   155  too many errors
>               c0t50025388700060AEd0  DEGRADED     0     0   149  too many errors
>               c0t50025388700060C2d0  DEGRADED     0     0   174  too many errors
>           logs
>             c0t50025388A067DBE9d0    ONLINE       0     0     0
>
> errors: No known data errors
>     ---- errors ---
>     s/w h/w trn tot device
>       0   2   6   8 c0t50025388700060D4d0
>       0   0   0   0 c0t50025388700060AEd0
>       0   0   0   0 c0t50025388700060C2d0
>       0   0   0   0 c0t50025388A067DBE9d0

Transport errors could point to bad cabling. The 850s are very new,
so I wouldn't rule out firmware problems either. But you may
also be seeing another instance of a mysterious possible
bug when scrubbing mirrors. I have an SSD rpool myself
(Supertalent SATA II) and have nearly always gotten these errors
(though on the order of 10, not 150) when scrubbing this pool,
since day 1 and regardless of firmware. I never get them
with ordinary disks, so maybe the speed is a factor
in triggering the problem. I tried to hunt it down, but
that's really difficult if you're not a kernel developer...
I suspect that an overlapping interrupt
with the USB system may play a role here, but that's
just speculation. I saw a similar post about rpool mirrors
a couple of years ago, and that also led to no conclusion.
Maybe the number of errors you see gives a better opportunity
to track down the source of the problem with some clever dtrace
scripts, but for that I would recommend switching over
to the illumos-developer list, where those experts are
lurking.
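
If you want to compare error counts across repeated scrubs before
taking this to the other list, a small script can pull the per-device
counters out of the 'zpool status' text. The following is only a
minimal sketch: the function names and the regular expression are my
own assumptions, not part of any ZFS tool, and it only understands
plain integer counters (not abbreviated ones such as "1.2K"). The
sample rows are copied from the output quoted above.

```python
import re

# Matches config rows of the form:
#   <name>  <STATE>  <read>  <write>  <cksum>  [optional note]
# Assumption: counters are plain integers; abbreviated values are skipped.
ROW = re.compile(
    r"^\s*(?P<name>\S+)\s+(?P<state>[A-Z]+)\s+"
    r"(?P<read>\d+)\s+(?P<write>\d+)\s+(?P<cksum>\d+)"
    r"(?:\s+(?P<note>.*))?$"
)

SAMPLE = """\
          NAME                       STATE     READ WRITE CKSUM
          test-array                 DEGRADED     0     0     0
            mirror-0                 DEGRADED     0     0     0
              c0t50025388700060D4d0  DEGRADED     0     0   155  too many errors
              c0t50025388700060AEd0  DEGRADED     0     0   149  too many errors
              c0t50025388700060C2d0  DEGRADED     0     0   174  too many errors
          logs
            c0t50025388A067DBE9d0    ONLINE       0     0     0
"""


def parse_zpool_status(text):
    """Return one dict per config row that carries READ/WRITE/CKSUM counters."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.rstrip())
        if m is None:
            continue  # header, 'logs' marker, status text, blank lines
        rows.append({
            "name": m.group("name"),
            "state": m.group("state"),
            "read": int(m.group("read")),
            "write": int(m.group("write")),
            "cksum": int(m.group("cksum")),
            "note": m.group("note") or "",
        })
    return rows


def devices_with_cksum_errors(rows):
    """Keep only the rows whose CKSUM counter is non-zero."""
    return [r for r in rows if r["cksum"] > 0]


if __name__ == "__main__":
    for dev in devices_with_cksum_errors(parse_zpool_status(SAMPLE)):
        print(dev["name"], dev["state"], dev["cksum"])
```

Feeding it the saved output of each scrub would show whether the
counts stay spread evenly over all three mirror members, as they are
here, or concentrate on the one disk that also reports transport
errors.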

