[oi-dev] SSD-based pools
Udo Grabowski (IMK)
udo.grabowski at kit.edu
Fri Sep 26 05:11:17 UTC 2014
On 26/09/2014 00:46, Andrew M. Hettinger wrote:
> I'm presently running tests on a pool using 3x Samsung 850 SSDs on an LSI-9211-8i
> (IT) controller. I thought I'd try separating the intent log to see if lowering
> the write amplification on the pool drives would help, so I added another
> matching SSD for that, but under load I still seem to get extensive checksum
> errors. Does anyone have any ideas as to what would be causing this?
>
> pool: test-array
> state: DEGRADED
> status: One or more devices has experienced an unrecoverable error. An
> attempt was made to correct the error. Applications are unaffected.
> action: Determine if the device needs to be replaced, and clear the errors
> using 'zpool clear' or replace the device with 'zpool replace'.
> see: http://illumos.org/msg/ZFS-8000-9P
> scan: scrub repaired 0 in 0h0m with 0 errors on Wed Sep 24 18:20:56 2014
> config:
>
>         NAME                       STATE     READ WRITE CKSUM
>         test-array                 DEGRADED     0     0     0
>           mirror-0                 DEGRADED     0     0     0
>             c0t50025388700060D4d0  DEGRADED     0     0   155  too many errors
>             c0t50025388700060AEd0  DEGRADED     0     0   149  too many errors
>             c0t50025388700060C2d0  DEGRADED     0     0   174  too many errors
>         logs
>           c0t50025388A067DBE9d0    ONLINE       0     0     0
>
> errors: No known data errors
> ---- errors ---
>  s/w  h/w  trn  tot  device
>    0    2    6    8  c0t50025388700060D4d0
>    0    0    0    0  c0t50025388700060AEd0
>    0    0    0    0  c0t50025388700060C2d0
>    0    0    0    0  c0t50025388A067DBE9d0
Transport errors could point to bad cabling. The 850s are very
new, so I also wouldn't rule out firmware problems. But it could
also be that you are seeing another instance of a mysterious possible
bug when scrubbing mirrors. I myself have an SSD rpool
(Supertalent SATA II), and have nearly always gotten these errors
(though in the range of 10, not 150) when scrubbing this pool,
since day 1, regardless of firmware. I never get them
with ordinary disks, so maybe speed is a factor
in triggering this problem. I tried to hunt that down, but
it's really difficult if you're not a kernel developer....
I suspect that an overlapping interrupt
with the USB system may play a role here, but that's
just speculation. I saw a similar post about rpool mirrors
a couple of years ago, and it also led to no conclusion.
Maybe the number of errors you see offers a better opportunity
to hunt down the source of the problem with some clever dtrace
scripts, but for that I would recommend switching over
to the illumos-developer list, where those experts are
lurking.
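
Before going to dtrace, it may be worth lining up the two tables you
posted per device. The following is only a rough Python sketch: the
sample text is copied from your output above, while the function names
(parse_cksum, parse_transport, cksum_without_transport) are made up for
illustration and are not part of any ZFS tooling. It lists the devices
that report checksum errors but no transport errors.

```python
import re

# Leaf-device lines copied from the quoted 'zpool status' output.
ZPOOL_CONFIG = """\
test-array DEGRADED 0 0 0
mirror-0 DEGRADED 0 0 0
c0t50025388700060D4d0 DEGRADED 0 0 155 too many errors
c0t50025388700060AEd0 DEGRADED 0 0 149 too many errors
c0t50025388700060C2d0 DEGRADED 0 0 174 too many errors
c0t50025388A067DBE9d0 ONLINE 0 0 0
"""

# Lines copied from the quoted error table (s/w h/w trn tot device).
DEVICE_ERRORS = """\
0 2 6 8 c0t50025388700060D4d0
0 0 0 0 c0t50025388700060AEd0
0 0 0 0 c0t50025388700060C2d0
0 0 0 0 c0t50025388A067DBE9d0
"""

# Assumption: leaf devices are named like c<N>t<...>, as in the post.
LEAF = re.compile(r"\s*(c\d+t\S+)\s+\S+\s+(\d+)\s+(\d+)\s+(\d+)")


def parse_cksum(text):
    """Map each leaf device to its CKSUM count; skips pool and vdev rows."""
    counts = {}
    for line in text.splitlines():
        m = LEAF.match(line)
        if m:
            counts[m.group(1)] = int(m.group(4))
    return counts


def parse_transport(text):
    """Map each device to its transport ('trn') error count."""
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 5 and all(f.isdigit() for f in fields[:4]):
            counts[fields[4]] = int(fields[2])
    return counts


def cksum_without_transport(cksum, trn):
    """Devices with checksum errors but zero transport errors, sorted."""
    return sorted(d for d, n in cksum.items() if n > 0 and trn.get(d, 0) == 0)


if __name__ == "__main__":
    for dev in cksum_without_transport(parse_cksum(ZPOOL_CONFIG),
                                       parse_transport(DEVICE_ERRORS)):
        print(dev)
```

With the numbers you posted, only one of the three mirror devices shows
transport errors, while all three show checksum errors. That does not
prove anything, but it suggests cabling alone may not explain what you
are seeing.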