[OpenIndiana-discuss] Huge ZFS root pool slowdown - diagnose root cause?
jason matthews
jason at broken.net
Tue Dec 11 17:27:50 UTC 2018
This is your offending device:
$ pfexec smartctl -a -d sat,12 /dev/rdsk/c2t0d0s0 | grep Raw_Read
1 Raw_Read_Error_Rate 0x000b 094 094 016 Pre-fail Always - 1376259
Try removing this disk.
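If rpool is a two-way mirror, you can drop the sick half while the box is
up. Roughly like this - I'm guessing at the vdev name, use whatever
zpool status prints (probably c2t0d0s0):

$ pfexec zpool status rpool
$ pfexec zpool offline rpool c2t0d0s0    # stop I/O to it but keep it in the config
$ pfexec zpool detach rpool c2t0d0s0     # or remove it from the mirror entirely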
The boot manager is in your BIOS. It currently points to one of your
rpool disks. Go into the boot manager, pick the other disk, and see
how it boots then. You can either set this up as a one-time boot or
change the setting so it is persistent.
Life should be better with the sick disk removed.
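When you have a replacement in hand, something along these lines should
get the mirror healthy again (device names here are placeholders -
substitute whatever the new disk shows up as):

$ pfexec zpool replace rpool c2t0d0s0 c2tXd0s0
$ pfexec zpool status rpool    # wait for the resilver to finish

Don't forget to put boot blocks on the new disk afterwards (bootadm
install-bootloader on a loader-based OI install) so you can boot from
either half of the mirror.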
j.
On 12/11/18 8:16 AM, Lou Picciano wrote:
>
> I have (now, finally) managed to get perhaps the key bit of reporting from smartctl - does this seem adequately diagnostic?
> (I am fully satisfied to replace the drive; I just want to be sure I’ve run to ground any potential root causes.)
>
> $ pfexec smartctl -a -d sat,12 /dev/rdsk/c2t0d0s0 | grep Raw_Read
> 1 Raw_Read_Error_Rate 0x000b 094 094 016 Pre-fail Always - 1376259
> $ pfexec smartctl -a -d sat,12 /dev/rdsk/c2t1d0s0 | grep Raw_Read
> 1 Raw_Read_Error_Rate 0x000b 100 100 016 Pre-fail Always - 0
>
> Above seems consistent with all the read errors I see at boot.
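That Raw_Read_Error_Rate delta is pretty damning on its own. If you want
a second opinion from the same tool, the overall health verdict and the
reallocated/pending sector attributes are worth a quick look (same device
argument as above):

$ pfexec smartctl -H -d sat,12 /dev/rdsk/c2t0d0s0
$ pfexec smartctl -a -d sat,12 /dev/rdsk/c2t0d0s0 | egrep -i 'Reallocat|Pending'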
>
>> What happens if you go into the boot manager and manually select a boot disk? If the problem is with a single drive, then the other drive should boot normally, right? Try booting from both drives, selecting each one manually.
> That’s also interesting. With the hundreds of read errors at boot-up, the boot manager is never even (visibly) presented. I guess I could try this again booting from a USB image...
>> you can speed up the scrub with:
>>
>> echo zfs_scrub_delay/W0x0 |mdb -kw
>>
>> echo zfs_scan_min_time_ms/W0x0 |mdb -kw
> Good commands for reference - I was unaware of these! But even with the scrub canceled for the moment, I am still seeing virtually continuous drive controller traffic.
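If you do end up tuning those, it's worth recording the current values
first so you can put them back once the scrub is done, e.g.:

$ echo zfs_scrub_delay/D | pfexec mdb -k
$ echo zfs_scan_min_time_ms/D | pfexec mdb -k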
>
> You also wanted to see:
> $ iostat -nMxC 5
> extended device statistics
> r/s w/s Mr/s Mw/s wait actv wsvc_t asvc_t %w %b device
> 0.0 962.3 0.0 11.3 15.7 0.2 16.3 0.2 5 23 c2
> 0.0 398.4 0.0 4.3 7.1 0.1 17.9 0.2 83 6 c2t0d0
> 0.0 415.2 0.0 4.2 8.6 0.1 20.6 0.2 87 9 c2t1d0
> 0.0 40.2 0.0 0.7 0.0 0.0 0.0 0.4 0 2 c2t2d0
> 0.0 40.4 0.0 0.7 0.0 0.0 0.0 1.1 0 4 c2t3d0
> 0.0 34.4 0.0 0.7 0.0 0.0 0.0 0.3 0 1 c2t4d0
> 0.0 33.6 0.0 0.7 0.0 0.0 0.0 0.3 0 1 c2t5d0
>
> Again, I assume the symmetry in findings between t0 and t1 is due to their mirrored status… But it doesn’t seem to help in differentiating the offending device. (For comparison, t2-t5 are the data pool.) There is essentially zero ‘user’ activity on either the data or root pool...
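Right, iostat throughput won't point a finger, since ZFS sends the same
writes to both sides of the mirror. The per-device error counters are
more telling - something like:

$ iostat -En c2t0d0 c2t1d0

or iostat -nMxe 5 (which adds s/w, h/w and trn error columns) should show
the soft/hard/transport errors piling up on the bad disk only.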