[OpenIndiana-discuss] Huge ZFS root pool slowdown - diagnose root cause?

jason matthews jason at broken.net
Tue Dec 11 17:27:50 UTC 2018


This is your offending device:

$ pfexec smartctl -a -d sat,12 /dev/rdsk/c2t0d0s0 | grep Raw_Read
   1 Raw_Read_Error_Rate     0x000b   094   094   016    Pre-fail  Always       -       1376259

Try removing this disk.
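
If you want to take it out of the pool before physically pulling it, something
like this should do it (a sketch, assuming the root pool is named rpool and is
a two-way mirror of c2t0d0s0 and c2t1d0s0, as your iostat output suggests):

$ pfexec zpool offline rpool c2t0d0s0    # stop issuing I/O to the sick disk
$ pfexec zpool status rpool              # pool should show DEGRADED, c2t1d0s0 ONLINE
$ pfexec zpool detach rpool c2t0d0s0     # optional: drop it from the mirror entirely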

The boot manager is in your BIOS. It currently points to one of your 
rpool disks. Go into the boot manager, pick the other disk, and see 
how it boots then. You can either set this up as a one-time boot or 
change the setting so it is persistent.
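
One assumption worth checking first: the other rpool disk will only boot if it
has its own boot blocks. On a GRUB-based install that would be roughly

$ pfexec installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c2t1d0s0

while newer loader-based installs use

$ pfexec bootadm install-bootloader -P rpool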

Life should be better with the sick disk removed.
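
Once a replacement disk is in the same slot, resilvering it back in is roughly
this (a sketch; use zpool attach against the good disk instead if you detached
the bad one first):

$ pfexec zpool replace rpool c2t0d0s0
$ pfexec zpool status rpool              # watch resilver progress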


j.

On 12/11/18 8:16 AM, Lou Picciano wrote:
>
> I have now (finally) managed to get perhaps the key bit of reporting from smartctl - does this seem adequately diagnostic?:
> (I am fully satisfied to replace the drive; I just want to be sure I’ve run to ground any potential root causes.)
>
> $ pfexec smartctl -a -d sat,12 /dev/rdsk/c2t0d0s0 | grep Raw_Read
>    1 Raw_Read_Error_Rate     0x000b   094   094   016    Pre-fail  Always       -       1376259
> $ pfexec smartctl -a -d sat,12 /dev/rdsk/c2t1d0s0 | grep Raw_Read
>    1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
>
> Above seems consistent with all the read errors I see at boot.
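>
> For a clearer "replace me" signal it may also be worth grepping the same smartctl output for reallocated and pending sectors (a suggestion on my part; attribute names vary slightly by vendor):
>
> $ pfexec smartctl -a -d sat,12 /dev/rdsk/c2t0d0s0 | grep -E 'Reallocated_Sector|Current_Pending|Offline_Uncorrectable'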
>
> What happens if you go into the boot manager and manually select a boot disk? If the problem is with a single drive, then the other drive should boot normally, right? Try booting from both drives, selecting each one manually.
> That’s also interesting. With the hundreds of read errors at boot-up, the boot manager is never even (visibly) presented. I guess I could try this again after booting from a USB image...
>> you can speed up the scrub with:
>>
>> echo zfs_scrub_delay/W0x0 |mdb -kw
>>
>> echo zfs_scan_min_time_ms/W0x0 | mdb -kw
> Good commands for reference. I was unaware of these! But, even with the scrub canceled for the moment, I am still seeing virtually continuous drive controller traffic.
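>
> For reference, the current values can be read back with the same mdb syntax (so they can be restored later), and an in-progress scrub can be stopped outright; a sketch of the stock commands:
>
> echo zfs_scrub_delay/D | mdb -k          # print the current value in decimal
> echo zfs_scan_min_time_ms/D | mdb -k
> pfexec zpool scrub -s rpool              # cancel a running scrub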
>
> You also wanted to see:
> $ iostat -nMxC 5
>                      extended device statistics
>      r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
>      0.0  962.3    0.0   11.3 15.7  0.2   16.3    0.2   5  23 c2
>      0.0  398.4    0.0    4.3  7.1  0.1   17.9    0.2  83   6 c2t0d0
>      0.0  415.2    0.0    4.2  8.6  0.1   20.6    0.2  87   9 c2t1d0
>      0.0   40.2    0.0    0.7  0.0  0.0    0.0    0.4   0   2 c2t2d0
>      0.0   40.4    0.0    0.7  0.0  0.0    0.0    1.1   0   4 c2t3d0
>      0.0   34.4    0.0    0.7  0.0  0.0    0.0    0.3   0   1 c2t4d0
>      0.0   33.6    0.0    0.7  0.0  0.0    0.0    0.3   0   1 c2t5d0
>
> Again, I assume the symmetry in findings between t0 and t1 is due to their mirrored status… But it doesn’t help in differentiating the offending device. (For comparison, t2-t5 are the data pool.) There is essentially zero ‘user’ activity on either the data or root pools...
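>
> Since a mirror will always show near-symmetric throughput, the per-device error counters are probably more telling than iostat’s service times (a suggestion; these are the stock illumos tools):
>
> $ iostat -En c2t0d0 c2t1d0       # soft/hard/transport error totals per device
> $ pfexec fmadm faulty            # faults FMA has already diagnosed, if any
> $ pfexec fmdump -e | tail -20    # recent error reports logged by fmd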


