[OpenIndiana-discuss] ZFS's vdev state transition
Mike La Spina
mike.laspina at laspina.ca
Fri Jul 20 15:21:07 UTC 2012
It may be prudent to consider a thermal hardware fault in the suspect
disk. It is possible that the disk was at a lower temperature at the
time the pool was defined and therefore did not exhibit a data fault.
Can you confirm whether the fault occurs when the disk is hot versus cold?
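As a quick check (a sketch only, assuming the smartmontools package is
installed and the drive answers SCSI log pages; the device path is the
suspect disk from your test):

# smartctl -d scsi -A /dev/rdsk/c2t5d0s0

Comparing the reported drive temperature shortly after power-on against
the temperature after sustained write load should tell you whether the
errors correlate with heat.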
Regards,
Mike
-----Original Message-----
From: Ichiko Sakamoto [mailto:i-sakamoto at pb.jp.nec.com]
Sent: Friday, July 20, 2012 4:29 AM
To: openindiana-discuss at openindiana.org
Subject: [OpenIndiana-discuss] ZFS's vdev state transition
Hi, all
I have a disk that has many bad sectors.
I created a zpool with this disk and expected zpool to tell me
that the disk had many errors.
But zpool told me everything was fine until I scrubbed the zpool.
Is this the designed behavior?
Here's my test result.
1. Create raidz1 zpool.
# zpool create -f pool1 raidz c2t4d0 c2t5d0 c2t6d0
c2t5d0 has many bad sectors.
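(As an aside, a quick way to confirm the drive-level error history outside
of ZFS is the kernel's per-device error statistics; this command is
illustrative and was not part of the original test:)

# iostat -En c2t5d0

The output includes Soft/Hard/Transport error counts and a Media Error
total for the device.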
2. Check status and scrub.
# zpool status
  pool: pool1
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        pool1       ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            c2t4d0  ONLINE       0     0     0
            c2t5d0  ONLINE       0     0     0
            c2t6d0  ONLINE       0     0     0

errors: No known data errors
# zpool scrub pool1
# zpool status pool1
  pool: pool1
 state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Fri Jul 20 15:58:38 2012
config:

        NAME        STATE     READ WRITE CKSUM
        pool1       ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            c2t4d0  ONLINE       0     0     0
            c2t5d0  ONLINE       0     0     0
            c2t6d0  ONLINE       0     0     0

errors: No known data errors
3. OK, write a large file.
# dd if=/dev/urandom of=/pool1/file1 bs=$(( 1024 * 1024 )) count=$(( 1024 * 100 ))
102400+0 records in
102400+0 records out
107374182400 bytes (107 GB) copied, 2070.16 s, 51.9 MB/s
4. Check status again.
# zpool status
  pool: pool1
 state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Fri Jul 20 15:58:38 2012
config:

        NAME        STATE     READ WRITE CKSUM
        pool1       ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            c2t4d0  ONLINE       0     0     0
            c2t5d0  ONLINE       0     0     0
            c2t6d0  ONLINE       0     0     0

errors: No known data errors
No READ or WRITE errors.
But FMA received many SCSI-layer errors like the following:
----
TIME                           CLASS
Jul 20 2012 16:33:19.290564446 ereport.io.scsi.cmd.disk.dev.rqs.merr
nvlist version: 0
        class = ereport.io.scsi.cmd.disk.dev.rqs.merr
        ena = 0x2227e44a13d01401
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = dev
                device-path = /pci@0,0/pci8086,340a@3/pci1000,3080@0/sd@5,0
                devid = id1,sd@n5000c50026001d17
        (end detector)
        devid = id1,sd@n5000c50026001d17
        driver-assessment = fatal
        op-code = 0x2a
        cdb = 0x2a 0x0 0x5 0xc3 0x8c 0x8b 0x0 0x1 0x0 0x0
        pkt-reason = 0x0
        pkt-state = 0x3f
        pkt-stats = 0x0
        stat-code = 0x2
        key = 0x3
        asc = 0xc
        ascq = 0x0
        sense-data = 0xf0 0x0 0x3 0x5 0xc3 0x8c 0xde 0xa 0x0 0x0 0x0 0x0 0xc 0x0 0x1 0x80 0x0 0x0 0x0 0x0
        lba = 0x5c38c8b
        __ttl = 0x1
        __tod = 0x500909bf 0x1151a95e
----
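(The listing above appears to be 'fmdump -eV' output; for reference, these
are the usual commands for inspecting FMA's view of the errors -- the exact
invocation is my assumption, not stated in the original message:)

# fmdump -eV
# fmadm faulty

'fmdump -eV' dumps the raw error reports in full, and 'fmadm faulty' lists
any faults the diagnosis engines have actually declared.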
5. Scrub the zpool to check for errors.
# zpool scrub pool1
# zpool status pool1
  pool: pool1
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: scrub repaired 8.57M in 0h13m with 0 errors on Fri Jul 20 16:52:28 2012
config:

        NAME        STATE     READ WRITE CKSUM
        pool1       DEGRADED     0     0     0
          raidz1-0  DEGRADED     0     0     0
            c2t4d0  ONLINE       0     0     0
            c2t5d0  DEGRADED     0     0    21  too many errors
            c2t6d0  ONLINE       0     0     0

errors: No known data errors
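(For completeness, the recovery actions the status message refers to would
look like the following; c2t7d0 as a replacement disk is hypothetical:)

# zpool clear pool1 c2t5d0
# zpool replace pool1 c2t5d0 c2t7d0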
I expected WRITE errors to be counted and c2t5d0's state to change to
FAULTED when I wrote the file.
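(As an aside, a full scrub is not the only way to surface these errors:
every normal read is verified against its checksum, so simply reading the
file back should bump the CKSUM counters too, assuming the data is no
longer cached in the ARC. This command is illustrative, not from the
original test:)

# dd if=/pool1/file1 of=/dev/null bs=$(( 1024 * 1024 ))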
Thanks,
Ichiko
_______________________________________________
OpenIndiana-discuss mailing list
OpenIndiana-discuss at openindiana.org
http://openindiana.org/mailman/listinfo/openindiana-discuss