[OpenIndiana-discuss] ZFS's vdev state transition

Ichiko Sakamoto i-sakamoto at pb.jp.nec.com
Mon Jul 23 06:05:07 UTC 2012


(2012/07/21 4:29), Richard Elling wrote:
> On Jul 20, 2012, at 12:01 PM, Bob Friesenhahn wrote:
> 
>> On Fri, 20 Jul 2012, Ichiko Sakamoto wrote:
>> 
>>> Hi, all
>>> 
>>> I have a disk that has many bad sectors.
>>> I created a zpool with this disk and expected
>>> zpool to tell me that the disk has many errors.
>>> But zpool told me everything was fine until I scrubbed the pool.
>>> 
>>> Is this a designed feature?
>> 
>> ZFS detects hardware-reported write failures, but it cannot (and does not) detect read failures until it tries to read the data.  I have learned that ZFS does periodically "taste" the data in a few locations as part of normal operation (to detect disk errors), but it reads from the disk as seldom as possible, since doing so would hinder performance.
> 
> Write errors are also detected. In the fmdump output we see a fatal write due to
> media error. ZFS can and does work around this by re-allocating the write, but
> it should be ticked in the write errors column.
> 


In older versions such as OpenSolaris 2009.06,
when a leaf disk vdev's ZIO fails with EIO, a WRITE error
is counted and the error is reported to FMA from within that ZIO's zio_done().

In the latest version, the error seems to be ignored.

zio_done()  [zio->io_vd: the disk's vdev, zio->io_error: EIO]
 +- vdev_stat_update()
 |   | (the error is ignored if the ZIO lacks ZIO_FLAG_IO_RETRY)
 |   | http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/fs/zfs/vdev.c?r=13700%3A2889e2596bd6#2598
 +- zfs_ereport_post()
     +- zfs_ereport_start()
         | (the error is ignored if the ZIO lacks ZIO_FLAG_IO_RETRY)
         | http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/fs/zfs/zfs_fm.c?r=13574%3Ad0fde6cacaac#148
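The gating in the call tree above can be modelled in a few lines of C. This is a simplified user-space sketch of my reading of the code, not the illumos source: `model_vdev_t`, `MODEL_FLAG_IO_RETRY`, `model_zio_done_old()` and `model_zio_done_new()` are hypothetical stand-ins, the flag bit value is arbitrary, and the "old" variant simply encodes the OpenSolaris 2009.06 behaviour described above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for ZIO_FLAG_IO_RETRY; the real bit value is not assumed. */
#define MODEL_FLAG_IO_RETRY 0x1u

/* Hypothetical, simplified leaf vdev: one write-error counter plus a count
 * of the FMA ereports the model would have posted. */
typedef struct {
    uint64_t write_errors;
    unsigned ereports;
} model_vdev_t;

/* Older behaviour as described for OpenSolaris 2009.06: an EIO on a leaf
 * vdev ZIO is counted and reported from within zio_done(). */
static void model_zio_done_old(model_vdev_t *vd, int error, unsigned flags)
{
    assert(vd != NULL);
    (void)flags;               /* the flag word plays no part here */
    if (error == EIO) {
        vd->write_errors++;    /* plays the role of vdev_stat_update() */
        vd->ereports++;        /* plays the role of zfs_ereport_post() */
    }
}

/* Newer behaviour as read from the linked source: both the statistics
 * path and the ereport path return early unless the retry flag is set. */
static void model_zio_done_new(model_vdev_t *vd, int error, unsigned flags)
{
    assert(vd != NULL);
    if (error == 0 || (flags & MODEL_FLAG_IO_RETRY) == 0)
        return;                /* the error is silently dropped */
    vd->write_errors++;
    vd->ereports++;
}
```

Calling both models with `EIO` and `flags = 0` (a first attempt that was never retried) leaves the old model with one counted error and one ereport and the new model with none, consistent with the write=0 counters for c2t5d0s0 in the trace. Only a call that passes `MODEL_FLAG_IO_RETRY` moves the new model's counters.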

In my test case, only one column of the raidz returned an error, and the parent raidz ZIO succeeded.
As a result, the ZIO was never re-issued with the ZIO_FLAG_IO_RETRY flag set.
http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/fs/zfs/zio.c?r=13700%3A2889e2596bd6#2458
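The raidz case can be sketched the same way. This is a hypothetical model, not the logic of vdev_raidz.c or zio.c: it assumes a parent raidz write succeeds as long as no more than `nparity` columns fail, and that columns are re-issued with the retry flag (and therefore counted) only when the parent write fails. `model_raidz_write()` and `model_raidz_result_t` are invented for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical outcome of one modelled raidz write. */
typedef struct {
    int parent_error;    /* 0 if the parent raidz ZIO ends up succeeding */
    int retried;         /* 1 if the columns were re-issued with the retry flag */
    unsigned counted;    /* column errors that reached the error counters */
} model_raidz_result_t;

/*
 * first[i] is the error column i returns on the first attempt, and
 * again[i] is what it returns if re-issued. Assumption: the parent write
 * succeeds while at most nparity columns fail, and errors are counted
 * only on the re-issued (retry-flagged) pass.
 */
static model_raidz_result_t model_raidz_write(const int *first,
                                              const int *again,
                                              size_t ncols, size_t nparity)
{
    model_raidz_result_t r = {0, 0, 0};
    size_t failed = 0;

    assert(first != NULL && again != NULL && nparity < ncols);

    /* First pass: no retry flag yet, so failures are not counted. */
    for (size_t i = 0; i < ncols; i++)
        if (first[i] != 0)
            failed++;

    if (failed <= nparity)
        return r;        /* parity covers the loss: nothing is re-issued */

    /* Parent failed: re-issue every column with the retry flag set. */
    r.retried = 1;
    failed = 0;
    for (size_t i = 0; i < ncols; i++) {
        if (again[i] != 0) {
            r.counted++;
            failed++;
        }
    }
    r.parent_error = (failed <= nparity) ? 0 : EIO;
    return r;
}
```

For a four-column raidz1 in which only one column fails (`first = {0, EIO, 0, 0}`, `nparity = 1`), the model returns `retried == 0` and `counted == 0`: parity absorbs the bad column, so it never shows up as a write error, which is the situation described above.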

I apologize if I have misunderstood the code.



Here is the DTrace debug script and its output.

test.d
-----
#!/usr/sbin/dtrace -Cqs

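/* Print the interesting zio_t fields and the error counters of its vdev.
 * io_vd is NULL for logical (non-vdev) ZIOs, so every vdev dereference is guarded. */
#define printzio(zio) \
        printf(" ZIO = %p\n", zio); \
        printf("  io_error = 0x%x\n", zio->io_error); \
        printf("  io_flags = 0x%x\n", zio->io_flags); \
        printf("  io_type = %d\n", zio->io_type); \
        printf("  io_offset = 0x%x\n", zio->io_offset); \
        printf("  io_vd = %p %s\n", zio->io_vd, \
                zio->io_vd ? (string)zio->io_vd->vdev_path : ""); \
        this->vs = zio->io_vd ? \
                (vdev_stat_t *)&(zio->io_vd->vdev_stat) : (vdev_stat_t *)0; \
        printf("  errors read=%d write=%d csum=%d\n", \
                this->vs ? this->vs->vs_read_errors : (uint64_t)0, \
                this->vs ? this->vs->vs_write_errors : (uint64_t)0, \
                this->vs ? this->vs->vs_checksum_errors : (uint64_t)0)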

BEGIN
{
        printf("%Y START\n", walltimestamp);
}

/* Catch the first zio_done() call for a ZIO carrying an error; exit once it returns. */
fbt:zfs:zio_done:entry
/((zio_t *)arg0)->io_error/
{
        self->zio1 = (zio_t *)arg0;
        printf("\n%Y %s:%s\n", walltimestamp, probefunc, probename);
        printzio(self->zio1);
        printf("  STACK");
        stack();
}

fbt:zfs:zio_done:return
/self->zio1/
{
        printf("%Y %s:%s\n", walltimestamp, probefunc, probename);
        printzio(self->zio1);
        self->zio1 = 0;
        exit(0);
}

/* Show every failed ZIO on entry to and return from zio_vdev_io_assess(). */
fbt:zfs:zio_vdev_io_assess:entry
/((zio_t *)arg0)->io_error/
{
        self->zio2 = (zio_t *)arg0;
        printf("\n%Y %s:%s\n", walltimestamp, probefunc, probename);
        printzio(self->zio2);
        printf("  STACK");
        stack();
}

fbt:zfs:zio_vdev_io_assess:return
/self->zio2/
{
        printf("%Y %s:%s\n", walltimestamp, probefunc, probename);
        printzio(self->zio2);
        self->zio2 = 0;
}

END
{
        printf("%Y STOP\n", walltimestamp);
}
-----

Output captured while writing a large file:
-----
# ./test.d
2012 Jul 23 14:21:20 START

2012 Jul 23 15:01:17 zio_vdev_io_assess:entry
 ZIO = ffffff19cb2634c0
  io_error = 0x5
  io_flags = 0x60440
  io_type = 2
  io_offset = 0xa898fb600
  io_vd = ffffff19bee0d080 /dev/dsk/c2t5d0s0
  errors read=0 write=0 csum=0
  STACK
              zfs`zio_execute+0x8d
              genunix`taskq_thread+0x285
              unix`thread_start+0x8
2012 Jul 23 15:01:17 zio_vdev_io_assess:return
 ZIO = ffffff19cb2634c0
  io_error = 0x5
  io_flags = 0x60440
  io_type = 2
  io_offset = 0xa898fb600
  io_vd = ffffff19bee0d080 /dev/dsk/c2t5d0s0
  errors read=0 write=0 csum=0

2012 Jul 23 15:01:17 zio_vdev_io_assess:entry
 ZIO = ffffff19d795aea0
  io_error = 0x5
  io_flags = 0x60440
  io_type = 2
  io_offset = 0xb0004b7600
  io_vd = ffffff19bee0d080 /dev/dsk/c2t5d0s0
  errors read=0 write=0 csum=0
  STACK
              zfs`zio_execute+0x8d
              genunix`taskq_thread+0x285
              unix`thread_start+0x8
2012 Jul 23 15:01:17 zio_vdev_io_assess:return
 ZIO = ffffff19d795aea0
  io_error = 0x5
  io_flags = 0x60440
  io_type = 2
  io_offset = 0xb0004b7600
  io_vd = ffffff19bee0d080 /dev/dsk/c2t5d0s0
  errors read=0 write=0 csum=0

2012 Jul 23 15:01:18 zio_vdev_io_assess:entry
 ZIO = ffffff19d795aea0
  io_error = 0x5
  io_flags = 0x60440
  io_type = 2
  io_offset = 0xb0004b7600
  io_vd = ffffff19bee0d080 /dev/dsk/c2t5d0s0
  errors read=0 write=0 csum=0
  STACK
              zfs`zio_execute+0x8d
              zfs`zio_notify_parent+0xa6
              zfs`zio_done+0x3d6
              zfs`zio_execute+0x8d
              zfs`zio_notify_parent+0xa6
              zfs`zio_done+0x3d6
              zfs`zio_execute+0x8d
              genunix`taskq_thread+0x285
              unix`thread_start+0x8
2012 Jul 23 15:01:18 zio_vdev_io_assess:return
 ZIO = ffffff19d795aea0
  io_error = 0x5
  io_flags = 0x60440
  io_type = 2
  io_offset = 0xb0004b7600
  io_vd = ffffff19bee0d080 /dev/dsk/c2t5d0s0
  errors read=0 write=0 csum=0

2012 Jul 23 15:01:18 zio_done:entry
 ZIO = ffffff19d795aea0
  io_error = 0x5
  io_flags = 0x60440
  io_type = 2
  io_offset = 0xb0004b7600
  io_vd = ffffff19bee0d080 /dev/dsk/c2t5d0s0
  errors read=0 write=0 csum=0
  STACK
              zfs`zio_execute+0x8d
              zfs`zio_notify_parent+0xa6
              zfs`zio_done+0x3d6
              zfs`zio_execute+0x8d
              zfs`zio_notify_parent+0xa6
              zfs`zio_done+0x3d6
              zfs`zio_execute+0x8d
              genunix`taskq_thread+0x285
              unix`thread_start+0x8
2012 Jul 23 15:01:18 zio_done:return
 ZIO = ffffff19d795aea0
  io_error = 0x5
  io_flags = 0x60440
  io_type = 2
  io_offset = 0xb0004b7600
  io_vd = ffffff19bee0d080 /dev/dsk/c2t5d0s0
  errors read=0 write=0 csum=0

2012 Jul 23 15:01:17 zio_vdev_io_assess:entry
 ZIO = ffffff19ca46f130
  io_error = 0x5
  io_flags = 0x60440
  io_type = 2
  io_offset = 0xa898fae00
  io_vd = ffffff19bee0d080 /dev/dsk/c2t5d0s0
  errors read=0 write=0 csum=0
  STACK
              zfs`zio_execute+0x8d
              genunix`taskq_thread+0x285
              unix`thread_start+0x8
2012 Jul 23 15:01:17 zio_vdev_io_assess:return
 ZIO = ffffff19ca46f130
  io_error = 0x5
  io_flags = 0x60440
  io_type = 2
  io_offset = 0xa898fae00
  io_vd = ffffff19bee0d080 /dev/dsk/c2t5d0s0
  errors read=0 write=0 csum=0

2012 Jul 23 15:01:17 zio_vdev_io_assess:entry
 ZIO = ffffff19c7966c68
  io_error = 0x5
  io_flags = 0x60440
  io_type = 2
  io_offset = 0x60009cae00
  io_vd = ffffff19bee0d080 /dev/dsk/c2t5d0s0
  errors read=0 write=0 csum=0
  STACK
              zfs`zio_execute+0x8d
              genunix`taskq_thread+0x285
              unix`thread_start+0x8
2012 Jul 23 15:01:17 zio_vdev_io_assess:return
 ZIO = ffffff19c7966c68
  io_error = 0x5
  io_flags = 0x60440
  io_type = 2
  io_offset = 0x60009cae00
  io_vd = ffffff19bee0d080 /dev/dsk/c2t5d0s0
  errors read=0 write=0 csum=0
2012 Jul 23 15:01:18 STOP
-----

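The ZIO_FLAG_IO_RETRY flag was still not set when zio_vdev_io_assess() returned
(io_flags stayed 0x60440 on entry and on return), and the error was not counted
after zio_done() returned (errors read=0 write=0 csum=0).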


Thanks,
Ichiko



>> If the disk continually reports that all writes are fine then zfs might not discover wrong data for a long time, or until 'scrub'.
> 
> Correct. Again, this case is interesting because the reads are counted as 
> checksum errors, but not read errors. But until we know what version of the
> OS is being used, we can't debug any further.
>  -- richard


