[OpenIndiana-discuss] ZFS's vdev state transition
Ichiko Sakamoto
i-sakamoto at pb.jp.nec.com
Mon Jul 23 06:05:07 UTC 2012
(2012/07/21 4:29), Richard Elling wrote:
> On Jul 20, 2012, at 12:01 PM, Bob Friesenhahn wrote:
>
>> On Fri, 20 Jul 2012, Ichiko Sakamoto wrote:
>>
>>> Hi, all
>>>
>>> I have a disk that has many bad sectors.
>>> I created zpool with this disk and expected that
>>> zpool told me the disk has many errors.
>>> But zpool told me everything was fine until I scrubbed the zpool.
>>>
>>> Is this designed feature?
>>
>> Zfs detects hardware-reported write failures, but can/does not detect read failures until it tries to read the data. I have learned that zfs does periodically "taste" the data in a few locations as part of normal operation (to detect disk errors) but it tries to read from the disk as seldom as possible since doing so would hinder performance.
>
> Write errors are also detected. In the fmdump output we see a fatal write due to
> media error. ZFS can and does work around this by re-allocating the write, but
> it should be ticked in the write errors column.
>
In older versions such as OpenSolaris 2009.06,
when a ZIO on a leaf disk vdev results in EIO, a WRITE error
is counted and the error is reported to FMA within that ZIO's zio_done().
In the latest version, the error seems to be ignored.
zio_done()    zio->io_vd: disk's vdev, zio->io_error: EIO
 +- vdev_stat_update()
 |    (a ZIO without the ZIO_FLAG_IO_RETRY flag ignores the error)
 |    http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/fs/zfs/vdev.c?r=13700%3A2889e2596bd6#2598
 +- zfs_ereport_post()
    +- zfs_ereport_start()
         (a ZIO without the ZIO_FLAG_IO_RETRY flag ignores the error)
         http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/fs/zfs/zfs_fm.c?r=13574%3Ad0fde6cacaac#148
In my test case, only one column in the raidz had an error, and the parent raidz ZIO succeeded.
So a ZIO with the ZIO_FLAG_IO_RETRY flag was not re-issued.
http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/fs/zfs/zio.c?r=13700%3A2889e2596bd6#2458
I'm sorry if I have misunderstood the code.
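The path described above can be modelled roughly in C. This is only a sketch under stated assumptions, not the illumos code: the names model_zio, model_error_is_counted, model_raidz_write_failed and model_retry_issued are made up for illustration, the value 1 << 13 for ZIO_FLAG_IO_RETRY is assumed from the zio.h of about that revision, and the raidz rule (a write fails only when more columns fail than there are parity columns) is a simplified reading.

```c
#include <errno.h>

/* Assumed bit value, modelled on the illumos zio.h of that era. */
#define MODEL_ZIO_FLAG_IO_RETRY (1u << 13)

/* Hypothetical stand-in for the few zio_t fields used here. */
struct model_zio {
    int io_error;       /* errno value, e.g. EIO */
    unsigned io_flags;  /* ZIO_FLAG_* bits */
};

/*
 * Models the check described for vdev_stat_update() and
 * zfs_ereport_start(): an EIO on a ZIO that is not itself a retry
 * is neither counted nor reported. Returns 1 if the error is counted.
 */
static int model_error_is_counted(const struct model_zio *zio)
{
    if (zio->io_error == 0)
        return 0;
    if (zio->io_error == EIO && !(zio->io_flags & MODEL_ZIO_FLAG_IO_RETRY))
        return 0;
    return 1;
}

/*
 * Simplified raidz rule (an assumption): the parent write fails only
 * if more columns failed than the vdev has parity columns.
 */
static int model_raidz_write_failed(int failed_columns, int nparity)
{
    return failed_columns > nparity;
}

/*
 * In this model a retry carrying ZIO_FLAG_IO_RETRY is issued only when
 * the parent write itself failed, so a successful parent means no retry.
 */
static int model_retry_issued(int failed_columns, int nparity)
{
    return model_raidz_write_failed(failed_columns, nparity);
}
```

With io_error = EIO and io_flags = 0x60440, and assuming single parity with one failed column, model_error_is_counted() returns 0 and model_retry_issued() returns 0, which is consistent with the error counters staying at zero.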
Here are the debug D script and its result.
test.d
-----
#!/usr/sbin/dtrace -Cqs

/*
 * Print the interesting fields of a zio_t together with the error
 * counters of the vdev it was issued to.
 */
#define printzio(zio) \
    printf(" ZIO = %p\n", zio); \
    printf(" io_error = 0x%x\n", zio->io_error); \
    printf(" io_flags = 0x%x\n", zio->io_flags); \
    printf(" io_type = %d\n", zio->io_type); \
    printf(" io_offset = 0x%x\n", zio->io_offset); \
    printf(" io_vd = %p %s\n", zio->io_vd, \
        zio->io_vd ? (string)zio->io_vd->vdev_path : ""); \
    this->vs = (vdev_stat_t *)&(zio->io_vd->vdev_stat); \
    printf(" errors read=%d write=%d csum=%d\n", \
        this->vs->vs_read_errors, \
        this->vs->vs_write_errors, \
        this->vs->vs_checksum_errors)

BEGIN
{
    printf("%Y START\n", walltimestamp);
}

/* A ZIO that already carries an error enters zio_done(). */
fbt:zfs:zio_done:entry
/((zio_t *)arg0)->io_error/
{
    self->zio1 = (zio_t *)arg0;
    printf("\n%Y %s:%s\n", walltimestamp, probefunc, probename);
    printzio(self->zio1);
    printf(" STACK");
    stack();
}

/* Print the same ZIO again on return, then stop tracing. */
fbt:zfs:zio_done:return
/self->zio1/
{
    printf("%Y %s:%s\n", walltimestamp, probefunc, probename);
    printzio(self->zio1);
    self->zio1 = 0;
    exit(0);
}

/* A ZIO that already carries an error enters zio_vdev_io_assess(). */
fbt:zfs:zio_vdev_io_assess:entry
/((zio_t *)arg0)->io_error/
{
    self->zio2 = (zio_t *)arg0;
    printf("\n%Y %s:%s\n", walltimestamp, probefunc, probename);
    printzio(self->zio2);
    printf(" STACK");
    stack();
}

/* Print the same ZIO again on return to see whether its flags changed. */
fbt:zfs:zio_vdev_io_assess:return
/self->zio2/
{
    printf("%Y %s:%s\n", walltimestamp, probefunc, probename);
    printzio(self->zio2);
    self->zio2 = 0;
}

END
{
    printf("%Y STOP\n", walltimestamp);
}
-----
Result while I was writing a large file:
-----
# ./test.d
2012 Jul 23 14:21:20 START
2012 Jul 23 15:01:17 zio_vdev_io_assess:entry
ZIO = ffffff19cb2634c0
io_error = 0x5
io_flags = 0x60440
io_type = 2
io_offset = 0xa898fb600
io_vd = ffffff19bee0d080 /dev/dsk/c2t5d0s0
errors read=0 write=0 csum=0
STACK
zfs`zio_execute+0x8d
genunix`taskq_thread+0x285
unix`thread_start+0x8
2012 Jul 23 15:01:17 zio_vdev_io_assess:return
ZIO = ffffff19cb2634c0
io_error = 0x5
io_flags = 0x60440
io_type = 2
io_offset = 0xa898fb600
io_vd = ffffff19bee0d080 /dev/dsk/c2t5d0s0
errors read=0 write=0 csum=0
2012 Jul 23 15:01:17 zio_vdev_io_assess:entry
ZIO = ffffff19d795aea0
io_error = 0x5
io_flags = 0x60440
io_type = 2
io_offset = 0xb0004b7600
io_vd = ffffff19bee0d080 /dev/dsk/c2t5d0s0
errors read=0 write=0 csum=0
STACK
zfs`zio_execute+0x8d
genunix`taskq_thread+0x285
unix`thread_start+0x8
2012 Jul 23 15:01:17 zio_vdev_io_assess:return
ZIO = ffffff19d795aea0
io_error = 0x5
io_flags = 0x60440
io_type = 2
io_offset = 0xb0004b7600
io_vd = ffffff19bee0d080 /dev/dsk/c2t5d0s0
errors read=0 write=0 csum=0
2012 Jul 23 15:01:18 zio_vdev_io_assess:entry
ZIO = ffffff19d795aea0
io_error = 0x5
io_flags = 0x60440
io_type = 2
io_offset = 0xb0004b7600
io_vd = ffffff19bee0d080 /dev/dsk/c2t5d0s0
errors read=0 write=0 csum=0
STACK
zfs`zio_execute+0x8d
zfs`zio_notify_parent+0xa6
zfs`zio_done+0x3d6
zfs`zio_execute+0x8d
zfs`zio_notify_parent+0xa6
zfs`zio_done+0x3d6
zfs`zio_execute+0x8d
genunix`taskq_thread+0x285
unix`thread_start+0x8
2012 Jul 23 15:01:18 zio_vdev_io_assess:return
ZIO = ffffff19d795aea0
io_error = 0x5
io_flags = 0x60440
io_type = 2
io_offset = 0xb0004b7600
io_vd = ffffff19bee0d080 /dev/dsk/c2t5d0s0
errors read=0 write=0 csum=0
2012 Jul 23 15:01:18 zio_done:entry
ZIO = ffffff19d795aea0
io_error = 0x5
io_flags = 0x60440
io_type = 2
io_offset = 0xb0004b7600
io_vd = ffffff19bee0d080 /dev/dsk/c2t5d0s0
errors read=0 write=0 csum=0
STACK
zfs`zio_execute+0x8d
zfs`zio_notify_parent+0xa6
zfs`zio_done+0x3d6
zfs`zio_execute+0x8d
zfs`zio_notify_parent+0xa6
zfs`zio_done+0x3d6
zfs`zio_execute+0x8d
genunix`taskq_thread+0x285
unix`thread_start+0x8
2012 Jul 23 15:01:18 zio_done:return
ZIO = ffffff19d795aea0
io_error = 0x5
io_flags = 0x60440
io_type = 2
io_offset = 0xb0004b7600
io_vd = ffffff19bee0d080 /dev/dsk/c2t5d0s0
errors read=0 write=0 csum=0
2012 Jul 23 15:01:17 zio_vdev_io_assess:entry
ZIO = ffffff19ca46f130
io_error = 0x5
io_flags = 0x60440
io_type = 2
io_offset = 0xa898fae00
io_vd = ffffff19bee0d080 /dev/dsk/c2t5d0s0
errors read=0 write=0 csum=0
STACK
zfs`zio_execute+0x8d
genunix`taskq_thread+0x285
unix`thread_start+0x8
2012 Jul 23 15:01:17 zio_vdev_io_assess:return
ZIO = ffffff19ca46f130
io_error = 0x5
io_flags = 0x60440
io_type = 2
io_offset = 0xa898fae00
io_vd = ffffff19bee0d080 /dev/dsk/c2t5d0s0
errors read=0 write=0 csum=0
2012 Jul 23 15:01:17 zio_vdev_io_assess:entry
ZIO = ffffff19c7966c68
io_error = 0x5
io_flags = 0x60440
io_type = 2
io_offset = 0x60009cae00
io_vd = ffffff19bee0d080 /dev/dsk/c2t5d0s0
errors read=0 write=0 csum=0
STACK
zfs`zio_execute+0x8d
genunix`taskq_thread+0x285
unix`thread_start+0x8
2012 Jul 23 15:01:17 zio_vdev_io_assess:return
ZIO = ffffff19c7966c68
io_error = 0x5
io_flags = 0x60440
io_type = 2
io_offset = 0x60009cae00
io_vd = ffffff19bee0d080 /dev/dsk/c2t5d0s0
errors read=0 write=0 csum=0
2012 Jul 23 15:01:18 STOP
-----
The ZIO_FLAG_IO_RETRY flag was not set after zio_vdev_io_assess(), and
the error was not counted after zio_done().
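The io_flags value in the trace can be checked by hand with a few lines of C. This is a minimal sketch: the helper names flag_bit_is_set and list_set_bits are hypothetical, and the position of ZIO_FLAG_IO_RETRY (bit 13) is an assumption based on the zio.h of about that revision, so it should be checked against the header of the running build.

```c
#include <stdint.h>

/* Assumed position of ZIO_FLAG_IO_RETRY (1 << 13); verify against zio.h. */
#define ASSUMED_IO_RETRY_BIT 13

/* Returns 1 if the given bit is set in io_flags, 0 otherwise. */
static int flag_bit_is_set(uint32_t io_flags, int bit)
{
    return (int)((io_flags >> bit) & 1u);
}

/* Stores the positions of all set bits in out[] and returns their count. */
static int list_set_bits(uint32_t io_flags, int out[32])
{
    int n = 0;

    for (int bit = 0; bit < 32; bit++) {
        if (flag_bit_is_set(io_flags, bit))
            out[n++] = bit;
    }
    return n;
}
```

For io_flags = 0x60440, list_set_bits() yields bits 6, 10, 17 and 18, and flag_bit_is_set(0x60440, ASSUMED_IO_RETRY_BIT) is 0. Separately, io_error = 0x5 is EIO and io_type = 2 is a write.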
Thanks,
Ichiko
>> If the disk continually reports that all writes are fine then zfs might not discover wrong data for a long time, or until 'scrub'.
>
> Correct. Again, this case is interesting because the reads are counted as
> checksum errors, but not read errors. But until we know what version of the
> OS is being used, we can't debug any further.
> -- richard