[OpenIndiana-discuss] Openindiana ZFS server crashes and reboots

Udo Grabowski (IMK) udo.grabowski at kit.edu
Fri Oct 12 18:00:20 UTC 2012


On 10/12/12 07:34 PM, Bentley, Dain wrote:
> Hello Udo, thanks for the reply.  Here is the text from fmdump -eV.
> Is there anything I should be looking for?

So BOTH disks spit out ZFS checksum errors like a machine gun.
This can be either a controller/cable problem or a memory problem
(I assume you don't have ECC memory with an ECC-supporting processor,
e.g. a Xeon? With ECC, memory errors would be reported by fmdump -e).
Or it is some problem in the OS: such checksum errors haunt me on
my home workstation (also no ECC) when scrubbing the rpool mirror,
although disks and cables are OK and no errors occur when not
scrubbing. But the sheer number of errors here does not look like
that symptom.
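
To check whether both sides of the mirror really are hit about equally,
the checksum ereports can be tallied per device. What follows is only a
rough sketch: it assumes the 'fmdump -eV' output has been saved as text
(mail quoting with '>' is tolerated), and the function name is my own
invention, not part of any FMA tool.

```python
import re
from collections import Counter

def count_checksum_ereports(text):
    """Count ereport.fs.zfs.checksum records per vdev_path in fmdump -eV text."""
    counts = Counter()
    in_checksum = False
    for raw in text.splitlines():
        # Drop mail quoting ('>') and indentation before matching.
        line = raw.lstrip("> \t").strip()
        m = re.match(r"class = (\S+)", line)
        if m:
            # Each record starts with its class line.
            in_checksum = (m.group(1) == "ereport.fs.zfs.checksum")
            continue
        m = re.match(r"vdev_path = (\S+)", line)
        if m and in_checksum:
            counts[m.group(1)] += 1
    return counts

# Abbreviated records in the shape of the quoted output.
SAMPLE = """\
>          class = ereport.fs.zfs.data
>          pool = volume0
>          class = ereport.fs.zfs.checksum
>          vdev_path = /dev/dsk/c3t0d0s0
>          class = ereport.fs.zfs.checksum
>          vdev_path = /dev/dsk/c3t3d0s0
"""

if __name__ == "__main__":
    for path, n in sorted(count_checksum_ereports(SAMPLE).items()):
        print(path, n)
```

On the real machine one would save the output with
'fmdump -eV > ereports.txt' and pass the file contents to the function.
Roughly equal counts on both disks would fit a cause that the disks
share rather than one failing drive.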

'zpool status -v' should show the mirror as degraded, with the
exact checksum error count per device, and should also list any
affected files or metadata. 'fmadm faulty' lists components
retired due to errors. If you have data corruption, this
could cause reboots on occasion.
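
If you want to watch the counters over time, the CKSUM column can be
pulled out of saved 'zpool status' output. This is a minimal sketch only:
the function name is hypothetical, and the sample text below is made up
to show the usual table layout; it is NOT the output of your machine.

```python
def cksum_counts(status_text):
    """Return {device name: CKSUM column} from the config table of zpool status."""
    counts = {}
    in_table = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:5] == ["NAME", "STATE", "READ", "WRITE", "CKSUM"]:
            in_table = True
            continue
        if in_table:
            # A blank line or the "errors:" line ends the table.
            if len(fields) < 5 or fields[0] == "errors:":
                in_table = False
                continue
            # Kept as a string: large counts may be printed abbreviated.
            counts[fields[0]] = fields[4]
    return counts

# Invented sample in the usual layout (the numbers are illustrative only).
SAMPLE = """\
  pool: volume0
 state: DEGRADED
config:

        NAME        STATE     READ WRITE CKSUM
        volume0     DEGRADED     0     0     0
          mirror-0  DEGRADED     0     0     0
            c3t0d0  DEGRADED     0     0    12
            c3t3d0  DEGRADED     0     0    12

errors: No known data errors
"""

if __name__ == "__main__":
    print(cksum_counts(SAMPLE))
```

The sketch stops at the first short line, so pools with extra sections
such as logs or spares would need more care.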

Hard to say how to hunt this down, other than
checking cables and memory seating, looking for newer HBA or
disk/BIOS firmware, and torturing the memory with an advanced memory tester.
The fact that the machine reboots makes me doubt that this
is a pure ZFS/disk problem. I have never seen that: the machine
eventually stalls when those errors hammer it
too hard, but they would not cause a reboot. Memory is always
the best bet, but it could also be a motherboard problem.
Maybe the vendor has some hardware checking tool on the
CD in the box or on its website?
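
One more thing that can be read from the quoted ereports: for offset
0x14893b600, both disks report the same cksum_actual. If both halves of
a mirror return the same wrong data for the same block, the data was
probably already wrong when it was written, which would point towards
something the disks share (memory, controller, board). That is my
interpretation, not something fmdump states. A rough sketch to look for
such pairs in saved 'fmdump -eV' text (function names are my own):

```python
import re
from collections import defaultdict

FIELD = re.compile(r"^(\w+) = (.+)$")

def parse_checksum_records(text):
    """Return one dict per ereport.fs.zfs.checksum record in fmdump -eV text."""
    records = []
    record = None
    for raw in text.splitlines():
        line = raw.lstrip("> \t").strip()  # tolerate mail quoting
        m = FIELD.match(line)
        if not m:
            continue
        key, value = m.groups()
        if key == "class":
            # A class line opens a new record; keep only checksum ereports.
            record = {} if value == "ereport.fs.zfs.checksum" else None
            if record is not None:
                records.append(record)
        elif record is not None:
            record[key] = value
    return records

def same_corruption_on_both_sides(records):
    """Return offsets where two or more vdevs report the same cksum_actual."""
    by_block = defaultdict(set)
    for r in records:
        if not all(k in r for k in ("zio_offset", "cksum_actual", "vdev_path")):
            continue
        by_block[(r["zio_offset"], r["cksum_actual"])].add(r["vdev_path"])
    return sorted(off for (off, _), vdevs in by_block.items() if len(vdevs) > 1)

# Three records from the quoted output, cut down to the fields used here.
SAMPLE = """\
class = ereport.fs.zfs.checksum
vdev_path = /dev/dsk/c3t0d0s0
zio_offset = 0x14893b600
cksum_actual = 0x521c2ebafd7 0x2be436a2bfd1b3 0xfcd00e26f8dbaa49 0x4161c187aaf4aa4d
class = ereport.fs.zfs.checksum
vdev_path = /dev/dsk/c3t3d0s0
zio_offset = 0x14893b600
cksum_actual = 0x521c2ebafd7 0x2be436a2bfd1b3 0xfcd00e26f8dbaa49 0x4161c187aaf4aa4d
class = ereport.fs.zfs.checksum
vdev_path = /dev/dsk/c3t0d0s0
zio_offset = 0x6414e7e00
cksum_actual = 0xd682db3691e 0x110e9df4bde6720 0x6bb8ae9ba83d27a8 0xf14a4b22583c65fd
"""

if __name__ == "__main__":
    print(same_corruption_on_both_sides(parse_checksum_records(SAMPLE)))
```

Offsets are compared as the printed hex strings, which is enough for
spotting matching pairs in one dump.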

>
> TIME                           CLASS
> Oct 11 2012 06:06:59.370787527 ereport.fs.zfs.data
> nvlist version: 0
>          class = ereport.fs.zfs.data
>          ena = 0x9ebd5c95f9f00401
>          detector = (embedded nvlist)
>          nvlist version: 0
>                  version = 0x0
>                  scheme = zfs
>                  pool = 0x6a8f5a381b7e2f2c
>          (end detector)
>
>          pool = volume0
>          pool_guid = 0x6a8f5a381b7e2f2c
>          pool_context = 0
>          pool_failmode = wait
>          zio_err = 50
>          zio_objset = 0x28
>          zio_object = 0x1
>          zio_level = 0
>          zio_blkid = 0x6047da
>          __ttl = 0x1
>          __tod = 0x50769a43 0x1619c4c7
>
> Oct 11 2012 06:06:59.370787942 ereport.fs.zfs.checksum
> nvlist version: 0
>          class = ereport.fs.zfs.checksum
>          ena = 0x9ebd5c95f9f00401
>          detector = (embedded nvlist)
>          nvlist version: 0
>                  version = 0x0
>                  scheme = zfs
>                  pool = 0x6a8f5a381b7e2f2c
>                  vdev = 0xdd2eef656bfc1db5
>          (end detector)
>
>          pool = volume0
>          pool_guid = 0x6a8f5a381b7e2f2c
>          pool_context = 0
>          pool_failmode = wait
>          vdev_guid = 0xdd2eef656bfc1db5
>          vdev_type = disk
>          vdev_path = /dev/dsk/c3t0d0s0
>          vdev_devid = id1,sd@SATA_____HDS725050KLA360_______KRVN03ZAG1JY4D/a
>          parent_guid = 0xfd55a23a93069d20
>          parent_type = mirror
>          zio_err = 50
>          zio_offset = 0x14893b600
>          zio_size = 0x4600
>          zio_objset = 0x28
>          zio_object = 0x1
>          zio_level = 0
>          zio_blkid = 0x6047da
>          cksum_expected = 0x521c0ebafd7 0x2be41b8abfd1b3 0xfccf5694dcdbaa49 0x3e24575fd2f4aa4d
>          cksum_actual = 0x521c2ebafd7 0x2be436a2bfd1b3 0xfcd00e26f8dbaa49 0x4161c187aaf4aa4d
>          cksum_algorithm = fletcher4
>          __ttl = 0x1
>          __tod = 0x50769a43 0x1619c666
>
> Oct 11 2012 06:06:59.370788126 ereport.fs.zfs.checksum
> nvlist version: 0
>          class = ereport.fs.zfs.checksum
>          ena = 0x9ebd5c95f9f00401
>          detector = (embedded nvlist)
>          nvlist version: 0
>                  version = 0x0
>                  scheme = zfs
>                  pool = 0x6a8f5a381b7e2f2c
>                  vdev = 0x377ede8d0fb06f7e
>          (end detector)
>
>          pool = volume0
>          pool_guid = 0x6a8f5a381b7e2f2c
>          pool_context = 0
>          pool_failmode = wait
>          vdev_guid = 0x377ede8d0fb06f7e
>          vdev_type = disk
>          vdev_path = /dev/dsk/c3t3d0s0
>          vdev_devid = id1,sd@SATA_____HDS725050KLA360_______KRVN03ZAG39LKD/a
>          parent_guid = 0xfd55a23a93069d20
>          parent_type = mirror
>          zio_err = 50
>          zio_offset = 0x14893b600
>          zio_size = 0x4600
>          zio_objset = 0x28
>          zio_object = 0x1
>          zio_level = 0
>          zio_blkid = 0x6047da
>          cksum_expected = 0x521c0ebafd7 0x2be41b8abfd1b3 0xfccf5694dcdbaa49 0x3e24575fd2f4aa4d
>          cksum_actual = 0x521c2ebafd7 0x2be436a2bfd1b3 0xfcd00e26f8dbaa49 0x4161c187aaf4aa4d
>          cksum_algorithm = fletcher4
>          __ttl = 0x1
>          __tod = 0x50769a43 0x1619c71e
>
> ... (1000 more of them in rapid succession)...
>
> Oct 12 2012 07:54:53.123217977 ereport.fs.zfs.checksum
> nvlist version: 0
>          class = ereport.fs.zfs.checksum
>          ena = 0xe644965bba100401
>          detector = (embedded nvlist)
>          nvlist version: 0
>                  version = 0x0
>                  scheme = zfs
>                  pool = 0xda1ed4eddd886ca2
>                  vdev = 0x76d6d5ecc9007061
>          (end detector)
>
>          pool = volume0
>          pool_guid = 0xda1ed4eddd886ca2
>          pool_context = 0
>          pool_failmode = wait
>          vdev_guid = 0x76d6d5ecc9007061
>          vdev_type = disk
>          vdev_path = /dev/dsk/c3t0d0s0
>          vdev_devid = id1,sd@SATA_____HDS725050KLA360_______KRVN03ZAG1JY4D/a
>          parent_guid = 0xb85bb36665652fa6
>          parent_type = mirror
>          zio_err = 50
>          zio_offset = 0x6414e7e00
>          zio_size = 0xa400
>          zio_objset = 0x28
>          zio_object = 0x1
>          zio_level = 0
>          zio_blkid = 0x1c3121
>          cksum_expected = 0xd682bb3691e 0x110e9a533de6720 0x6bb562c60c3d27a8 0xd15dd859803c65fd
>          cksum_actual = 0xd682db3691e 0x110e9df4bde6720 0x6bb8ae9ba83d27a8 0xf14a4b22583c65fd
>          cksum_algorithm = fletcher4
>          bad_ranges = 0x2fd0 0x2fd8
>          bad_ranges_min_gap = 0x8
>          bad_range_sets = 0x1
>          bad_range_clears = 0x0
>          bad_set_bits = 0x0 0x0 0x0 0x2 0x0 0x0 0x0 0x0
>          bad_cleared_bits = 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
>          __ttl = 0x1
>          __tod = 0x5078050d 0x7582839
>
> Oct 12 2012 09:31:27.252886984 ereport.fs.zfs.checksum
> nvlist version: 0
>          class = ereport.fs.zfs.checksum
>          ena = 0x3a9567025fb00801
>          detector = (embedded nvlist)
>          nvlist version: 0
>                  version = 0x0
>                  scheme = zfs
>                  pool = 0xda1ed4eddd886ca2
>                  vdev = 0x4dd27d7d2cff5683
>          (end detector)
>
>          pool = volume0
>          pool_guid = 0xda1ed4eddd886ca2
>          pool_context = 0
>          pool_failmode = wait
>          vdev_guid = 0x4dd27d7d2cff5683
>          vdev_type = disk
>          vdev_path = /dev/dsk/c3t3d0s0
>          vdev_devid = id1,sd@SATA_____HDS725050KLA360_______KRVN03ZAG39LKD/a
>          parent_guid = 0xb85bb36665652fa6
>          parent_type = mirror
>          zio_err = 50
>          zio_offset = 0x4f8cf3c00
>          zio_size = 0xca00
>          zio_objset = 0x28
>          zio_object = 0x1
>          zio_level = 0
>          zio_blkid = 0x110e12
>          cksum_expected = 0x109aa00f800f 0x1aae29387c03843 0x289b0f3d871cdb32 0xb91c890d879e9fc4
>          cksum_actual = 0x109aa20f800f 0x1aae2d39fc03843 0x289f125e231cdb32 0xe3fb48d05f9e9fc4
>          cksum_algorithm = fletcher4
>          bad_ranges = 0x49d0 0x49d8
>          bad_ranges_min_gap = 0x8
>          bad_range_sets = 0x1
>          bad_range_clears = 0x0
>          bad_set_bits = 0x0 0x0 0x0 0x2 0x0 0x0 0x0 0x0
>          bad_cleared_bits = 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
>          __ttl = 0x1
>          __tod = 0x50781baf 0xf12bfc8
>


-- 
Dr.Udo Grabowski    Inst.f.Meteorology a.Climate Research IMK-ASF-SAT
www-imk.fzk.de/asf/sat/grabowski/ www.imk-asf.kit.edu/english/sat.php
KIT - Karlsruhe Institute of Technology            http://www.kit.edu
Postfach 3640,76021 Karlsruhe,Germany  T:(+49)721 608-26026 F:-926026


