Roy Sigurd Karlsbakk roy at karlsbakk.net
Thu Mar 3 19:54:06 UTC 2011

Hi all

I have a pool of eleven 7-drive RAIDz2 VDEVs, all WD Black 2TB (FASS) drives. Another drive died recently, and I went to replace it: zpool offline, cfgadm -c unconfigure, unplug, devfsadm, zpool replace. Only afterwards did I realize that the resilver to a spare hadn't finished, so ZFS was now telling me it was resilvering the broken drive to both the spare and the replacement drive. I noticed no disk I/O was happening, so I tried a few things, but ZFS seemed to be deadlocked somehow. I shut the system down, powered it off, unplugged the drive I had just replaced, and powered it up again. It then told me it was resilvering, but that it lacked enough replicas to mount the pool. I had a small heart attack: the 50TB of data does exist on other servers, since this is the backup, but a new backup would take us a few weeks. I should stress that only one of the seven drives in the VDEV was actually removed/dead, but because of the double resilver, ZFS seemed to think more were missing.
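For reference, the replacement sequence above might look roughly like the sketch below. It assumes bash on a Solaris-style system; the pool, disk and cfgadm attachment-point names are hypothetical placeholders passed as arguments. It also adds a guard that was not part of the original procedure: it refuses to start the replace while `zpool status` still reports a resilver in progress (assuming the status text contains the phrase "resilver in progress"), which is the situation that caused trouble here.

```bash
#!/bin/bash
# Rough sketch of the disk replacement steps, for Solaris-style systems
# (zpool, cfgadm, devfsadm). Pool, disk and attachment-point names are
# hypothetical placeholders passed on the command line.

# Succeeds if the "zpool status" text on stdin mentions a running resilver.
# Assumption: the status output contains the phrase "resilver in progress".
resilver_running() {
    grep -q "resilver in progress"
}

# Hypothetical helper: replace_disk POOL DISK AP_ID
replace_disk() {
    local pool=$1 disk=$2 ap_id=$3

    # Added guard (not in the original procedure): do not start a second
    # resilver while an earlier one, e.g. onto a hot spare, is still running.
    if zpool status "$pool" | resilver_running; then
        echo "resilver still in progress on $pool; wait for it to finish" >&2
        return 1
    fi

    zpool offline "$pool" "$disk"    # take the failed disk out of service
    cfgadm -c unconfigure "$ap_id"   # detach the SATA port so the drive can be pulled
    read -r -p "Swap the drive, then press Enter: "
    # Depending on the controller, 'cfgadm -c configure "$ap_id"' may be needed here.
    devfsadm                         # create /dev links for the new drive
    zpool replace "$pool" "$disk"    # resilver onto the new drive in the same slot
}

# Only act when executed directly with exactly three arguments.
if [ "${BASH_SOURCE[0]}" = "$0" ]; then
    if [ "$#" -eq 3 ]; then
        replace_disk "$@"
    else
        echo "usage: $0 POOL DISK AP_ID" >&2
    fi
fi
```

It would be invoked as, say, `./replace-disk.sh tank c5t3d0 sata1/3`, where all three names are made up for illustration.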

It all went well in the end, though. The resilver finished after a few hours, zpool status shows no data errors, and the volume mounted correctly.

I can't recall having seen any warnings about this sort of issue anywhere. Can someone explain what went wrong? It seems starting two resilvers for the same drive in the same VDEV was a Big Mistake, but I thought ZFS would sort it out and abort the resilver to the spare once the original drive was replaced.

