[OpenIndiana-discuss] Broken zpool

jason matthews jason at broken.net
Tue Oct 27 23:48:05 UTC 2015


If I had a nickel for every time someone told me, "I don't have a backup."

I am just going to go on a generalized rant... and then return to next 
steps.

This is probably not the right time, since you are unlikely to be in 
the mood for this sort of advice, to remind you that mirrors and backups 
serve two different purposes and are not equivalent things. This is a 
difficult lesson for some to learn. You are one of the some.

I am not trying to be a dick (it happens naturally), but if you can't 
afford to back up terabytes of data, then you can't afford to have 
terabytes of data. Consider a strategy where you back up the things 
important to you and gamble on the rest. I use cheap SATA disks at home 
too. I use big ass 6TB WD drives that I don't trust. Because I don't 
trust them, they are in 3-way mirrors. I also have a backup pool that I 
back them up to. This is just good stewardship of data you want to keep.

People who buy giant ass disks and then complain about how long it takes 
to resilver a giant ass disk are out of their minds. They remind me of 
morons that buy houses next to airports and then complain about the 
noise of airplanes.

I have no idea what happened to your system for you to lose three disks 
simultaneously. There is about a one in ten billion chance of that 
happening on the same day unless your controller card or cables are bad, 
or you lost your CMOS settings and then compounded it by doing something 
stupid. I just don't see you recovering from this scenario, where you 
have two bad drives trying to resilver from each other.


WWJD - what would Jason do?

Here is what I would do, if this were my system. You should see if 
anyone else has a better idea.

If you are at all concerned about losing data, first use dd to back up 
your messed up ZFS disks to new drives. Use the new drives in the system 
and perform the following operations. If you are willing to wing it like 
me, you can skip the backup. Be advised, the system is in this mess 
because you skipped the backup :-) Ironic, right?
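
The dd step above might look roughly like the sketch below. The function 
name clone_disk and the device names in the comments are made up for 
illustration; substitute your own, and double check which is the source 
and which is the target before running anything.

```shell
# Sketch only. Hypothetical illumos-style device names:
#   SRC=/dev/rdsk/c2t0d0p0   (damaged pool member)
#   DST=/dev/rdsk/c3t0d0p0   (new drive, at least as large as SRC)

clone_disk() {
    # $1 = source device or image, $2 = target device or image
    # conv=noerror keeps going after a read error instead of aborting.
    # conv=sync pads a short or failed read with zeros so that the data
    # after it stays at the same offset on the copy.
    dd if="$1" of="$2" bs=1024k conv=noerror,sync
}

# Usage (as root, with the pool exported or the disks otherwise idle):
#   clone_disk "$SRC" "$DST"
```

A large block size is faster, but with conv=noerror,sync a single 
unreadable sector costs you the whole block around it. A smaller bs 
loses less per error at the price of a slower copy.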

You likely have access to some of the files, as the pool is marked 
DEGRADED and not FAULTED, for reasons I don't understand.
- zpool status -v data
-- The files listed in the output of this command are toast. They are 
bogging the system down. If it were my data, I would delete them. Say 
what? Yes, delete them. The system can't recover them, they can't be 
snapshot'd, they are just in your way. For purposes of recovering files 
on the live filesystem you don't need to delete existing corrupted 
snapshots. If you don't want to delete these corrupted files, then you 
need to find a way to exclude them from your backup process (rsync 
--exclude=).

-- The surviving files are going to have to be copied to new target 
media. Once the corrupted files are deleted, standard file tools like 
tar and cpio can be used to copy to the new target smoothly. It might be 
possible to get snapshot/send/recv working again by deleting all the 
snapshots with corrupted blocks so that the zpool is clean of 
corruption. Be advised, there may be something on one of those snapshots 
that you want to keep.
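
As a rough sketch of the two options above (exclude the corrupted files, 
or delete them and copy the rest), something like the following could 
work. The function names, the /data mountpoint and the /backup target 
are assumptions for illustration. It also assumes the usual layout of 
zpool status -v, where damaged files are listed one per line under the 
"Permanent errors have been detected" message.

```shell
# Read `zpool status -v` output on stdin and print the damaged files that
# live under the mountpoint given as $1, relative to that mountpoint.
# Snapshot entries (pool@snap:/path) and raw object IDs (<0x0>:<0x15>)
# are skipped because they are not paths on the live filesystem.
list_corrupt_files() {
    awk -v root="${1%/}" '
        /Permanent errors have been detected/ { in_list = 1; next }
        in_list {
            sub(/^[ \t]+/, "")                 # drop the indentation
            if (index($0, root "/") == 1)      # keep paths under root
                print substr($0, length(root) + 1)
        }
    '
}

# Copy a whole directory tree with tar, for use once the corrupted
# files have been deleted. $1 = source dir, $2 = existing target dir.
copy_tree() {
    (cd "$1" && tar cf - .) | (cd "$2" && tar xpf -)
}

# Usage, excluding instead of deleting (hypothetical paths):
#   zpool status -v data | list_corrupt_files /data > /tmp/corrupt.txt
#   rsync -a --exclude-from=/tmp/corrupt.txt /data/ /backup/data/
# Usage, after deleting the corrupted files:
#   copy_tree /data /backup/data
```

The leading slash on each printed path anchors the rsync pattern at the 
top of the transfer, which is why the source is given as /data/ with a 
trailing slash. File names containing *, ? or [ would be treated as 
wildcards by rsync and would need escaping by hand.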

- I was going to say stop the resilver, but that might detach the 
mirrors and that could be a bad thing(tm). Instead, you might want to 
consider tuning the resilver so it goes really slowly (in terms of I/O 
per second); obviously it is already going slowly in MB/s :-)
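
One way to slow the resilver down, sketched under assumptions: illumos 
kernels of this vintage have a zfs_resilver_delay tunable (ticks to wait 
between resilver I/Os while the pool is busy) that can be changed on a 
live system with mdb. Whether your build still has it is something to 
verify first. The helper below is hypothetical and only prints the 
command, it does not run it.

```shell
# Print (do NOT execute) the mdb command that would set the assumed
# zfs_resilver_delay kernel tunable to $1 ticks. 0t marks a decimal
# value in mdb; a bigger number means a slower, gentler resilver.
resilver_delay_cmd() {
    printf 'echo zfs_resilver_delay/W0t%d | mdb -kw\n' "$1"
}

# Check that the symbol exists and see its current value first:
#   echo zfs_resilver_delay/D | mdb -k
# Then review the printed command before running it as root:
#   resilver_delay_cmd 10
```

A change made this way does not survive a reboot, which is probably what 
you want while you are getting data off the pool.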

Your level of comfort and skill should be the governing factors. You 
may wish to seek professional help.

That's what I would do. Your mileage may vary.

j.



