[OpenIndiana-discuss] Broken zpool

Rainer Heilke rheilke at dragonhearth.com
Wed Oct 28 02:44:12 UTC 2015


On 27/10/2015 4:48 PM, jason matthews wrote:
>
> This is probably not the appropriate time, given your state of mind is
> not likely accepting this sort of advice at this point in time, to
> remind you that mirrors and backups serve two different purposes and are
> not equivalent things.

I know this, thank you. Not being able to afford a proper backup 
solution, I went the mirroring route to at least give me some security.

> I am not trying to be a dick (it happens naturally), but if you can't
> afford to back up terabytes of data, then you can't afford to have
> terabytes of data.

That is a meaningless statement that reflects nothing in real-world terms.

> Because I don't trust them they
> are in 3-way mirrors. I also have a backup pool that I back them up to.

It must be nice to have the money for this. Ever hear of a fixed income?

> This is just good stewardship of data you want to keep.

That's an arrogant statement, presuming that if a person doesn't have 
gobs of money, they shouldn't bother with computers at all.

> People who buy giant ass disks and then complain about how long it takes
> to resilver a giant ass disk are out of their minds.

I am not complaining about the time it takes; I know full well how long 
it can take. I am complaining that the "resilvering" stops dead. (More 
on this below.)

> I have no idea what happened to your system for you to lose three disks
> simultaneously.

This was covered in a thread ages ago; the tech took days to find the 
problem, which was a CMOS battery that was on Death's door.

> you lost your cmos settings and then compounded it by doing something
> stupid.

So, rebooting a system is now considered "stupid"?

> I just don't see you recovering from this scenario where you have
> two bad drives trying to resilver from each other.

They aren't trying to resilver from each other. The dead disk is gone. 
The good disk is trying to resilver from the ether. Or some such. 
(Itself?) I added a third drive to the mirror in a vain attempt to get 
past the error saying there weren't enough remaining mirrors when I 
tried to zpool detach the now non-existent drive. Again, what is IT 
trying to resilver from? The same Twilight Zone the first disk is trying 
to resilver from?

> If you are at all concerned about losing data, first use dd to back up
> your messed up zfs disks to new drives. Use the new drives in the system
> and perform the following operations. If you are willing to wing it like
> me you can skip the backup. Be advised, the system is in this mess
> because you skipped the backup :-) Ironic, right?

Actually, no. The system is in this state because ZFS keeps freezing the 
I/O of the pool, not letting me clear up the issues. The DATA is 
inaccessible due to the lack of backups. If we are being pedantic, we 
must be consistently so. :-)

> One likely has access to some of the files as the pool is marked
> DEGRADED and not FAILED for reasons I don't understand.

It seems to think that the one disk is fine, but the data isn't. ZFS is 
then locking the pool's I/O, not letting me clear up the damaged files 
(nor the pool). It's like there's a trapped loop between two parts of 
the ZFS code, but I refuse to believe Cantrill (and the many programmers 
since) didn't see this kind of problem.

> - zpool status -v data
> -- the files listed from the output of this command are toast. they are
> bogging the system down. if it were my data, i would delete them. say
> what? yes delete them.

I'm more than willing to sacrifice some files, if that lets me get at 
the rest. But when I try to delete a file, the terminal freezes up again 
(though, like some other actions tried earlier, it doesn't give me the 
"I/O suspended" message).

> - i was going to say stop the resilver, but that might detach the
> mirrors and that could be a bad thing(tm). Instead, you might want to
> consider tuning the resilver so it goes really slow (in terms of I/O per
> second), obviously it is going slowly in Mb/s :-)

The system isn't letting me do _anything_ with this pool's data, 
resilvering, tuning, scrubbing,... Nothing. I think I'll try the dd route. 
That should operate below the ZFS (file system) level, and maybe give me 
a chance. One thing I can't figure out is why the one disk is being 
listed as: 7152018192933189428
Isn't that the WWN (if I remember correctly; it's been a while)? Or is 
this just the "unique identifier" mentioned in the zpool man page?
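
For anyone following along, here is a rough sketch of what the dd route 
amounts to, written in Python so the logic is visible. It is not an actual 
dd invocation and not a tested recovery procedure. It reads the source in 
fixed-size blocks and writes zeros in place of any block that cannot be 
read, which is roughly the behaviour of dd with conv=noerror,sync. The 
function name copy_device, the 1 MiB block size, and the assumption that 
seeking to the end of the source reports its size (true for regular files, 
not guaranteed for every raw device node) are mine, not anything from this 
thread.

```python
import os


def copy_device(src_path, dst_path, block_size=1024 * 1024):
    """Copy src_path to dst_path block by block.

    Blocks that raise an I/O error are replaced with zeros so the copy
    keeps going. Returns the byte offsets of the blocks that failed.
    """
    bad_offsets = []
    src = os.open(src_path, os.O_RDONLY)
    try:
        # Assumption: seeking to the end yields the source size.
        size = os.lseek(src, 0, os.SEEK_END)
        with open(dst_path, "wb") as dst:
            offset = 0
            while offset < size:
                want = min(block_size, size - offset)
                try:
                    chunk = os.pread(src, want, offset)
                except OSError:
                    # Unreadable block: remember where, carry on.
                    chunk = b""
                    bad_offsets.append(offset)
                if len(chunk) < want:
                    # Pad failed or short reads with zeros so every
                    # later block stays at its original offset.
                    chunk += b"\0" * (want - len(chunk))
                dst.write(chunk)
                offset += want
    finally:
        os.close(src)
    return bad_offsets
```

Calling copy_device("/path/to/old-disk", "/path/to/new-disk") with 
placeholder paths like these would return the list of offsets that had to 
be zero-filled, which at least shows how much of the disk was unreadable. 
Keeping every block at its original offset matters because ZFS locates its 
labels and data by position on the device.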

Rainer

-- 
Put your makeup on and fix your hair up pretty,
And meet me tonight in Atlantic City
			Bruce Springsteen
			


