[OpenIndiana-discuss] It just trashed itself!!

Mark mark0x01 at gmail.com
Fri Feb 25 07:59:43 UTC 2011


I had an interesting issue today with one of my OpenIndiana storage 
servers.
It has around 15 SMB/NFS shares and 40TB of storage.

The problems may have slowly crept up on it, as logs from the nfs client 
showed slow response issues starting about 12 hours earlier.

Eventually it ground to a halt, and would not complete a console login.
I achieved a normal shut-down via the power button, but on reboot it was 
somewhat stuffed.
On power up, it dropped into single user mode, due to networking issues.
A 'dladm show-phys' revealed some missing network devices.

The box has two on-board ports and a quad gigabit card as igb devices, as 
well as a dual 10Gbit ixgbe, but only 3 x igb and 1 x ixgbe devices 
showed up.
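
The shortfall can be tallied with a short script. This is only a sketch: 
count_links and report_missing are hypothetical helper names, the expected 
counts (6 igb, 2 ixgbe) are derived from the hardware description above, 
and it assumes the links still carry default names such as igb0 and ixgbe0.

```shell
#!/bin/sh
# count_links PREFIX: read link names on stdin (one per line) and print
# how many consist of PREFIX followed by an instance number, e.g. igb0.
# Hypothetical helper, not part of dladm.
count_links() {
    grep -c "^${1}[0-9][0-9]*\$" || true
}

# report_missing PREFIX EXPECTED: read link names on stdin and print
# how many links of that driver are present versus expected.
report_missing() {
    found=$(count_links "$1")
    echo "$1: expected $2, found $found, missing $(( $2 - found ))"
}

# Only query the system when dladm is available (illumos/Solaris).
# Expected counts below are assumptions taken from the hardware list.
if command -v dladm >/dev/null 2>&1; then
    links=$(dladm show-phys -p -o link)
    printf '%s\n' "$links" | report_missing igb 6
    printf '%s\n' "$links" | report_missing ixgbe 2
fi
```

Run after each reboot, this gives a quick record of which driver is 
losing instances, which may help when comparing against the saved disk.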

I tried another reboot, but that didn't help much either, as some were 
still missing.
Then a reboot -- -r, and that resulted in all the network devices 
disappearing.

Suspecting possible hardware issues, I booted off the text installation 
cdrom, and found all the network devices were present and correct.
A zpool import & scrub of the OS mirror showed no issues either.

About an hour later, after a full OS reinstall and reconfigure, it was 
back up in production, thanks to the real virtues of zfs - recovery and 
portability, with smb and nfs shares intact.
(I have built a raw VM workstation OpenSolaris on a SATA disk, moved 
it to an AMD and then an Intel processor box, and had no problems just 
booting it up.)

I've saved one of the mirrored OS disks for a post-mortem, to try to 
find out what happened. Some of the errors on screen suggested write 
issues to some /dev/ devices, but when a production system is down, 
rapid recovery is always the primary goal, and analysis took a back seat.

I've been slowly (try moving 40TB in a hurry while keeping data 
available) upgrading the OpenSolaris boxes to OpenIndiana to resolve 
the scrub impact and some of the other issues I had encountered.
These have been very reliable for up to two years so far.
The oldest has been up for about a year, but this one only a month.

Hopefully this isn't a regular event, but I may keep a pre-built OS disk 
ready just in case.

If anyone has suggestions on what to look for in the wreckage, it would 
be helpful.


Mark.

[Sparing a thought for Christchurch Earthquake victims.
Thankfully, my family there are all safe.]



