[OpenIndiana-discuss] It just trashed itself!!
Lou Picciano
loupicciano at comcast.net
Fri Feb 25 13:53:38 UTC 2011
Mark, It may not help at all - but what kind of network interface hardware are you using?
We've seen occasional, strange dropoffs of interfaces based on RealTek chips. Odd, because one virtual interface will drop off, while others, over the same hardware, stay live. We have not had good luck sorting out the problem, except that everyone is saying 'get rid of RealTek hardware', usually recommending Intel.
Interested in your comments. Lou Picciano
----- Original Message -----
From: "Mark" <mark0x01 at gmail.com>
To: "Discussion list for OpenIndiana" <openindiana-discuss at openindiana.org>
Sent: Friday, February 25, 2011 2:59:43 AM
Subject: [OpenIndiana-discuss] It just trashed itself!!
I had an interesting issue today with one of my OpenIndiana storage
servers.
It has around 15 smb/nfs shares and 40TB of storage.
The problems may have slowly crept up on it, as logs from the nfs client
showed slow response issues starting about 12 hours earlier.
Eventually it had ground to a halt, and would not complete a console login.
I achieved a normal shut-down via the power button, but on reboot it was
somewhat stuffed.
On power up, it dropped into single user mode, due to networking issues.
A 'dladm show-phys' revealed some missing network devices.
The box has two on-board ports and a quad gigabit card as igb devices, as
well as a dual 10Gbit ixgbe, but only 3 x igb and 1 x ixgbe devices
showed up.
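A check like the one described above can be scripted. The following is only a minimal sketch in Python: it assumes the default column layout of 'dladm show-phys' (DEVICE as the last column), and the expected counts (6 x igb, 2 x ixgbe) are inferred from the hardware description. The sample text is illustrative only and was not captured from the failed server; the function names are hypothetical.

```python
import re
from collections import Counter

# Expected port counts, inferred from the description above:
# 2 on-board + 4 (quad card) igb ports, and 2 ixgbe ports.
EXPECTED = {"igb": 6, "ixgbe": 2}


def count_drivers(show_phys_text):
    """Count physical links per driver from default `dladm show-phys` output."""
    counts = Counter()
    for line in show_phys_text.splitlines():
        fields = line.split()
        # Skip blank lines and the header row (LINK MEDIA STATE ...).
        if not fields or fields[0] == "LINK":
            continue
        device = fields[-1]                    # assumed DEVICE column, e.g. "igb0"
        driver = re.sub(r"\d+$", "", device)   # strip instance number -> "igb"
        counts[driver] += 1
    return counts


def missing_devices(show_phys_text, expected=EXPECTED):
    """Return {driver: number_missing} for drivers with fewer links than expected."""
    counts = count_drivers(show_phys_text)
    return {drv: want - counts.get(drv, 0)
            for drv, want in expected.items()
            if counts.get(drv, 0) < want}


# Illustrative sample only (NOT real output from the server in this thread).
SAMPLE = """\
LINK         MEDIA                STATE      SPEED  DUPLEX    DEVICE
igb0         Ethernet             up         1000   full      igb0
igb1         Ethernet             up         1000   full      igb1
igb2         Ethernet             unknown    0      half      igb2
ixgbe0       Ethernet             up         10000  full      ixgbe0
"""

if __name__ == "__main__":
    print(missing_devices(SAMPLE))
```

Run against saved output after each boot, this would make a shrinking device list obvious without comparing listings by eye.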
I tried another reboot, but that didn't help much either, as some were
still missing.
Then a reconfiguration reboot (reboot -- -r), and that resulted in all the
network devices disappearing.
Suspecting possible hardware issues, I booted off the text installation
cdrom, and found all the network devices were present and correct.
A zpool import & scrub of the OS mirror showed no issues either.
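For anyone wanting to automate the "no issues" check after an import and scrub, here is a rough Python sketch. It assumes the short summary printed by 'zpool status -x' ("all pools are healthy", or "pool '<name>' is healthy" when a pool is named); anything else is treated as needing a closer look. The helper names and the pool name 'rpool' are assumptions, not details from the server above.

```python
import re
import subprocess


def pool_is_healthy(status_x_text):
    """Return True if `zpool status -x` output reports a healthy state."""
    text = status_x_text.strip()
    if text == "all pools are healthy":
        return True
    # Form printed when a specific pool is named on the command line.
    return re.fullmatch(r"pool '[^']+' is healthy", text) is not None


def check_pool(pool="rpool"):
    """Run `zpool status -x <pool>` and interpret it (pool name is an assumption)."""
    result = subprocess.run(["zpool", "status", "-x", pool],
                            capture_output=True, text=True)
    return pool_is_healthy(result.stdout)
```

The parsing is kept separate from the command invocation so it can be tried against saved output on a machine that has no zpool binary.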
About an hour later, after a full OS reinstall and reconfigure, it was
back up in production, thanks to the real virtues of zfs - recovery and
portability, with smb and nfs shares intact.
(I have built a raw vm workstation OpenSolaris on a SATA disk, moved
it to an AMD and then an Intel processor box, and had no problems just
booting it up.)
I've saved one of the mirrored OS disks for a post-mortem, to try to
find out what happened. Some of the errors on screen suggested write
issues to some /dev/ devices, but when a production system is down,
rapid recovery is always the primary goal, and analysis took a back seat.
I've been slowly (try moving 40TB in a hurry and keeping data
available) upgrading the OpenSolaris boxes to OpenIndiana to resolve
the scrub impact and some of the other issues I had encountered.
These have been very reliable for up to two years so far.
The oldest has been up for about a year, but this one only a month.
Hopefully this isn't a regular event, but I may keep a pre-built OS disk
ready just in case.
If anyone has suggestions on what to look for in the wreckage, it would
be helpful.
Mark.
[Sparing a thought for Christchurch Earthquake victims.
Thankfully, my family there are all safe.]