[OpenIndiana-discuss] It just trashed itself!!

Lou Picciano loupicciano at comcast.net
Fri Feb 25 13:53:38 UTC 2011


Mark, it may not help at all, but what kind of network interface hardware are you using? 


We've seen occasional, strange dropoffs of interfaces based on RealTek chips. It is odd, because one virtual interface will drop off while others, over the same hardware, stay live. We have not had much luck sorting out the problem, except that everyone says 'get rid of the RealTek hardware', usually recommending Intel instead. 


Interested in your comments. Lou Picciano 

----- Original Message ----- 
From: "Mark" <mark0x01 at gmail.com> 
To: "Discussion list for OpenIndiana" <openindiana-discuss at openindiana.org> 
Sent: Friday, February 25, 2011 2:59:43 AM 
Subject: [OpenIndiana-discuss] It just trashed itself!! 

I had an interesting issue today with one of my Open Indiana storage 
servers. 
It has around 15 smb/nfs shares and 40TB of storage. 

The problems may have slowly crept up on it, as logs from the nfs client 
showed slow response issues starting about 12 hours earlier. 

Eventually it had ground to a halt, and would not complete a console login. 
I achieved a normal shut-down via the power button, but on reboot it was 
somewhat stuffed. 
On power up, it dropped into single user mode, due to networking issues. 
A 'dladm show-phys' revealed some missing network devices. 

The box has two on-board ports and a quad gigabit card as igb devices, as 
well as a dual 10Gbit ixgbe, but only 3 x igb and 1 x ixgbe devices 
showed up. 
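
As a rough sketch of the check described above: assuming the device names have already been collected by hand (for example from the DEVICE column of 'dladm show-phys'), something like the following Python would report which drivers are short of ports. The expected counts come from the hardware listed here (2 on-board + 4 on the quad card = 6 igb, plus 2 ixgbe); the function names and the sample instance numbers are hypothetical, for illustration only.

```python
from collections import Counter

# Expected port counts, taken from the description above:
# 2 on-board + 4 on the quad card = 6 igb, plus 2 on the dual 10Gbit card.
EXPECTED = {"igb": 6, "ixgbe": 2}


def count_by_driver(device_names):
    """Group device names such as 'igb3' by driver name ('igb')."""
    counts = Counter()
    for name in device_names:
        driver = name.strip().rstrip("0123456789")  # drop the instance number
        if driver:
            counts[driver] += 1
    return counts


def missing_devices(device_names, expected=EXPECTED):
    """Return {driver: number of ports missing} for drivers that are short."""
    counts = count_by_driver(device_names)
    return {
        driver: want - counts[driver]
        for driver, want in expected.items()
        if counts[driver] < want
    }


if __name__ == "__main__":
    # Hypothetical sample: 3 igb and 1 ixgbe present, as on the first reboot.
    # The instance numbers here are made up for illustration.
    present = ["igb0", "igb1", "igb4", "ixgbe0"]
    print(missing_devices(present))
```

Running the same check after each reboot would make it easy to see whether the set of missing devices is stable or changing.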

I tried another reboot, but that didn't help much either, as some were 
still missing. 
Then I tried a 'reboot -- -r', and that resulted in all the network devices 
disappearing. 

Suspecting possible hardware issues, I booted off the text installation 
cdrom, and found all the network devices were present and correct. 
A zpool import & scrub of the OS mirror showed no issues either. 

About an hour later, after a full OS reinstall and reconfigure, it was 
back up in production, thanks to the real virtues of zfs - recovery and 
portability, with smb and nfs shares intact. 
(I have built a raw vm workstation Open Solaris on a sata disk, moved 
it to an AMD and then an Intel processor box, and had no problems just 
booting it up.) 

I've saved one of the mirrored OS disks for a post-mortem, to try to 
find out what happened. Some of the errors on screen suggested write 
issues to some /dev/ devices, but when a production system is down, 
rapid recovery is always the primary goal, and analysis took a back seat. 

I've been slowly upgrading the Open Solaris boxes to Open Indiana (try 
moving 40TB in a hurry while keeping data available) to resolve 
the scrub impact and some of the other issues I had encountered. 
These have been very reliable for up to two years so far. 
The oldest has been up for about a year, but this one only a month. 

Hopefully this isn't a regular event, but I may keep a pre-built OS disk 
ready just in case. 

If anyone has suggestions on what to look for in the wreckage, it would 
be helpful. 


Mark. 

[Sparing a thought for Christchurch Earthquake victims. 
Thankfully, my family there are all safe.] 



