[OpenIndiana-discuss] Server hangs weekly

Daniel Kjar dkjar at elmira.edu
Thu Feb 23 01:27:24 UTC 2012


I have this problem on a system that I was using to back up 50 GB of
material each night. It transferred that data across the network via ZFS,
and that would eventually kill it, but only after a week or so of nightly
updates of roughly the same size. The machine has 32 GB of RAM, and a cp
process would hang and consume all of it, bringing the system to its
knees. I stopped that big transfer job and called it quits. I am no longer
backing up my files to three different buildings, but that is better than
crashing my Sun Ray server every five days.

On 2/22/2012 1:48 AM, Milan Jurik wrote:
> Hi,
>
> One of my systems was suffering from very similar symptoms. I had little
> chance to debug it, as it was at a remote site in a server room. In my
> case it turned out to be a lack of memory: the system was under
> significant memory pressure. I was unable to reproduce it on the small
> systems I have at home. I added some memory and set limits for the zones.
>
> One small suggestion: could you write a small script that dumps memory
> information (from the kernel debugger, mdb -k) and the process list to
> disk, and run it from cron every few minutes? It may be unable to store
> data during the "hang", but at least you could see the trend.
>
> As for the lost IP address: are you using NWAM?
>
> Best regards,
>
> Milan
>
> On 22.02.2012 07:32, oimltalk at skidde.net wrote:
>> Hi there,
>>
>> I'm seeing roughly weekly hangs on a server running OpenIndiana 151a.
>> I'm using it primarily as a home fileserver with ZFS.
>>
>> The exact behavior seems to depend on when I notice it, but essentially
>> the server drops off the network and is only intermittently responsive
>> when I try to use the console directly. Sometimes when this happens the
>> system doesn't respond at all (e.g., not even to keyboard input). One
>> time I was able to interact with the console (after the server had
>> disappeared from the network) and tried to see what was going on. I
>> tried pinging google.com (unreachable, as expected). Next I tried
>> `ifconfig -a` and got this:
>>
>> lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
>>         inet 127.0.0.1 netmask ff000000
>> e1000g0: flags=1040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4> mtu 1500 index 2
>>         inet 0.0.0.0 netmask ff000000
>>
>>
>> which explains the lack of connectivity. But after printing that, the
>> command never returned. The console still echoed my keyboard input
>> (including ^C, ^Z, etc.), and output was still arriving from other
>> sources (e.g., napp-it runs regular snapshots for me, so I saw a notice
>> that it had used sudo to run one), but I couldn't get a prompt back.
>> Next I tried pressing the power button on the machine and got this:
>>
>> poweroff: initiated by user on /dev/console
>> in.ndpd[994]: phyint_reach_random: SIOCSLIFLNKINFO (interface e1000g0): Interrupted system call
>> bootadm: /boot/solaris/bin/extract_boot_filelist is not owned by 101, skipping
>> syncing file systems... done
>> WARNING: Power off requested from power button or SC, powering down the system!
>>
>>
>> followed shortly by:
>>
>> WARNING: Failed to shut down the system!
>>
>>
>> I tried looking through the logs for anything interesting but didn't
>> find anything, though to be honest I'm not 100% sure where to look or
>> what to look for. When the machine drops off the network I can still
>> reach it via IPMI (I tried both the dedicated jack on the motherboard
>> and the shared Intel NIC; both worked, but OI was still unresponsive),
>> so I doubt it's a bad NIC. The motherboard is a Supermicro X9SCM-F.
>>
>> I know that at least sometimes the system stops running even my ZFS
>> snapshots via napp-it, since I've come back to a frozen console showing
>> the last snapshot taken 12+ hours earlier (they're supposed to be taken
>> every 15 minutes). My guess is that the gap just reflects how long it
>> sometimes takes me to notice; it seems to be hitting a deadlock
>> somewhere that eventually grinds everything to a halt (as with the
>> ifconfig call above).
>>
>> Also, FWIW, here's what `ifconfig -a` gives me when it works correctly
>> (MAC address removed, although interestingly the MAC wasn't printed at
>> all in the output above):
>>
>> lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
>>         inet 127.0.0.1 netmask ff000000
>> e1000g0: flags=1040843<UP,BROADCAST,RUNNING,MULTICAST,DHCP,IPv4> mtu 1500 index 2
>>         inet 192.168.10.10 netmask ffffff00
>>         ether [MAC address here]
>> lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv6,VIRTUAL> mtu 8252 index 1
>>         inet6 ::1/128
>> e1000g0: flags=20002004841<UP,RUNNING,MULTICAST,DHCP,IPv6> mtu 1500 index 2
>>         inet6 fe80::225:90ff:fe50:2c2a/10
>>         ether [MAC address here]
>>
>>
>> Any ideas/suggestions on where to go from here? Thanks in advance.
>>
>
> _______________________________________________
> OpenIndiana-discuss mailing list
> OpenIndiana-discuss at openindiana.org
> http://openindiana.org/mailman/listinfo/openindiana-discuss
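
Milan's suggestion in the quoted reply above (dump kernel memory information and the process list to disk from cron every few minutes) could look roughly like the following. This is a hypothetical sketch, not a script anyone posted in the thread. It assumes Python 3.7 or later is installed, that it runs from root's crontab (`mdb -k` needs root), and that the log directory given on the command line is on persistent storage rather than /tmp, which is swap-backed tmpfs on illumos and is lost on reboot. The name memwatch.py and the log file naming are made up for illustration. Each command gets a timeout, and the log is fsync'ed after every run, so snapshots taken before a hang stand a better chance of surviving a forced power-off.

```python
#!/usr/bin/env python3
"""memwatch.py (hypothetical): append a timestamped memory/process snapshot.

Sketch of the cron job suggested in the thread. Assumes Python 3.7+
and that it runs as root, since mdb -k requires it.
"""
import datetime
import os
import subprocess
import sys

# (argv, text fed on stdin) pairs. Feeding "::memstat" to "mdb -k" is the
# usual way to get the kernel's page-usage summary; ps adds per-process sizes.
DEFAULT_COMMANDS = [
    (["mdb", "-k"], "::memstat\n"),
    (["ps", "-eo", "pid,ppid,rss,vsz,etime,args"], None),
]


def snapshot(commands, log_dir, timeout=60):
    """Run each command, append its output to a per-day log, return the path."""
    os.makedirs(log_dir, exist_ok=True)
    now = datetime.datetime.now()
    path = os.path.join(log_dir, now.strftime("memwatch-%Y%m%d.log"))
    with open(path, "a") as log:
        log.write("=== %s ===\n" % now.isoformat(timespec="seconds"))
        for argv, stdin_text in commands:
            log.write("--- %s\n" % " ".join(argv))
            try:
                result = subprocess.run(
                    argv, input=stdin_text, stdout=subprocess.PIPE,
                    stderr=subprocess.STDOUT, universal_newlines=True,
                    timeout=timeout)
                log.write(result.stdout)
            except subprocess.TimeoutExpired:
                # On a nearly hung system a command may never finish;
                # record that instead of blocking the whole snapshot.
                log.write("(timed out after %d s)\n" % timeout)
            except OSError as exc:
                # e.g. the command is not installed
                log.write("(could not run: %s)\n" % exc)
        # Force the data to disk so it can survive a hard power-off.
        log.flush()
        os.fsync(log.fileno())
    return path


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.stderr.write("usage: memwatch.py LOG_DIR\n")
    else:
        snapshot(DEFAULT_COMMANDS, sys.argv[1])
```

A root crontab entry such as the one below would take a snapshot every five minutes; the interpreter and script paths are examples. The minutes are listed explicitly because the traditional Solaris cron may not accept step syntax like */5.

    0,5,10,15,20,25,30,35,40,45,50,55 * * * * /usr/bin/python3 /root/memwatch.py /var/tmp/memwatch

As Milan notes, the job may fail to write anything during the hang itself, but the entries leading up to it should show whether free memory is steadily shrinking.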

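The hex word after `flags=` in the `ifconfig -a` listings quoted above is a bitmask, and the names in angle brackets are its decoding. A small decoder makes it easier to compare the failed and working states. This is a sketch assuming the `IFF_*` bit values from the illumos `<net/if.h>` header, and it lists only the flags that appear in this thread; check the header on your own system before relying on it.

```python
"""Decode the flags=... word printed by illumos/Solaris `ifconfig -a`.

Bit values are assumed from the IFF_* constants in <net/if.h>; only
the flags seen in this thread are listed.
"""

# (bit, name as ifconfig prints it), in ascending bit order
IFF_BITS = [
    (0x1, "UP"),
    (0x2, "BROADCAST"),
    (0x8, "LOOPBACK"),
    (0x40, "RUNNING"),
    (0x800, "MULTICAST"),
    (0x4000, "DHCP"),          # IFF_DHCPRUNNING
    (0x40000, "DEPRECATED"),
    (0x1000000, "IPv4"),
    (0x2000000, "IPv6"),
    (0x2000000000, "VIRTUAL"),
]

# The bits are distinct powers of two, so summing them equals OR-ing them.
KNOWN_MASK = sum(bit for bit, _ in IFF_BITS)


def decode_flags(hex_word):
    """Return (names, leftover_bits) for a word such as "1040843"."""
    value = int(hex_word, 16)
    names = [name for bit, name in IFF_BITS if value & bit]
    return names, value & ~KNOWN_MASK


if __name__ == "__main__":
    for word in ("1040843", "1004843", "2001000849", "20002004841"):
        names, leftover = decode_flags(word)
        print(word, ",".join(names), hex(leftover) if leftover else "")
```

In the failed state, e1000g0's 1040843 decodes to UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4: the DHCP-running bit (0x4000) is clear and DEPRECATED (0x40000) is set, alongside the 0.0.0.0 address, which suggests the interface lost its DHCP configuration rather than just its link. A DHCP-configured IPv4 interface would instead carry 1004843, so the same 1040843 cannot also mean DHCP. Leftover bits, such as 0x20000000000 in the IPv6 e1000g0 line, are simply ones this table does not name.
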
-- 
Dr. Daniel Kjar
Assistant Professor of Biology
Division of Mathematics and Natural Sciences
Elmira College
1 Park Place
Elmira, NY 14901
607-735-1826
http://faculty.elmira.edu/dkjar

"...humans send their young men to war; ants send their old ladies"
	-E. O. Wilson



