[OpenIndiana-discuss] Server hangs weekly

oimltalk at skidde.net oimltalk at skidde.net
Wed Feb 22 05:32:01 UTC 2012


Hi there,

I'm seeing roughly weekly hangs on a server running OpenIndiana 151a. I'm
using it primarily as a home fileserver with ZFS.

The exact behavior seems to depend on when I notice it, but essentially the
server drops off the network and is only variably responsive when I try to
access the console directly. Sometimes when this happens the system doesn't
respond at all (e.g., not even to keyboard input). One time I was able to
interact with the console (after the server had disappeared from the
network) and tried to see what was going on. I tried pinging google.com
(unreachable, as expected). Next I tried `ifconfig -a` and got this:

lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
e1000g0: flags=1040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4> mtu 1500 index 2
        inet 0.0.0.0 netmask ff000000


which explains the lack of connectivity. But after printing that, the
command never returned. The console still echoed my keyboard input
(including ^C, ^Z, etc.), and there was still output coming from other
sources (e.g., I have napp-it running regular snapshots, so I saw a notice
that it had used sudo to run that), but I couldn't get a prompt back. Next I
tried hitting the power button on the machine and got this:

poweroff: initiated by user on /dev/console
in.ndpd[994]: phyint_reach_random: SIOCSLIFLNKINFO (interfac e1000g0): Interrupted system call
bootadm: /boot/solaris/bin/extract_boot_filelist is not owned by 101, skipping
syncing file systems... done
WARNING: Power off requested from power button or SC, powering down the system!


followed shortly by:

WARNING: Failed to shut down the system!


I tried looking through the logs for anything interesting but didn't come up
with anything, though to be honest I'm not 100% sure where to look or what
to look for. When the machine drops off the network I can still access it
via IPMI (tried this using both the dedicated jack on the motherboard and
by sharing the Intel NIC--worked in both cases, but OI was still
unresponsive), so I doubt it's a bad NIC. Motherboard is a Supermicro
X9SCM-F.

I know that at least sometimes the system will stop running even my ZFS
snapshots via napp-it, since I've come back to a frozen console that showed
the last snapshot being taken 12+ hours before (they're supposed to be
taken every 15 minutes). My guess is this is just because it takes me
longer to notice sometimes--seems like it's hitting a deadlock somewhere
that eventually grinds everything to a halt (like with the ifconfig call
above).

Also, FWIW, here's what ifconfig -a gets me when it works correctly (MAC
address removed, although interestingly it wasn't even printed in the
output above):

lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
e1000g0: flags=1040843<UP,BROADCAST,RUNNING,MULTICAST,DHCP,IPv4> mtu 1500 index 2
        inet 192.168.10.10 netmask ffffff00
        ether [MAC address here]
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv6,VIRTUAL> mtu 8252 index 1
        inet6 ::1/128
e1000g0: flags=20002004841<UP,RUNNING,MULTICAST,DHCP,IPv6> mtu 1500 index 2
        inet6 fe80::225:90ff:fe50:2c2a/10
        ether [MAC address here]
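
For what it's worth, the difference between the two outputs (the IPv4
address on e1000g0 going to 0.0.0.0 and the DHCP flag being replaced by
DEPRECATED) is easy to check mechanically. Below is a rough Python sketch of
such a check. The function name find_unconfigured and the regular
expressions are my own assumptions, based only on the two outputs shown in
this message; it has not been tried against other ifconfig output.

```python
import re

# Assumed line shapes, taken only from the two outputs quoted in this post.
HEADER = re.compile(r"^(\S+): flags=[0-9a-f]+<([^>]*)>")
INET = re.compile(r"^\s+inet (\S+)")  # matches "inet", not "inet6"


def find_unconfigured(ifconfig_text):
    """Return non-loopback interfaces whose IPv4 entry looks lost.

    An entry is treated as suspect if its address is 0.0.0.0 or its
    flags include DEPRECATED (hypothetical criteria, see lead-in).
    """
    suspects = []
    current = None
    flags = set()
    for line in ifconfig_text.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.group(1)
            flags = set(header.group(2).split(","))
            continue
        inet = INET.match(line)
        if inet and current and "LOOPBACK" not in flags:
            if inet.group(1) == "0.0.0.0" or "DEPRECATED" in flags:
                suspects.append(current)
    return suspects


# The output captured on the hung console, as quoted earlier.
HUNG = """\
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
e1000g0: flags=1040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4> mtu 1500 index 2
        inet 0.0.0.0 netmask ff000000
"""

print(find_unconfigured(HUNG))  # prints ['e1000g0']
```

Whether something like this could still run once the hang has started is an
open question, given that ifconfig itself never returned.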


Any ideas/suggestions on where to go from here? Thanks in advance.

