[OpenIndiana-discuss] server hangs

Thu Sep 1 15:33:48 UTC 2011

I burned through about 3 disks before I figured it out.  Nothing in the 
logs pointed to the power supply, but the disks eventually failing alerted 
me that something hardware-related was going on.
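
Since the system logs gave me no hint, one more place worth watching is the per-device error counters that `iostat -En` prints. Below is a minimal Python sketch that picks out devices with non-zero hard or transport errors. The function names and the sample text are made up for illustration, and I can't promise a power problem would show up in these counters at all; treat it as one extra thing to check, not a diagnosis.

```python
import re

# Matches the first line of each device entry printed by `iostat -En`, e.g.
# "c7t0d0   Soft Errors: 0 Hard Errors: 0 Transport Errors: 0"
DEVICE_LINE = re.compile(
    r"^(?P<dev>\S+)\s+Soft Errors:\s*(?P<soft>\d+)\s+"
    r"Hard Errors:\s*(?P<hard>\d+)\s+"
    r"Transport Errors:\s*(?P<transport>\d+)"
)


def error_counters(iostat_text):
    """Return {device: {"soft": n, "hard": n, "transport": n}}."""
    counters = {}
    for line in iostat_text.splitlines():
        match = DEVICE_LINE.match(line)
        if match:
            counters[match.group("dev")] = {
                key: int(match.group(key))
                for key in ("soft", "hard", "transport")
            }
    return counters


def suspect_devices(counters):
    """Devices with any hard or transport errors, sorted by name."""
    return sorted(
        dev for dev, c in counters.items() if c["hard"] or c["transport"]
    )


# Hypothetical sample, NOT output from any system in this thread.
SAMPLE = """\
c7t0d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: EXAMPLE-DISK     Revision: 0001 Serial No:
c7t1d0           Soft Errors: 0 Hard Errors: 3 Transport Errors: 12
Vendor: ATA      Product: EXAMPLE-DISK     Revision: 0001 Serial No:
"""

if __name__ == "__main__":
    print(suspect_devices(error_counters(SAMPLE)))
```

To use it on a real box, feed the text of `iostat -En` to `error_counters()` instead of `SAMPLE` and compare the numbers from one day to the next. Counters that keep climbing on several disks at once would point at something shared (power, cabling, controller) more than at a single bad drive.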

On 08/31/11 11:01 PM, Roman Naumenko wrote:
> Well, that might be the reason. 8 drives is certainly too much for a 
> stock PSU. But there should be some traces, no?
> How did you figure out the reason for errors on your system?
>
> --Roman
>
> Daniel Kjar said the following, on 31-08-11 9:43 PM:
>> Careful... are you overtaxing your power supply?  My 148 system was 
>> behaving like that when I put too many drives in an ultra 20.
>>
>> On 8/31/2011 7:48 PM, Roman Naumenko wrote:
>>> Hi,
>>>
>>> I have SunOS 5.11 oi_148 installed on my storage server with 8 disks 
>>> in raidz2 pool.
>>> It hangs about once a week and I have to restart it.
>>> Can you help me troubleshoot it?
>>>
>>> It has some zfs volumes shared over nfs and afpd. (afpd is 
>>> unfortunately a development version to satisfy OSX Lion).
>>>
>>> roks@data:~$ afpd -V
>>> afpd 2.2.0 - Apple Filing Protocol (AFP) daemon of Netatalk
>>>
>>> afpd has been compiled with support for these features:
>>>
>>> AFP3.x support: Yes
>>> TCP/IP Support: Yes
>>> DDP(AppleTalk) Support: No
>>> CNID backends: dbd last tdb
>>> SLP support: No
>>> Zeroconf support: Yes
>>> TCP wrappers support: Yes
>>> Quota support: Yes
>>> Admin group support: Yes
>>> Valid shell checks: Yes
>>> cracklib support: No
>>> Dropbox kludge: No
>>> Force volume uid/gid: No
>>> ACL support: Yes
>>> EA support: ad | sys
>>> LDAP support: Yes
>>>
>>> It also has time-slider enabled, which is a pretty buggy piece of, hmmm, 
>>> software, but it shouldn't cause the server to crash or hang.
>>>
>>> The problems start with NFS and/or afpd timeouts on the clients, but 
>>> I can still ssh to the server, although I can't read any files or logs.
>>> Within a few minutes network service disappears, the console 
>>> freezes, and I have to do a hard restart.
>>>
>>> Where should I look to understand what is causing this?
>>> Since I can't reproduce the problem, I'd like to be prepared for the 
>>> next time it happens.
>>> I couldn't find anything unusual in the logs after restart.
>>>
>>> time-slider also complains, for some reason, about space on rpool:
>>> Aug 31 19:41:36 data time-sliderd: [ID 702911 daemon.notice] No more 
>>> hourly snapshots left
>>> Aug 31 19:41:36 data time-sliderd: [ID 702911 daemon.warning] rpool 
>>> exceeded 80% capacity. Hourly and daily automatic snapshots were 
>>> destroyed
>>>
>>> Where does it see 80%?
>>>
>>> $ df -h
>>>
>>> Filesystem            Size  Used Avail Use% Mounted on
>>> rpool/ROOT/solaris    5.5G  3.0G  2.6G  54% /
>>> swap                  1.4G  396K  1.4G   1% /etc/svc/volatile
>>> /usr/lib/libc/libc_hwcap1.so.1 5.5G  3.0G  2.6G  54% /lib/libc.so.1
>>> swap                  1.4G  8.0K  1.4G   1% /tmp
>>> swap                  1.4G   52K  1.4G   1% /var/run
>>> rpool/export          2.6G   32K  2.6G   1% /export
>>> rpool/export/home     2.6G   33K  2.6G   1% /export/home
>>> rpool/export/home/usr1 2.6G   38K  2.6G   1% /export/home/usr1
>>> rpool/export/home/usr2 3.0G  385M  2.6G  13% /export/home/usr2
>>> rpool                 2.6G   48K  2.6G   1% /rpool
>>>
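
On the 80% question: df reports each dataset separately, so no single row has to show what the pool as a whole is using. As a rough check, the Python sketch below adds up the "Used" figures from the rpool rows above and compares them with the shared "Avail" figure. The helper names are made up for illustration, and it rests on my assumption that time-slider looks at pool-level capacity (the CAP column of `zpool list`), not at df.

```python
# Binary multipliers for the suffixes df -h uses.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def to_bytes(text):
    """Convert a df -h style figure such as '385M' or '3.0G' to bytes."""
    return float(text[:-1]) * UNITS[text[-1]]


# (dataset, used, avail) copied from the rpool rows of the df -h output above.
RPOOL_ROWS = [
    ("rpool/ROOT/solaris", "3.0G", "2.6G"),
    ("rpool/export", "32K", "2.6G"),
    ("rpool/export/home", "33K", "2.6G"),
    ("rpool/export/home/usr1", "38K", "2.6G"),
    ("rpool/export/home/usr2", "385M", "2.6G"),
    ("rpool", "48K", "2.6G"),
]


def visible_usage_percent(rows):
    """Percent of the pool used, counting only what df can see."""
    used = sum(to_bytes(used_text) for _, used_text, _ in rows)
    # Every dataset draws on the same free space, so take Avail only once.
    avail = to_bytes(rows[0][2])
    return 100.0 * used / (used + avail)


if __name__ == "__main__":
    print(f"{visible_usage_percent(RPOOL_ROWS):.1f}% visible to df")
```

That lands between 56% and 57%, so what df can see does not get to 80% either. df leaves out snapshots and zvols (swap and dump usually live on rpool as zvols), so if the pool really is over 80%, the difference is probably there. `zpool list rpool` and `zfs list -o space -r rpool` should show where the space went.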
>>>
>>> --Roman
>>>
>>> _______________________________________________
>>> OpenIndiana-discuss mailing list
>>> OpenIndiana-discuss at openindiana.org
>>> http://openindiana.org/mailman/listinfo/openindiana-discuss
>>
>

-- 
Dr. Daniel Kjar
Assistant Professor of Biology
Division of Mathematics and Natural Sciences
Elmira College
1 Park Place
Elmira, NY 14901
607-735-1826
http://faculty.elmira.edu/dkjar

"...humans send their young men to war; ants send their old ladies"
	-E. O. Wilson