[OpenIndiana-discuss] server hangs

Roman Naumenko roman at naumenko.ca
Sun Sep 11 02:08:48 UTC 2011


A follow-up I hoped I wouldn't have to write: the server hung again.

The error I saw on the console was:

Sep 10 20:15:39/256 ERROR: svc:/system/hal:default: Method "/lib/svc/method/svc-hal start" failed with exit status 95.
Sep 10 20:15:39/256: system/hal:default failed fatally: transitioned to maintenance (see 'svcs -xv' for details)

I couldn't do anything on the console and had to restart the server.

The mounts were lost at 20:08 on the client:
Sep 10 20:08:07 station KernelEventAgent[72]: tid 00000000 received event(s) VQ_NOTRESP (1)

The last fmdump entry is from 5 days ago:
Sep 05 2011 14:37:37.325349500 ereport.fs.zfs.vdev.open_failed
nvlist version: 0
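
To see the order of events, here is a rough sketch that lines up the three timestamps quoted above. It assumes the server and client clocks agree and that all three log lines use the same timezone, neither of which has been verified. The year is taken from the fmdump line, and the variable names are only for illustration.

```python
from datetime import datetime

# Timestamps copied from the log lines quoted in this message.
# Assumption: all three are in the same timezone and the clocks agree.
YEAR = 2011  # taken from the fmdump line; syslog lines carry no year


def parse(stamp: str) -> datetime:
    """Parse a syslog-style 'Mon DD HH:MM:SS' stamp, assuming YEAR."""
    return datetime.strptime(f"{YEAR} {stamp}", "%Y %b %d %H:%M:%S")


last_fma_ereport = parse("Sep 05 14:37:37")    # ereport.fs.zfs.vdev.open_failed
client_lost_mounts = parse("Sep 10 20:08:07")  # VQ_NOTRESP on the client
hal_failed = parse("Sep 10 20:15:39")          # system/hal:default to maintenance

# Gap between the client noticing the hang and the hal service failing.
hang_to_hal = hal_failed - client_lost_mounts
# Gap between the last recorded FMA ereport and the client noticing the hang.
ereport_to_hang = client_lost_mounts - last_fma_ereport

print("client lost mounts -> hal failure:", hang_to_hal)
print("last FMA ereport -> client lost mounts:", ereport_to_hang)
```

On these numbers the client lost its mounts about seven and a half minutes before hal failed, so the hal error looks more like a symptom of the hang than its cause.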

So does this confirm either theory, a failing PSU or a bad SSD?

--Roman N

Lucas Van Tol said the following, on 02-09-11 10:12 AM:
> You might not want to have any swap enabled on that. SSDs tend to perform worse when they are full (I'm not sure if allocating 8G to swap actually uses up space on the physical device or not) and I have seen other Kingston SSDs hang for a bit at times, which would probably not be good for swap.
>
> If possible, you might try and redirect some logs off of rpool; it might not be able to log anything if the rpool is the problem.
>> Date: Fri, 2 Sep 2011 00:07:23 -0400
>> From: roman at naumenko.ca
>> To: openindiana-discuss at openindiana.org
>> Subject: Re: [OpenIndiana-discuss] server hangs
>>
>> It's a Kingston 16GB SSD.
>>
>> --Roman N
>>
>> Lucas Van Tol said the following, on 01-09-11 5:34 PM:
>>> What is your rpool like?  I saw some bizarre behavior with a compact-flash based rpool; as the CF card got overused and got slower and slower, it eventually would hang without throwing any actual errors (just service times approaching infinity).
>>> Services that had enough information stored in memory continued to work, but anytime something read from the rpool it would hang, and services slowly died off.   The system never seemed to fault/offline the rpool either...
>>>
>>>> Date: Thu, 1 Sep 2011 14:42:54 -0400
>>>> From: roman at naumenko.ca
>>>> To: openindiana-discuss at openindiana.org
>>>> Subject: Re: [OpenIndiana-discuss] server hangs
>>>>
>>>> I need to dig into the MB manual, but it's basically all commodity hardware (although the MB is some server-type Asus).
>>>>
>>>> --Roman N
>>>>
>>>> ----- Original Message -----
>>>>
>>>>> What about hw event logs? If you have power fluctuations it might show
>>>>> up there.
>>>>> You can probably pull those out from your service processor or boot
>>>>> to bios and read them there.
>>>>> Sent from Jasons' hand held
>>>>> On Sep 1, 2011, at 8:37 AM, Roman Naumenko<roman at naumenko.ca>   wrote:
>>>>>> Costly troubleshooting you had.
>>>>>> All right then, I will wait for the next failure to look through it
>>>>>> once again, and maybe swap the PSU if nothing is found again.
>>>>>>
>>>>>> --Roman N
>>>>>>
>>>>>> ----- Original Message -----
>>>>>>
>>>>>>> I burned through about 3 disks before I figured it out. Nothing in
>>>>>>> the logs made me think this, but the eventual failure of the disks
>>>>>>> alerted me that something hardwarish was happening.
>>>>>>> On 08/31/11 11:01 PM, Roman Naumenko wrote:
>>>>>>>> Well, that might be the reason. 8 drives is certainly too much for
>>>>>>>> a stock PSU. But there should be some traces, no?
>>>>>>>> How did you figure out the reason for errors on your system?
>>>>>>>>
>>>>>>>> --Roman
>>>>>>>>
>>>>>>>> Daniel Kjar said the following, on 31-08-11 9:43 PM:
>>>>>>>>> Careful... are you overtaxing your power supply? My 148 system was
>>>>>>>>> behaving like that when I put too many drives in an ultra 20.
>>>>>>>>>
>>>>>>>>> On 8/31/2011 7:48 PM, Roman Naumenko wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I have SunOS 5.11 oi_148 installed on my storage server with 8 disks
>>>>>>>>>> in a raidz2 pool.
>>>>>>>>>> It hangs about once a week and I have to restart it.
>>>>>>>>>> Can you help me troubleshoot it?
>>>>>>>>>>
>>>>>>>>>> It has some zfs volumes shared over nfs and afpd. (afpd is
>>>>>>>>>> unfortunately a development version to satisfy OSX Lion).
>>>>>>>>>>
>>>>>>>>>> roks at data:~$ afpd -V
>>>>>>>>>> afpd 2.2.0 - Apple Filing Protocol (AFP) daemon of Netatalk
>>>>>>>>>>
>>>>>>>>>> afpd has been compiled with support for these features:
>>>>>>>>>>
>>>>>>>>>> AFP3.x support: Yes
>>>>>>>>>> TCP/IP Support: Yes
>>>>>>>>>> DDP(AppleTalk) Support: No
>>>>>>>>>> CNID backends: dbd last tdb
>>>>>>>>>> SLP support: No
>>>>>>>>>> Zeroconf support: Yes
>>>>>>>>>> TCP wrappers support: Yes
>>>>>>>>>> Quota support: Yes
>>>>>>>>>> Admin group support: Yes
>>>>>>>>>> Valid shell checks: Yes
>>>>>>>>>> cracklib support: No
>>>>>>>>>> Dropbox kludge: No
>>>>>>>>>> Force volume uid/gid: No
>>>>>>>>>> ACL support: Yes
>>>>>>>>>> EA support: ad | sys
>>>>>>>>>> LDAP support: Yes
>>>>>>>>>>
>>>>>>>>>> It also has time-slider enabled, which is a pretty buggy piece of
>>>>>>>>>> hmmm software, but it shouldn't cause the server to crash or hang.
>>>>>>>>>>
>>>>>>>>>> So the problems start with nfs and/or afpd timeouts on clients, but
>>>>>>>>>> I can still ssh to the server. I can't read any files or logs though.
>>>>>>>>>> Then network service disappears within a few minutes, the console
>>>>>>>>>> freezes, and I have to do a hard restart at that point.
>>>>>>>>>>
>>>>>>>>>> Where should I look to understand what is causing this?
>>>>>>>>>> Since I can't reproduce the problem, I'd like to be prepared when
>>>>>>>>>> it happens next time.
>>>>>>>>>> I couldn't find anything unusual in the logs after restart.
>>>>>>>>>>
>>>>>>>>>> time-slider complains for some reason about space on rpool:
>>>>>>>>>> Aug 31 19:41:36 data time-sliderd: [ID 702911 daemon.notice] No more hourly snapshots left
>>>>>>>>>> Aug 31 19:41:36 data time-sliderd: [ID 702911 daemon.warning] rpool exceeded 80% capacity. Hourly and daily automatic snapshots were destroyed
>>>>>>>>>>
>>>>>>>>>> Where does it see 80%?
>>>>>>>>>>
>>>>>>>>>> $ df -h
>>>>>>>>>>
>>>>>>>>>> Filesystem                      Size  Used  Avail  Use%  Mounted on
>>>>>>>>>> rpool/ROOT/solaris              5.5G  3.0G  2.6G   54%   /
>>>>>>>>>> swap                            1.4G  396K  1.4G   1%    /etc/svc/volatile
>>>>>>>>>> /usr/lib/libc/libc_hwcap1.so.1  5.5G  3.0G  2.6G   54%   /lib/libc.so.1
>>>>>>>>>> swap                            1.4G  8.0K  1.4G   1%    /tmp
>>>>>>>>>> swap                            1.4G  52K   1.4G   1%    /var/run
>>>>>>>>>> rpool/export                    2.6G  32K   2.6G   1%    /export
>>>>>>>>>> rpool/export/home               2.6G  33K   2.6G   1%    /export/home
>>>>>>>>>> rpool/export/home/usr1          2.6G  38K   2.6G   1%    /export/home/usr1
>>>>>>>>>> rpool/export/home/usr2          3.0G  385M  2.6G   13%   /export/home/usr2
>>>>>>>>>> rpool                           2.6G  48K   2.6G   1%    /rpool
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --Roman
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> OpenIndiana-discuss mailing list
>>>>>>>>>> OpenIndiana-discuss at openindiana.org
>>>>>>>>>> http://openindiana.org/mailman/listinfo/openindiana-discuss
>>>>>>> --
>>>>>>> Dr. Daniel Kjar
>>>>>>> Assistant Professor of Biology
>>>>>>> Division of Mathematics and Natural Sciences
>>>>>>> Elmira College
>>>>>>> 1 Park Place
>>>>>>> Elmira, NY 14901
>>>>>>> 607-735-1826
>>>>>>> http://faculty.elmira.edu/dkjar
>>>>>>> "...humans send their young men to war; ants send their old
>>>>>>> ladies"
>>>>>>> -E. O. Wilson