[OpenIndiana-discuss] intermittent CIFS loss, spontaneous-reboot with OI148/151a and IBM Megaraid M5015?

Thu Feb 2 01:53:47 UTC 2012

Hi Ken,

The OI148 servers do occasionally "lose" SMB/CIFS access.  But no 
spontaneous reboot (yet!).

Hi James,

/etc/syslog.conf is set up with the following:
*.err;kern.notice;auth.notice                   @loghost
*.err;kern.debug;daemon.notice;mail.crit        @loghost

and on the loghost, this is the output (slightly sanitised) from around 
when the server rebooted (just yesterday!):
Feb  1 12:06:01 san7.local svc.startd[10]: [ID 122153 daemon.warning] 
svc:/network/smb/server:default: Method or service exit timed out.  
Killing contract 1511.
Feb  1 12:06:29 san7.local svc.startd[10]: [ID 122153 daemon.warning] 
svc:/network/smb/server:default: Method or service exit timed out.  
Killing contract 1511.
Feb  1 12:08:10 san7.local svc.startd[10]: last message repeated 61 times
Feb  1 12:09:11 san7.local svc.startd[10]: last message repeated 61 times
Feb  1 12:09:28 san7.local svc.startd[10]: last message repeated 16 times
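
To put a number on the excerpt above, here is a rough awk sketch that 
tallies the timeout warnings, including the "last message repeated" 
lines.  The function name count_timeouts is only illustrative, the 
wrapped log lines are joined back into single lines, and it assumes 
every "repeated" line refers to the timeout warning just before it:

```shell
# Log lines copied from the loghost excerpt, with wrapped lines rejoined.
log='Feb  1 12:06:01 san7.local svc.startd[10]: [ID 122153 daemon.warning] svc:/network/smb/server:default: Method or service exit timed out.  Killing contract 1511.
Feb  1 12:06:29 san7.local svc.startd[10]: [ID 122153 daemon.warning] svc:/network/smb/server:default: Method or service exit timed out.  Killing contract 1511.
Feb  1 12:08:10 san7.local svc.startd[10]: last message repeated 61 times
Feb  1 12:09:11 san7.local svc.startd[10]: last message repeated 61 times
Feb  1 12:09:28 san7.local svc.startd[10]: last message repeated 16 times'

# count_timeouts (hypothetical helper): reads syslog lines on stdin and
# prints how many timeout warnings they represent in total.
count_timeouts() {
    awk '
        /Method or service exit timed out/ { n++ }
        /last message repeated [0-9]+ times/ {
            # The repeat count is the field right after "repeated".
            for (i = 1; i <= NF; i++)
                if ($i == "repeated") n += $(i + 1)
        }
        END { print n + 0 }
    '
}

total=$(printf '%s\n' "$log" | count_timeouts)
echo "$total"
```

For this excerpt that comes to 140 warnings in roughly three and a half 
minutes (2 logged in full plus 61 + 61 + 16 repeats).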

coreadm is set by default for only per-process core dumps, and similarly 
dumpadm shows defaults.  I've enabled savecore, so I will see what I get 
for a core dump in future.
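
Once a dump does turn up, something along these lines should pull out 
the ::status and ::stack output James mentioned.  This is only a sketch: 
crash_summary and its arguments are names made up for illustration, the 
savecore directory should be whatever "dumpadm" reports, and on some 
builds savecore may leave a compressed vmdump.N that has to be expanded 
(savecore -f) before unix.N and vmcore.N exist:

```shell
# crash_summary DIR N (hypothetical helper): print the panic summary and
# stack trace for crash dump number N saved in savecore directory DIR.
crash_summary() {
    dir=$1
    n=$2
    # Bail out early if the expected dump files are not there.
    if [ ! -f "$dir/unix.$n" ] || [ ! -f "$dir/vmcore.$n" ]; then
        echo "no dump $n found in $dir" >&2
        return 1
    fi
    # mdb reads dcmds from stdin when it is not attached to a terminal.
    printf '::status\n::stack\n' | mdb "$dir/unix.$n" "$dir/vmcore.$n"
}
```

For example, "crash_summary /var/crash/san7 0", assuming the dumps land 
in /var/crash/san7 on this box.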

Thanks all.

regards, Yu-Phing

On 02/02/2012 00:50, ken mays wrote:
> Ong,
>
> Any issues with the oi_148 servers?
>
> ~ Ken Mays
>
> ------------------------------------------------------------------------
> *From:* James Carlson <carlsonj at workingcode.com>
> *To:* Discussion list for OpenIndiana 
> <openindiana-discuss at openindiana.org>
> *Cc:* Ong Yu-Phing <ong.yu.phing at group.ong-ong.com>
> *Sent:* Wednesday, February 1, 2012 9:10 AM
> *Subject:* Re: [OpenIndiana-discuss] intermittent CIFS loss, 
> spontaneous-reboot with OI148/151a and IBM Megaraid M5015?
>
> On 02/01/12 03:29, Ong Yu-Phing wrote:
> > We've a number of IBM 3630M3 servers, equipped with BBU M5014/5015s, 
> running as CIFS server, with a mixture of OI148 and OI151a.  Nothing 
> fancy (no dedup, no compression), just a pool of mirrored disks aka 
> RAID10, with CIFS access authenticated via MS AD.
> >
> > Intermittently, CIFS/SMB will go down, sometimes this can be 
> restored via restarting the smb service ("svcadm enable -r smb/server"), 
> other times it necessitates a server reset ("svcs | grep smb" shows 
> that smb/server has an * next to it).
>
> If you do "svcs -xv", it should show references to log files for the
> services that are in trouble.  For smb/server, I'd expect that to be
> /var/svc/log/network-smb-server:default.log.  Examining that file would
> be a good first step here.
>
> Also, it's common for services to log via syslog.  /var/adm/messages
> might be a good place to start there.
>
> > And one of the servers (always the same, so far...) will 
> intermittently reboot (more frequently than the SMB service going 
> down).  Sometimes in the middle of the day, sometimes in the evening 
> (once it was around 6pm).  This particular server will reboot and come 
> back up without much delay, and the pool and zfs shares come back 
> online fine.
>
> Spontaneous reboot has to be either a kernel panic or a hardware
> problem.  "dumpadm" should tell you where the kernel dumps are going --
> the "savecore" directory; usually /var/crash.  Look for files there.
>
> Running mdb on the files and using ::status and ::stack commands might
> give a good enough signature that someone could identify the cause.
>
> (I'm not a CIFS expert, but if you gather some basic log information
> about the problem, I imagine one may be able to help.)
>
> -- 
> James Carlson        42.703N 71.076W <carlsonj at workingcode.com>