[OpenIndiana-discuss] intermittent CIFS loss, spontaneous-reboot with OI148/151a and IBM Megaraid M5015?

Wed Feb 1 14:10:35 UTC 2012

On 02/01/12 03:29, Ong Yu-Phing wrote:
> We've a number of IBM 3630M3 servers, equipped with BBU M5014/5015s, running as CIFS servers, with a mixture of OI148 and OI151a.  Nothing fancy (no dedup, no compression), just a pool of mirrored disks aka RAID10, with CIFS access authenticated via MS AD.
> 
> Intermittently, CIFS/SMB will go down.  Sometimes this can be restored by restarting the smb service ("svcadm enable -r smb/server"); other times it necessitates a server reset ("svcs | grep smb" shows that smb/server has an * next to its state).

If you do "svcs -xv", it should show references to log files for the
services that are in trouble.  For smb/server, I'd expect that to be
/var/svc/log/network-smb-server:default.log.  Examining that file would
be a good first step here.
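
As a rough sketch of that first step (the smf_log_tail helper name and
the 50-line default are my own assumptions, not part of SMF, and the
log path is only the one I'd expect; use whatever "svcs -xv" reports):

```shell
# Hypothetical helper: print the last COUNT lines of a service log.
# Defaults to the log path expected for smb/server; pass another path
# if "svcs -xv" points somewhere else.
smf_log_tail() {
    log="${1:-/var/svc/log/network-smb-server:default.log}"
    count="${2:-50}"
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 1
    fi
    # The -NUMBER form is used because some older /usr/bin/tail
    # versions may not accept "-n NUMBER".
    tail -"$count" "$log"
}

# Typical use on the affected server:
#   svcs -xv smb/server    # explains why the service is in trouble
#   smf_log_tail           # last 50 lines of the expected log
```

The entries written just before the service went down are the
interesting ones, so note the time of the outage when you look.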

Also, it's common for services to log via syslog.  /var/adm/messages
might be a good place to start there.
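
Something along these lines would pull out the likely candidates.  The
smb_syslog_lines name is made up for illustration, and matching on
"smb" is only my guess at what the relevant entries contain:

```shell
# Hypothetical helper: print SMB-related lines from a syslog file.
# The case-insensitive "smb" pattern is an assumption; widen it if
# nothing useful turns up.
smb_syslog_lines() {
    file="${1:-/var/adm/messages}"
    grep -i 'smb' "$file"
}

# Typical use:
#   smb_syslog_lines                       # current messages file
#   smb_syslog_lines /var/adm/messages.0   # a rotated copy, if present
```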

> And one of the servers (always the same, so far...) will intermittently reboot (more frequently than the SMB service going down).  Sometimes in the middle of the day, sometimes in the evening (once it was around 6pm).  This particular server will reboot and come back up without much delay, and the pool and zfs shares come back online fine.

A spontaneous reboot has to be either a kernel panic or a hardware
problem.  "dumpadm" should tell you where the kernel dumps are going:
the "savecore" directory, usually under /var/crash.  Look for unix.N
and vmcore.N files (or a compressed vmdump.N) there.

Running mdb on the files and using ::status and ::stack commands might
give a good enough signature that someone could identify the cause.
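
A minimal sketch of that, assuming the dump was saved uncompressed as
unix.N and vmcore.N in the savecore directory (if only a vmdump.N file
is there, "savecore -f vmdump.N" should expand it first).  Both helper
names are hypothetical:

```shell
# Hypothetical helper: print the highest N for which DIR/vmcore.N
# exists; returns non-zero if there is no such file.
latest_dump_index() {
    dir="$1"
    best=""
    for f in "$dir"/vmcore.*; do
        [ -e "$f" ] || continue
        n="${f##*.}"
        case "$n" in
            ''|*[!0-9]*) continue ;;   # skip non-numeric suffixes
        esac
        if [ -z "$best" ] || [ "$n" -gt "$best" ]; then
            best="$n"
        fi
    done
    if [ -n "$best" ]; then
        echo "$best"
    else
        return 1
    fi
}

# Hypothetical helper: feed ::status and ::stack to mdb for dump N.
dump_summary() {
    dir="$1"
    n="$2"
    printf '::status\n::stack\n' | mdb "$dir/unix.$n" "$dir/vmcore.$n"
}

# Typical use, with DIR set to the directory that dumpadm reports:
#   n=$(latest_dump_index "$DIR") && dump_summary "$DIR" "$n"
```

Posting the resulting panic string and stack trace to the list is
usually more useful than a description of the symptoms.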

(I'm not a CIFS expert, but if you gather some basic log information
about the problem, I imagine one may be able to help.)

-- 
James Carlson         42.703N 71.076W         <carlsonj at workingcode.com>