[OpenIndiana-discuss] NFS exported dataset crashes the system

Marcel Telka marcel at telka.sk
Wed Apr 10 14:46:56 UTC 2013


On Wed, Apr 10, 2013 at 04:35:06PM +0200, Paul van der Zwan wrote:
> 
> On 9 Apr 2013, at 3:13 , Peter Wood <peterwood.sd at gmail.com> wrote:
> 
> > I've asked the ZFS discussion list for help on this but now I have more
> > information and it looks like a bug in the drivers or something.
> > 
> > I have a number of Dell PE R710 and PE 2950 servers running OpenSolaris, OI
> > 151a and OI 151a.7. All these systems are used as storage servers, clean OS
> > install, no extra services running. The systems are NFS exporting a lot of
> > ZFS datasets that are mounted on about ten CentOS-5.9 systems.
> > 
> > The above setup has been working for 2+ years with no problem.
> > 
> > Recently we bought two Supermicro systems:
> >  Supermicro X9DRH-iF
> >  Xeon E5-2620 @ 2.0 GHz 6-Core
> >  128GB RAM
> >  LSI SAS9211-8i HBA
> >  32x 3TB Hitachi HUS723030ALS640, SAS, 7.2K
> > 
> > I installed OI 151a.7 on them and started migrating data from the old Dell
> > servers (zfs send/receive).
> > 
> > Things have been working great for about two months until I migrated one
> > particular directory to one of the new Supermicro systems and after about
> > two days the system crashed. No network connectivity, black console, no
> > response to keyboard keys, no activity lights (no error lights either) on
> > the chassis. The only way out is to hit the reset button. Nothing in the
> > logs as far as I can tell. Log entries just stop when the system crashes.
> > 
> > In the following two months I did a lot of testing and a lot of trips to
> > the colo in the middle of the night and the observation is that regardless
> > of the OS everything works on the Dell servers. As soon as I move that
> > directory to any of the Supermicro servers with OI 151a.7 it will crash
> > them within anywhere from 2 hours to 5 days.
> > 
> > The Supermicro servers can be idle, exporting nothing, or can be exporting
> > 15+ other directories with high IOPS and working for months with no
> > problems, but as soon as I have them export that directory they'll crash
> > within 5 days at the most.
> > 
> > There is only one difference between that directory and all the other exported
> > directories. One of the client systems that mounts it and writes to it is
> > an old Debian 5.0 system. No idea why that would crash a Supermicro system
> > but not a Dell system.
> > 
> > We worked directly with LSI developers and upgraded the firmware to some
> > unpublished, prerelease development version to no avail. We disabled all
> > power saving features and CPU C states in the BIOS and nothing changed.
> > 
> > Any idea?
> 
> I had a similar kind of problem where a VirtualBox FreeBSD 9.1 VM could hang the server.
> It had /usr/src and /usr/obj NFS mounted from the OI a7 box it was running on.
> They are separate NFS shared datasets in one of my 3 pools.
> 
> When I ran a make buildworld in that VM it consistently locked up the OI host: no console access,
> no network access (not even ping).
> As a test I switched to NFSv4 instead of NFSv3 and I have not seen a hang since.
> So it looked like a heavy NFSv3 load was the issue.

Please try to get a crash dump file while the system is in the hung state.
I'm interested in analyzing the crash dump file.


Thanks.

-- 
+-------------------------------------------+
| Marcel Telka   e-mail:   marcel at telka.sk  |
|                homepage: http://telka.sk/ |
|                jabber:   marcel at jabber.sk |
+-------------------------------------------+


