[OpenIndiana-discuss] NFS exported dataset crashes the system

Paul van der Zwan paulz at vanderzwan.org
Wed Apr 10 14:35:06 UTC 2013


On 9 Apr 2013, at 3:13 , Peter Wood <peterwood.sd at gmail.com> wrote:

> I've asked the ZFS discussion list for help on this but now I have more
> information and it looks like a bug in the drivers or something.
> 
> I have a number of Dell PE R710 and PE 2950 servers running OpenSolaris, OI
> 151a and OI 151a.7. All these systems are used as storage servers, clean OS
> install, no extra services running. The systems are NFS exporting a lot of
> ZFS datasets that are mounted on about ten CentOS-5.9 systems.
> 
> The above setup has been working for 2+ years with no problem.
> 
> Recently we bought two Supermicro systems:
>  Supermicro X9DRH-iF
>  Xeon E5-2620 @ 2.0 GHz 6-Core
>  128GB RAM
>  LSI SAS9211-8i HBA
>  32x 3TB Hitachi HUS723030ALS640, SAS, 7.2K
> 
> I installed OI 151a.7 on them and started migrating data from the old Dell
> servers (zfs send/receive).
> 
> Things have been working great for about two months until I migrated one
> particular directory to one of the new Supermicro systems and after about
> two days the system crashed. No network connectivity, black console, no
> response to keyboard keys, no activity lights (no error lights either) on
> the chassis. The only way out is to hit the reset button. Nothing in the
> logs as far as I can tell. Log entries just stop when the system crashes.
> 
> In the following two months I did a lot of testing and a lot of trips to
> the colo in the middle of the night and the observation is that regardless
> of the OS everything works on the Dell servers. As soon as I move that
> directory to any of the Supermicro servers with OI 151a.7 it crashes
> them within anywhere from 2 hours to 5 days.
> 
> The Supermicro servers can be idle, exporting nothing, or can be exporting
> 15+ other directories with high IOPS and working for months with no
> problems, but as soon as I have them export that directory they'll crash in
> 5 days at the most.
> 
> There is only one difference between that directory and all the other exported
> directories. One of the client systems that mounts it and writes to it is
> an old Debian 5.0 system. No idea why that would crash a Supermicro system
> but not a Dell system.
> 
> We worked directly with LSI developers and upgraded the firmware to some
> unpublished, prerelease development version to no avail. We disabled all
> power saving features and CPU C states in the BIOS and nothing changed.
> 
> Any idea?

I had a similar kind of problem where a VirtualBox FreeBSD 9.1 VM could hang the server.
It had /usr/src and /usr/obj NFS-mounted from the OI a7 box it was running on.
They are separate NFS-shared datasets in one of my 3 pools.

When I ran a make buildworld in that VM, it consistently locked up the OI host: no console access,
no network access (not even ping).
As a test I switched to NFSv4 instead of NFSv3 and I have not seen a hang since.
So it looked like a heavy NFSv3 load was the issue.
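
To check which NFS version a Linux client actually negotiated before and after
such a switch, something like the following could help. It is only a minimal
sketch: the function name nfs_versions is made up for illustration, and it
assumes the client exposes mounts in the usual /proc/mounts layout (device,
mount point, fstype, options) with a vers= or nfsvers= option, which may not
hold on every kernel. It does not apply to the FreeBSD VM described above.

```python
def nfs_versions(mounts_text):
    """Map each NFS mount point to the protocol version found in its options.

    Assumes /proc/mounts-style lines: device, mount point, fstype, options, ...
    """
    versions = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip malformed lines and anything that is not an NFS mount.
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        mount_point, fstype, options = fields[1], fields[2], fields[3]
        # Fall back to the fstype when no explicit version option is present.
        version = "4" if fstype == "nfs4" else "unknown"
        for opt in options.split(","):
            if opt.startswith(("vers=", "nfsvers=")):
                version = opt.split("=", 1)[1]
        versions[mount_point] = version
    return versions
```

On a client, pass it the contents of /proc/mounts, for example
nfs_versions(open("/proc/mounts").read()), and look for exports still on "3".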

	Paul



