[OpenIndiana-discuss] CIFS performance issues

Robin Axelsson gu99roax at student.chalmers.se
Tue Jan 24 15:39:42 UTC 2012


The system I'm using is not that "beefy". It's a 4-core Phenom II using 
a server grade hard drive as system drive and 8 consumer grade drives 
for the storage pool that are behind an LSI SAS 1068e controller. I have 
4GB RAM in it.

I have experienced freeze-ups due to failing hard drives in the storage 
pool in the past. When they happened, they affected the CIFS connection 
(of course) but not the SSH connection. Moreover, I could see errors 
with "iostat -En". I don't know if you have iostat in Linux but I'm 
afraid you don't.

I experienced a series of shorter freeze-ups today (3-5 seconds long) 
while monitoring the system using "System Monitor" through the 
'vncserver' and 'top' over SSH. Those freeze-ups affected the CIFS 
connection, the SSH connection, and the VNC connection (but did not 
sever them). They weren't long enough for me to check the RDP 
connection to the VM.

When those freeze-ups occurred, the system monitor showed them as dips 
in the real-time network history chart, so they don't seem to stall the 
network monitor itself. CPU utilization was around 10-15% and memory 
usage stayed around 13.5% (540MB) the whole time, so I don't think 
capping the ARC would do much good.
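For completeness, the ARC cap discussed in the Evil Tuning Guide linked 
further down the thread is set in /etc/system. A sketch only; the 2 GB 
value is purely illustrative, and a reboot is required:

```
* /etc/system fragment: cap the ZFS ARC (0x80000000 = 2 GB,
* an illustrative value only); takes effect after a reboot
set zfs:zfs_arc_max = 0x80000000
```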

I looked into /var/adm/messages and found errors like

nwamd[99]: [ID 234669 daemon.error] 3: nwamd_door_switch: need 
solaris.network.autoconf.read for request type 1

during that time. I'll look more carefully next time and see whether 
the time-stamps of these entries match the times at which I experience 
the freeze-ups; I suspect they do. No errors are found with iostat -E. 
I'll also look at iowait to see if it gives any clues, though I'm not 
sure how to keep a "history" of iowait the way System Monitor keeps a 
history of CPU utilization, memory usage and network activity.
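One way to keep such a history is a small shell loop that appends 
timestamped samples to a log file, which can then be checked around the 
freeze-up times. A sketch, assuming arbitrary choices of log path, 
interval and sampled command (not a tested recipe):

```shell
#!/bin/sh
# Keep a timestamped history of a sampling command, the way System
# Monitor keeps a history of CPU/network activity.
LOG=${LOG:-/var/tmp/iowait.log}   # arbitrary log location

log_sample() {
    # Record one timestamped sample of whatever command is given.
    printf '%s\n' "--- $(date '+%Y-%m-%d %H:%M:%S') ---" >> "$LOG"
    "$@" >> "$LOG" 2>&1
}

# Main loop (commented out): on OI one might sample "iostat -xn 1 2"
# every few seconds and read the wait/%b columns afterwards.
# while :; do log_sample iostat -xn 1 2; sleep 5; done
```

Afterwards, grepping the log for the timestamps at which a freeze-up 
was noticed shows what the disks were doing at that moment.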

It has also been suggested that I try the prestable version of OI and 
see whether these freeze-ups occur when using a static IP (i.e. not nwam).
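For reference, the manual (non-nwam) setup suggested earlier in the 
thread looks roughly like this on OI. A sketch only: the interface name 
and address are taken from the ifconfig output quoted below, and the 
default router address is a guess that must be replaced with the real 
gateway:

```
# Switch from nwam to the classic manual network configuration
# (run as root; sketch only, not a tested recipe):
svcadm disable svc:/network/physical:nwam
svcadm enable  svc:/network/physical:default

# Persistent static address for e1000g1 (values from my setup):
echo "10.40.137.185 netmask 255.255.255.0" > /etc/hostname.e1000g1
# Default router is a guess; substitute the real gateway:
echo "10.40.137.1" > /etc/defaultrouter
```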

Robin.


On 2012-01-24 06:39, Robbie Crash wrote:
> I had problems that sound nearly identical to what you're describing when
> running ZFS Native under Ubuntu, but without the VM aspect: SSH would
> disconnect and fileshares would become unavailable. It seemed to happen
> when the server would begin to flush memory after large reads or writes
> to the ZFS pool. How much RAM does your machine have? Have you considered
> "evil tuning" your ARC cache for testing?
> http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Limiting_the_ARC_Cache
>
>
> What is the rest of the system reporting? CPU? Memory in use? IO Wait? Are
> you using consumer grade hard drives? These could be doing their lovely 2
> minute read recovery thing and causing headaches with the pool access. Does
> the host have any CIFS shares that you can attempt to access while the
> guest is frozen?
>
> I found that forcing ZFS to stay 2.5GB under max, rather than the
> default(?) 1GB, vastly improved stability.
>
> I haven't had the same issues after moving to OI, but I've also quadrupled
> the amount of RAM in my box. Sorry if any of this is horribly off the mark,
> most of my ZFS/CIFS/SMB problems happened while running ZFS on Ubuntu, and
> I'm pretty new to OI.
>
> On Mon, Jan 23, 2012 at 16:17, Open Indiana<openindiana at out-side.nl>  wrote:
>
>> What happens if you disable nwam and use the basic/manual ifconfig setup?
>>
>>
>> -----Original Message-----
>> From: Robin Axelsson [mailto:gu99roax at student.chalmers.se]
>> Sent: maandag 23 januari 2012 15:10
>> To: openindiana-discuss at openindiana.org
>> Subject: Re: [OpenIndiana-discuss] CIFS performance issues
>>
>> No, I'm not doing anything in particular in the virtual machine. The media
>> file is played on another computer in the (physical) network over CIFS.
>> Over
>> the network I also access the server using Remote Desktop/Terminal Services
>> to communicate to the virtual machine (using the VirtualBox RDP interface,
>> i.e. not the guest OS RDP), VNC (to access OI using vncserver) and SSH (to
>> OI).
>>
>> I wouldn't say that the entire server stops responding, only the
>> connection to CIFS and SSH. I wasn't running VNC when it happened
>> yesterday so I don't know about that, but the RDP connection and the
>> virtual machine inside the server were unaffected while CIFS and SSH
>> were frozen.
>>
>> I tried today to start the virtual machine but it failed because it could
>> not find the connection (e1000g2):
>>
>> "Error: failed to start machine. Error message: Failed to open/create the
>> internal network 'HostInterfaceNetworking-e1000
>> g2 - Intel PRO/1000 Gigabit Ethernet' (VERR_SUPDRV_COMPONENT_NOT_FOUND).
>> Failed to attach the network LUN (VERR_SUPDRV_COMPONENT_NOT_FOUND).
>> Unknown error creating VM (VERR_SUPDRV_COMPONENT_NOT_FOUND)"
>>
>> ifconfig -a returns:
>> ...
>> e1000g1: flags=1004843<UP,BROADCAST,RUNNING,MULTICAST,DHCP,IPv4> mtu 1500 index 2
>>          inet 10.40.137.185 netmask ffffff00 broadcast 10.40.137.255
>> e1000g2: flags=1004843<UP,BROADCAST,RUNNING,MULTICAST,DHCP,IPv4> mtu 1500 index 3
>>          inet 10.40.137.196 netmask ffffff00 broadcast 10.40.137.255
>> rge0: flags=1004843<UP,BROADCAST,RUNNING,MULTICAST,DHCP,IPv4> mtu 1500 index 4
>>          inet 0.0.0.0 netmask ff000000
>> ...
>>
>> i.e. e1000g1 and e1000g2 appear to be running just fine, wtf!? I found
>> the following entries in /var/adm/messages:
>>
>> Jan 23 13:50:49 <computername> nwamd[95]: [ID 234669 daemon.error] 3:
>> nwamd_door_switch: need solaris.network.autoconf.read for request type 1
>> Jan 23 13:56:59 <computername> last message repeated 75 times
>> Jan 23 13:57:04 <computername> nwamd[95]: [ID 234669 daemon.error] 3:
>> nwamd_door_switch: need solaris.network.autoconf.read for request type 1
>> Jan 23 13:58:19 <computername> last message repeated 15 times
>> Jan 23 13:58:22 <computername> gnome-session[916]: [ID 702911 daemon.warning]
>> WARNING: Unable to determine session: Unable to lookup session information
>> for process '916'
>> Jan 23 13:58:24 <computername> nwamd[95]: [ID 234669 daemon.error] 3:
>> nwamd_door_switch: need solaris.network.autoconf.read for request type 1
>> Jan 23 14:03:24 <computername> last message repeated 60 times
>> Jan 23 14:03:26 <computername> gnome-session[916]: [ID 702911 daemon.warning]
>> WARNING: Unable to determine session: Unable to lookup session information
>> for process '916'
>> Jan 23 14:03:29 <computername> nwamd[95]: [ID 234669 daemon.error] 3:
>> nwamd_door_switch: need solaris.network.autoconf.read for request type 1
>> Jan 23 14:03:34 <computername> last message repeated 1 time
>> Jan 23 14:03:39 <computername> nwamd[95]: [ID 234669 daemon.error] 3:
>> nwamd_door_switch: need solaris.network.autoconf.read for request type 1
>>
>> Some errors here... I looked into the log of the nwam service
>> (/var/svc/log/network-physical\:nwam.log):
>>
[ Jan 23 13:03:15 Enabled. ]
[ Jan 23 13:03:16 Executing start method ("/lib/svc/method/net-nwam start"). ]
/lib/svc/method/net-nwam[548]: /sbin/ibd_upgrade: not found [No such file or directory]
[ Jan 23 13:03:17 Method "start" exited with status 0. ]
[ Jan 23 13:03:17 Rereading configuration. ]
[ Jan 23 13:03:17 Executing refresh method ("/lib/svc/method/net-nwam refresh"). ]
[ Jan 23 13:03:17 Method "refresh" exited with status 0. ]
>>
>> nothing remarkable here... I investigated the issue on the VBox forums
>> and there it was resolved with the rem_drv/add_drv vboxflt commands.
>> It's not the first time I've had this issue, and one of the people on
>> the forums claims that it occurs after every third power cycle/reboot.
>> It was hinted that VBox doesn't like dynamic IP addresses, so I have
>> also given e1000g2 a fixed address in the router (I configured the
>> router's DHCP server to always hand out the same IP to the MAC address
>> of the e1000g2 connection). I had already done this for e1000g1;
>> otherwise it would be impossible to ssh to the server from the
>> "outside world".
>>
>> Robin.
>>
>>
>> On 2012-01-23 11:40, Open Indiana wrote:
>>> Ok,
>>>
>>> So if I read it correctly, your virtual machine is playing an audio
>>> file and then the server stops responding. That could mean the
>>> hardware that virtualbox uses to play the sound file is flooded, or
>>> that the drivers of the sound card in your server/PC are not working
>>> very well. What sound card are you using?
>>>
>>>
>>> -----Original Message-----
>>> From: Robin Axelsson [mailto:gu99roax at student.chalmers.se]
>>> Sent: zondag 22 januari 2012 23:38
>>> To: openindiana-discuss at openindiana.org
>>> Subject: Re: [OpenIndiana-discuss] CIFS performance issues
>>>
>>> I don't understand what you mean by PCI-x settings or where to check
>>> them. The hardware is not PCI-X, it is PCIe. The affected LSI HBA is
>>> a discrete PCIe card that operates in IT mode. By system logs I
>>> assume you mean /var/adm/messages, and I could not find anything
>>> there. If this were only a hard disk controller issue (I made sure
>>> that there are enough lanes for it), I wouldn't expect applications
>>> such as SSH to be affected by it.
>>>
>>> The settings of the Intel NIC are not in the BIOS, at least not as
>>> far as I can see (i.e. there is no visible BIOS for the discrete NIC
>>> during POST like there is for the LSI SAS controller). So I'm not
>>> entirely sure which settings for the NIC you are referring to.
>>> Robin.
>>>
>>>
>>> On 2012-01-22 20:28, Open Indiana wrote:
>>>> A very stupid answer, but have you looked at the BIOS and inspected
>>>> the settings of the network devices and/or PCIx? How is your BIOS
>>>> set up (AHCI or RAID or ...)?
>>>>
>>>> Do you see any errors in the system logs?
>>>>
>>>> In my opinion your system is choking on the data transfers, either
>>>> on the NIC <-> motherboard side or on the motherboard <-> hard disk
>>>> controller side.
>>>> Do your extra NICs and the LSI share the same PCI-x settings? Do
>>>> they both support all settings?
>>>>
>>>> B,
>>>>
>>>> Roelof
>>>> -----Original Message-----
>>>> From: Robin Axelsson [mailto:gu99roax at student.chalmers.se]
>>>> Sent: zondag 22 januari 2012 19:38
>>>> To: OpenIndiana-discuss at openindiana.org
>>>> Subject: [OpenIndiana-discuss] CIFS performance issues
>>>>
>>>> In the past, I used OpenSolaris b134 which I then updated to
>>>> OpenIndiana
>>>> b148 and never did I experience performance issues related to the
>>>> network connection (and that was when using two of the "infamous"
>>>> RTL8111DL OnBoard ports). Now that I have swapped the motherboard and
>>>> the hard drive and later added a 2-port Intel EXPI9402PT NIC (because
>>>> of driver issues with the Realtek NIC that wasn't there before), I
>>>> performed a fresh install of OpenIndiana.
>>>>
>>>> Since then I have experienced intermittent network freeze-ups that I
>>>> cannot link to faults in the storage pool (iostat -E returns 0 errors). I
>>>> have had this issue both with the dual port Intel controller as well
>>>> as with a single port Intel controller (EXPI9400PT) and the Realtek
>>>> 8111E OnBoard NIC. The storage pool is behind an LSI MegaRAID 1068e
>>>> based controller using no port extenders.
>>>>
>>>> In detail (9400PT+8111E):
>>>> -------------------------
>>>> I was running a virtual machine with VirtualBox 3.2.14 using (1) a
>>>> bridged network connection; it was accessed over the network via (2)
>>>> a VBox RDP connection and (3) a ZFS-based CIFS share accessed from a
>>>> Windows computer over the network. These applications were
>>>> administrated both over (4) SSH (port 2244) and (5) VNC (using
>>>> vncserver). A typical start of the VM was done with
>>>> 'screen VBoxHeadless --startvm ...'
>>>>
>>>> I assigned the network ports the following way:
>>>>
>>>> e1000g: VBox RDP, VNC, SSH
>>>> rge0: Virtual Machine Network Connection (Bridged)
>>>>
>>>> I tried various combinations but the connection froze intermittently
>>>> for all applications. The bridged network connection was worst. When
>>>> I SSHed over rge0 the connection was frequently severed, which it
>>>> was not over e1000. So I pulled the plug on rge0 and let everything
>>>> go through the e1000 connection. Freeze-ups became more frequent,
>>>> and it seemed like the bridged connection was causing the issue
>>>> because the connection didn't freeze like that when the VM wasn't
>>>> running.
>>>>
>>>> Note that I didn't assign the CIFS share to any particular port but
>>>> calls to<computername>    were assigned to the e1000 port in the
>>> /etc/inet/hosts file.
>>>> -------------------------
>>>>
>>>> In detail (9402PT):
>>>> -------------------
>>>> In this setup I run essentially the same applications but all through
>>>> the 9402PT which has two ports (e1000g1 and e1000g2). So I assign the
>>>> applications the following way:
>>>>
>>>> e1000g1: VBox RDP, SSH,<computername>    (in /etc/inet/hosts)
>>>> e1000g2: Bridged connection to the virtual machine
>>>>
>>>> So while running the virtual machine on the server, having an open
>>>> SSH connection to it and a command prompt pointing (cd x:\) at the
>>>> CIFS share (which is mapped as a network drive, say "X:") I started a
>>>> media player and played an audio file over the CIFS share which made
>>>> the
>>> connection freeze.
>>>> The freezing affected the media player and the command prompt, but
>>>> the RDP connection worked and internet access inside the VM was
>>>> flawless. The SSH connection was frozen as well. After a few minutes
>>>> it became responsive again and iostat -E reported no errors. The
>>>> command prompt and the media player were still frozen, but
>>>> "ls <path to CIFS shared contents>" worked fine over the SSH
>>>> connection. Shortly after that the CIFS connection came back and
>>>> things seemed to run OK.
>>>>
>>>> So in conclusion the freeze-ups are still there but less frequent. I
>>>> have tried VirtualBox 4.1.8 but the ethernet connection is worse with
>>>> that version which is why I downgraded to 3.2.14 (which was published
>>>> _after_ 4.1.8).
>>>> -------------------
>>>>
>>>> These issues occur on server grade hardware using drivers that
>>>> are/were certified by Sun (as I understand it). Moreover, CIFS and
>>>> ZFS are the core functionality of OpenIndiana so it is quite
>>>> essential that the network works properly and is stable.
>>>>
>>>> I'm sorely tempted to issue a bug report but I would want some advice
>>>> on how to troubleshoot and provide relevant bug reports. There are no
>>>> entries in the /var/adm/messages that are related to the latest
>>>> freeze-up mentioned above and I couldn't find any when running the
>>>> prior setups. These freeze-ups don't happen all the time so it isn't
>>>> easy to consistently reproduce them.
>>>>
>>>> Robin.
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> OpenIndiana-discuss mailing list
>>>> OpenIndiana-discuss at openindiana.org
>>>> http://openindiana.org/mailman/listinfo/openindiana-discuss
>>>>
>>>
>>>
>>
>>
>>
>
>





