[OpenIndiana-discuss] Troubleshooting OpenIndiana network on vSphere 5.5

Richard Elling richard.elling at richardelling.com
Sun Oct 20 23:06:29 UTC 2013


On Oct 20, 2013, at 6:52 AM, "Chris Murray" <chrismurray84 at gmail.com> wrote:

> Hi all,
> 
> I'm hoping for some troubleshooting advice. I have an OpenIndiana
> oi_151a8 virtual machine which was functioning correctly on vSphere 5.1
> but now isn't on vSphere 5.5 (ESXi-5.5.0-1331820-standard).
> 
> A small corner of my network infrastructure has a vSphere host upon
> which live two virtual machines:
> ape - "Debian Linux ape 2.6.32-5-amd64 #1 SMP Sun Sep 23 10:07:46 UTC
> 2012 x86_64 GNU/Linux", uses USB passthrough to read from an APC UPS
> and e-mail me when power is lost.
> giraffe - oi_151a8, serves up virtual machine images over NFS.
> 
> Since the upgrade of vSphere from 5.1 to 5.5, virtual machines on other
> hosts whose VMDKs are on this NFS mount are now very slow. PuTTY
> sessions to the oi_151a8 VM also 'stutter', and I see patterns in ping
> such as this:
> 
> Reply from 192.168.0.13: bytes=32 time=1367ms TTL=255
> Reply from 192.168.0.13: bytes=32 time<1ms TTL=255
> Reply from 192.168.0.13: bytes=32 time<1ms TTL=255
> Reply from 192.168.0.13: bytes=32 time=1369ms TTL=255
> Reply from 192.168.0.13: bytes=32 time<1ms TTL=255
> Reply from 192.168.0.13: bytes=32 time<1ms TTL=255
> Reply from 192.168.0.13: bytes=32 time=1356ms TTL=255
> Reply from 192.168.0.13: bytes=32 time<1ms TTL=255
> Reply from 192.168.0.13: bytes=32 time<1ms TTL=255
> Reply from 192.168.0.13: bytes=32 time=1376ms TTL=255
> Reply from 192.168.0.13: bytes=32 time<1ms TTL=255
> Reply from 192.168.0.13: bytes=32 time<1ms TTL=255
> Reply from 192.168.0.13: bytes=32 time<1ms TTL=255
> Request timed out.
> 
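> To quantify the jitter from the OI side as well, the native ping can
> print per-packet round-trip times; something like the following (the
> target address here is just an example):
> 
>   # 100 probes, 56-byte payload, numeric output (illumos ping)
>   ping -ns 192.168.0.1 56 100
> 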
> At the same time, pings to the neighbouring VM (ape) or to the host
> itself follow the normal "time<1ms" pattern, as do pings to other
> machines on the network. I've therefore ruled out the switch
> infrastructure, including the vSwitch inside this vSphere host, given
> that the 'giraffe' VM exhibits the problem whereas 'ape' does not.
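> 
> To confirm whether the delay sits inside the guest or on the wire, a
> timestamped capture on giraffe's NIC might help; assuming the
> interface is e1000g0, something like:
> 
>   # watch ICMP arrive/leave with absolute timestamps
>   snoop -r -ta -d e1000g0 icmp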
> 
> Interestingly, if I power down the VMs whose storage lives on giraffe,
> the pings return to sub-1ms.
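> 
> Watching per-second link counters and NFS server statistics while
> those VMs are running might tie the spikes to traffic; e.g., again
> assuming the interface is e1000g0:
> 
>   dladm show-link -s -i 1 e1000g0   # per-second packet/byte counters
>   nfsstat -s                        # NFS server operation counts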
> 
> I am drawing the conclusion that this is some symptom of the combination
> of OI, vSphere 5.5 & network load, although I'm not sure where to turn
> next.
> 
> Tried:
> - "zpool scrub rpool" - to induce high read load on the SSD in the
>   vSphere host. This may look like a strange thing to test, but I've
>   seen odd effects in the past on Windows machines whose storage was
>   struggling.
> - Created a test pool on the SSD and induced write load using
>   "cat /dev/zero > /testpool/zerofile".
> - "zpool scrub giraffepool" - to induce high read load on the spinning
>   drives. None of these three tests had any effect, further hinting
>   that network load is the trigger.
> - Checked that ipfilter is off with the following, yet dmesg still
>   shows "IP Filter: v4.1.9, running." (see the module check after the
>   svcs output below):
> 
> chris at giraffe:~# svcs -xv ipfilter
> svc:/network/ipfilter:default (IP Filter)
> State: disabled since October 20, 2013 12:17:02 PM UTC
> Reason: Disabled by an administrator.
>   See: http://illumos.org/msg/SMF-8000-05
>   See: man -M /usr/share/man -s 5 ipfilter
> Impact: This service is not running.
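> 
> The dmesg line likely just means the ipf kernel module is loaded even
> though the SMF service is disabled. A quick check that no rules are
> actually in effect:
> 
>   modinfo | grep ipf   # is the ipf module loaded?
>   ipfstat -io          # active out/in rule lists; empty means no filtering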
> 
> Haven't tried yet:
> - Installing OI again in another VM to see if the problem is localised
>   to giraffe, since I'd also have to induce load there to be confident
>   of whether the issue exists or not.
> 
> I'm using the e1000 NIC in vSphere and don't have VMware Tools
> installed.
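> 
> In case it's something at the NIC level, error and drop counters for
> the e1000g instance may also be worth a look; e.g., assuming the
> interface is e1000g0:
> 
>   dladm show-link -s e1000g0                  # cumulative link counters
>   kstat -p e1000g | grep -i -e error -e drop  # driver error/drop kstats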
> 
> Any troubleshooting advice to help me focus somewhere would be
> appreciated.

Check your network configuration: routes, netmasks, MTU, dup IPs, etc.
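
For example (illumos commands on the guest; the peer address below is
just an example):

  netstat -rn                  # routing table
  ifconfig -a                  # addresses, netmasks, MTU per interface
  arp -a                       # ARP cache; a changing MAC can mean a dup IP
  ping -ns 192.168.0.1 1472 5  # larger payloads to exercise the MTU path
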
 -- richard

--

Richard.Elling at RichardElling.com
+1-760-896-4422
