[OpenIndiana-discuss] comstar targets dying randomly

wim at vandenberge.us wim at vandenberge.us
Tue Jun 24 18:21:34 UTC 2014


Hello,

I have three OpenIndiana (151A8) servers used as iSCSI targets. All servers have
two 10GbE interfaces to separate Dell 8024F switches running the latest
firmware. These servers provide storage for a bank of 16 Windows 2012R2
virtualization servers, each running 16 virtual machines (Windows 7x64). Each
virtualization server is also connected to both 10GbE switches. iSCSI is
configured to use round-robin. The interfaces and switches are dedicated to
iSCSI; all other traffic is routed over a separate admin/client network. The
virtualization servers and the iSCSI servers do not share an admin network (the
only paths between them are the iSCSI networks, which are flat class C without a
gateway).

Everything works fine. When the systems are at their busiest we see a very
nicely balanced load of 25MB/sec on each initiator's iSCSI interfaces, with the
occasional quick peak close to 100MB/sec on individual machines. Load on the
iSCSI servers hovers around 3, and network utilization on each of the six target
interfaces sits slightly above 130MB/sec.
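
As a rough sanity check, the initiator-side and target-side figures above can be
compared with a few lines of arithmetic. This is only a sketch: it assumes the
25MB/sec figure is per interface, that every initiator drives both of its
interfaces, and that a 10GbE link carries about 1250MB/sec. None of that is
stated explicitly above.

```python
# Rough sanity check of the quoted load figures.
# Assumptions (not stated in the post): 25 MB/sec is per initiator interface,
# both interfaces on every initiator are in use, 10GbE ~= 1250 MB/sec raw.

INITIATORS = 16               # Windows 2012R2 virtualization servers
IFACES_PER_INITIATOR = 2      # one link to each 10GbE switch
MB_PER_INITIATOR_IFACE = 25   # MB/sec at the busiest times

TARGET_SERVERS = 3            # OpenIndiana iSCSI targets
IFACES_PER_TARGET = 2         # six target interfaces in total
MB_PER_TARGET_IFACE = 130     # MB/sec at the busiest times

LINK_CAPACITY_MB = 10_000 / 8  # 10 Gbit/sec expressed in MB/sec

initiator_total = INITIATORS * IFACES_PER_INITIATOR * MB_PER_INITIATOR_IFACE
target_total = TARGET_SERVERS * IFACES_PER_TARGET * MB_PER_TARGET_IFACE
target_link_utilization = MB_PER_TARGET_IFACE / LINK_CAPACITY_MB

print(f"initiator side total: {initiator_total} MB/sec")
print(f"target side total:    {target_total} MB/sec")
print(f"target link usage:    {target_link_utilization:.1%}")
```

Under those assumptions the two totals land within a few percent of each other,
and each target link is running at roughly a tenth of its raw capacity.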

However, every week or so, one of the systems will, without warning or any log
entry that I can find, start dropping iSCSI connections. The virtualization
servers will report a loss of the storage volume. Over a period of 30 minutes or
so all remaining iSCSI connections to that storage server will die, and the only
way to get them back is to restart the machine or to disable the
/network/iscsi/target service, wait about 2 minutes and then enable it (a simple
restart will not work; it fails with a log entry saying that the service is
still running).
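
The disable, wait, enable workaround described above could be scripted roughly
as follows. This is a minimal sketch, not a fix: the full FMRI
svc:/network/iscsi/target:default is an assumption based on the service name
given above, bounce_target is a hypothetical helper name, and the 120-second
pause simply mirrors the "about 2 minutes" mentioned. It would need to run with
sufficient privileges on the affected target server.

```python
import subprocess
import time

# Assumed full FMRI for the /network/iscsi/target service named in the post.
FMRI = "svc:/network/iscsi/target:default"


def bounce_target(run=subprocess.run, sleep=time.sleep, wait_seconds=120):
    """Disable the iSCSI target service, wait, then enable it again.

    `run` and `sleep` are injectable so the sequence can be dry-run
    without touching a real system.
    """
    # A plain "svcadm restart" reportedly fails here, so disable explicitly.
    run(["svcadm", "disable", FMRI], check=True)
    # Give the service time to shut down fully before re-enabling it.
    sleep(wait_seconds)
    run(["svcadm", "enable", FMRI], check=True)


# Usage on an affected server (as root):
#     bounce_target()
```

Passing a stub for run and sleep prints or records the commands instead of
executing them, which is a cheap way to check the sequence before using it on a
production target.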

This problem occurs on all three servers randomly, sometimes within days,
sometimes only after a couple of weeks. The servers are good but commodity
hardware (Supermicro, LSI, Seagate, Intel) and are configured similarly but not
identically: slightly different motherboards, processors (dual 1.8GHz quad-core
Xeon minimum) and memory configurations (none less than 128GB).

My problem is that nothing appears in the logs on the OpenIndiana servers.
Sniffing the network shows that requests are getting to the OpenIndiana
servers but essentially fall into a black hole. I've ruled out problems with
individual disks, cables and controllers.

Has anyone ever seen this before? Any ideas for something I could look at
besides the obvious logs?

thanks in advance,

Wim

