[OpenIndiana-discuss] comstar targets dying randomly

Tue Jun 24 19:00:51 UTC 2014

Hello,

You should monitor the number of running stmf tasks against the number of
stmf (COMSTAR) worker processes. We ran into this kind of problem twice:
once we hit the maximum number of processes, and the second time there was
a bug in the stmf process pool scaling (but I think that one is Nexenta
specific, not Illumos).

Kernel variables to observe:
echo stmf_cur_ntasks/D | mdb -k # number of tasks currently being served

echo stmf_nworkers_cur/D | mdb -k # number of running workers
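
If you want to sample both values repeatedly instead of typing the
commands by hand, a minimal sketch could look like the one below. The
function names are made up for illustration, the parsing assumes that mdb
prints the value after the last colon on its last output line, and the
"tasks >= workers" warning threshold is an arbitrary choice, not a
documented limit.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of any tool.

# Extract the value from mdb output on stdin.
# Assumption: the last line containing a colon looks like
#   "stmf_cur_ntasks:                12"
parse_mdb_value() {
    grep ':' | tail -n 1 | sed 's/^.*://' | tr -d '[:space:]'
}

# Read one kernel variable as a decimal via mdb -k (needs root).
read_kvar() {
    echo "$1/D" | mdb -k | parse_mdb_value
}

# Print one status line; warn when tasks have caught up with workers.
report_stmf() {
    tasks=$1
    workers=$2
    if [ "$tasks" -ge "$workers" ]; then
        echo "WARN tasks=$tasks workers=$workers"
    else
        echo "ok tasks=$tasks workers=$workers"
    fi
}

# Example polling loop (left commented out so nothing runs on load):
# while :; do
#     report_stmf "$(read_kvar stmf_cur_ntasks)" "$(read_kvar stmf_nworkers_cur)"
#     sleep 10
# done
```

Redirecting the loop output to a file with a timestamp in front of each
line should give you something to look at after the next hang.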

best regards!
-- 
pawel

On Tue, Jun 24, 2014 at 8:21 PM, wim at vandenberge.us <wim at vandenberge.us>
wrote:

> Hello,
>
> I have three OpenIndiana (151A8) servers used as iSCSI targets. All
> servers have two 10GbE interfaces to separate Dell 8024F switches running
> the latest firmware. These servers provide storage for a bank of 16
> Windows 2012R2 virtualization servers, each running 16 virtual machines
> (Windows 7 x64). Each virtualization server is also connected to both
> 10GbE switches. iSCSI is configured to use round-robin. The interfaces and
> switches are dedicated to iSCSI; all other traffic is routed over a
> separate admin/client network. The virtualization servers and the iSCSI
> servers do not share an admin network (the only paths between them are the
> iSCSI networks, which are flat class C without a gateway).
>
> Everything works fine. When the systems are at their busiest we see a very
> nicely balanced load of 25MB/sec on each initiator's iSCSI interfaces,
> with an occasional quick peak close to 100MB/sec on individual machines.
> Load on the iSCSI servers hovers around 3, and network utilization on each
> of the six target interfaces sits slightly above 130MB/sec.
>
> However, every week or so, one of the systems will, without warning or
> any log entry that I can find, start dropping iSCSI connections. The
> virtualization servers will report a loss of the storage volume. Over a
> period of 30 minutes or so all remaining iSCSI connections to that storage
> server will die, and the only way to get them back is to restart the
> machine or disable the /network/iscsi/target service, wait about 2 minutes
> and then enable it (a simple restart will not work; it fails with a log
> entry saying that the service is still running when trying to restart).
>
> This problem occurs on all three servers randomly, sometimes within days,
> sometimes only after a couple of weeks. Servers are good, but commodity
> hardware (Supermicro, LSI, Seagate, Intel) and configured similarly but
> not identically (slightly different motherboards, processors (dual 1.8GHz
> quad-core Xeon minimum) and memory configurations (none less than 128GB)).
>
> My problem is that nothing appears in the logs on the OpenIndiana servers.
> Spying on the network shows that requests are getting to the OpenIndiana
> servers but essentially fall into a black hole. I've ruled out problems
> with individual disks, cables and controllers.
>
> Has anyone ever seen this before? Any ideas for something I could look at
> besides the obvious logs?
>
> thanks in advance,
>
> Wim