[OpenIndiana-discuss] comstar targets dying randomly

Pawel Stefanski pejotes at gmail.com
Fri Jun 27 08:51:57 UTC 2014


Hello Wim!

You could set it in /etc/system:
set stmf:stmf_min_nworkers = 512
set stmf:stmf_max_nworkers = 1024
To change it instantly, without a reboot (this is not possible on Nexenta
3.1.5, where a reboot is needed):
echo stmf_min_nworkers/W0t512 | mdb -kw
echo stmf_max_nworkers/W0t1024 | mdb -kw

Of course you should set it according to your needs.
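
To see how close you get to the limit, something like the sketch below could
sample the counters and warn when the workers approach the maximum. This is
only a rough sketch: the function names and the 80% threshold are made up for
illustration, and it assumes that "echo sym/D | mdb -k" prints the value as
the last field of its last line, so check that against your mdb output first.

```shell
#!/bin/sh
# Sketch only: helper names and the 80% default threshold are hypothetical.
# Assumption: `echo sym/D | mdb -k` prints the decimal value as the last
# whitespace-separated field of its last non-empty line.

# Extract the value from mdb output read on stdin.
mdb_value() {
    awk 'NF { v = $NF } END { print v }'
}

# Read one kernel variable as a decimal (needs root and mdb on the target).
read_kvar() {
    echo "$1/D" | mdb -k | mdb_value
}

# Report whether the worker count has reached a percentage of the maximum.
# $1 = current workers, $2 = max workers, $3 = threshold in percent (default 80)
check_headroom() {
    cur=$1; max=$2; pct=${3:-80}
    if [ $((cur * 100)) -ge $((max * pct)) ]; then
        echo "WARN workers=$cur max=$max"
    else
        echo "OK workers=$cur max=$max"
    fi
}

# Example loop, left commented out so the file can be sourced safely:
# while :; do
#     tasks=$(read_kvar stmf_cur_ntasks)
#     workers=$(read_kvar stmf_nworkers_cur)
#     max=$(read_kvar stmf_max_nworkers)
#     echo "$(date '+%Y-%m-%d %H:%M:%S') tasks=$tasks $(check_headroom "$workers" "$max")"
#     sleep 300
# done
```

The 300 second sleep matches the 5 minute interval you already use; a shorter
interval may be more useful for catching the moment the pool fills up.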

best regards!
-- 
pawel


On Thu, Jun 26, 2014 at 5:48 PM, wim at vandenberge.us <wim at vandenberge.us>
wrote:

> Hello Pawel,
>
> Is the maximum number of processes you speak of defined somewhere (and can
> it be changed)? We've monitored these two numbers for a couple of days now
> at 5-minute intervals and observed that a) the number of workers is always
> higher than or equal to the number of tasks; and b) the numbers seem to
> top out around 200.
>
> We have not had the opportunity to observe the numbers in a failure
> scenario yet, though.
>
> W
>
>
>
>
>
>
> > On June 24, 2014 at 3:00 PM Pawel Stefanski <pejotes at gmail.com> wrote:
> >
> >
> > hello
> >
> > You should monitor the number of running stmf tasks versus the number of
> > stmf (comstar) processes. We had such issues: once we hit the max number
> > of processes, and a second time there was a bug in stmf process pool
> > scaling (but I think it's Nexenta-specific, not Illumos).
> >
> > keys to observe:
> > echo stmf_cur_ntasks/D | mdb -k # number of currently serving tasks
> >
> > echo stmf_nworkers_cur/D | mdb -k # number of running workers
> >
> > best regards!
> > --
> > pawel
> >
> >
> >
> > On Tue, Jun 24, 2014 at 8:21 PM, wim at vandenberge.us <wim at vandenberge.us>
> > wrote:
> >
> > > Hello,
> > >
> > > I have three OpenIndiana (151A8) servers used as iSCSI targets. All
> > > servers have two 10GbE interfaces to separate Dell 8024F switches
> > > running the latest firmware. These servers provide storage for a bank
> > > of 16 Windows 2012R2 virtualization servers, each running 16 virtual
> > > machines (Windows 7x64). Each virtualization server is also connected
> > > to both 10GbE switches. iSCSI is configured to use round-robin. The
> > > interfaces and switches are dedicated to iSCSI; all other traffic is
> > > routed over a separate admin/client network. The virtualization
> > > servers and the iSCSI servers do not share an admin network (the only
> > > paths between them are the iSCSI networks, which are flat class C
> > > without a gateway).
> > >
> > > Everything works fine. When the systems are at their busiest we see a
> > > very nicely balanced load of 25MB/sec on each initiator's iSCSI
> > > interfaces, with the occasional quick peak close to 100MB on
> > > individual machines. Load on the iSCSI servers hovers around 3, and
> > > network utilization on each of the six target interfaces sits slightly
> > > above 130MB/sec.
> > >
> > > However, every week or so, one of the systems will, without warning or
> > > any log that I can find, start dropping iSCSI connections. The
> > > virtualization servers will report a loss of the storage volume. Over
> > > a period of 30 minutes or so all remaining iSCSI connections to that
> > > storage server will die, and the only way to get them back is to
> > > restart the machine or disable the /network/iscsi/target service, wait
> > > about 2 minutes and then enable it (a simple restart will not work; it
> > > fails with a log entry saying that the service is still running when
> > > trying to restart).
> > >
> > > This problem occurs on all three servers randomly, sometimes within
> > > days, sometimes only after a couple of weeks. Servers are good, but
> > > commodity hardware (Supermicro, LSI, Seagate, Intel) and configured
> > > similarly but not identically (slightly different motherboards,
> > > processors (dual 1.8GHz quad-core Xeon minimum) and memory
> > > configurations (none less than 128GB)).
> > >
> > > My problem is that nothing appears in the logs on the OpenIndiana
> > > servers. Spying on the network shows that requests are getting to the
> > > OpenIndiana servers but essentially fall into a black hole. I've ruled
> > > out problems with individual disks, cables and controllers.
> > >
> > > Has anyone ever seen this before? Any ideas for something I could look
> > > at besides the obvious logs?
> > >
> > > thanks in advance,
> > >
> > > Wim
> > > _______________________________________________
> > > openindiana-discuss mailing list
> > > openindiana-discuss at openindiana.org
> > > http://openindiana.org/mailman/listinfo/openindiana-discuss
> > >

