[OpenIndiana-discuss] Problem with high cpu load (oi_151a)
Michael Stapleton
michael.stapleton at techsologic.com
Thu Oct 20 18:33:07 UTC 2011
Don't know. I don't like to troubleshoot by guessing if I can avoid it.
I'd rather follow the evidence to catch the culprit: use what we know to
discover what we do not know.
We know the context-switch (cs) rate in vmstat is high, we know sys time
is high, we know the syscall rate is low, and we know it is not a user
process, therefore it is the kernel. Likely a driver.
So what kernel code is running the most?
What's causing that code to run?
Does that code belong to a driver?
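A kernel profile answers the first question directly. A quick sketch (the
997Hz sampling rate and the 10 second cutoff are just convenient defaults,
nothing magic):

#dtrace -n 'profile-997 /arg0/ { @[func(arg0)] = count(); } tick-10s { exit(0); }'

The /arg0/ predicate keeps only samples that land in kernel code, and
func(arg0) resolves each sampled program counter to module`function, so
the hottest entries print last and name the module they belong to.
Swapping func(arg0) for stack() then answers the second question: what is
calling that code.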
Mike
On Thu, 2011-10-20 at 20:25 +0200, Michael Schuster wrote:
> Hi,
>
> just found this:
> http://download.oracle.com/docs/cd/E19253-01/820-5245/ghgoc/index.html
>
> does it help?
>
> On Thu, Oct 20, 2011 at 20:23, Michael Stapleton
> <michael.stapleton at techsologic.com> wrote:
> > My understanding is that it is not supposed to be a loaded system. We
> > want to know what the load is.
> >
> >
> > gernot@tintenfass:~# intrstat 30
> >
> > device | cpu0 %tim cpu1 %tim
> > -------------+------------------------------
> > e1000g#0 | 1 0,0 0 0,0
> > ehci#0 | 0 0,0 4 0,0
> > ehci#1 | 3 0,0 0 0,0
> > hci1394#0 | 0 0,0 2 0,0
> > i8042#1 | 0 0,0 4 0,0
> > i915#1 | 0 0,0 2 0,0
> > pci-ide#0 | 15 0,1 0 0,0
> > uhci#0 | 0 0,0 2 0,0
> > uhci#1 | 0 0,0 0 0,0
> > uhci#2 | 3 0,0 0 0,0
> > uhci#3 | 0 0,0 2 0,0
> > uhci#4 | 0 0,0 4 0,0
> >
> > device | cpu0 %tim cpu1 %tim
> > -------------+------------------------------
> > e1000g#0 | 1 0,0 0 0,0
> > ehci#0 | 0 0,0 3 0,0
> > ehci#1 | 3 0,0 0 0,0
> > hci1394#0 | 0 0,0 1 0,0
> > i8042#1 | 0 0,0 6 0,0
> > i915#1 | 0 0,0 1 0,0
> > pci-ide#0 | 3 0,0 0 0,0
> > uhci#0 | 0 0,0 1 0,0
> > uhci#1 | 0 0,0 0 0,0
> > uhci#2 | 3 0,0 0 0,0
> > uhci#3 | 0 0,0 1 0,0
> > uhci#4 | 0 0,0 3 0,0
> >
> > gernot@tintenfass:~# vmstat 5 10
> >  kthr      memory            page            disk          faults      cpu
> >  r b w   swap  free  re  mf pi po fr de sr cd s0 s1 s2   in   sy   cs us sy id
> >  0 0 0 4243840 1145720  1  6  0  0  0  0  2  0  1  1  1 9767  121 37073  0 54 46
> >  0 0 0 4157824 1059796  4 11  0  0  0  0  0  0  0  0  0 9752  119 37132  0 54 46
> >  0 0 0 4157736 1059752  0  0  0  0  0  0  0  0  0  0  0 9769  113 37194  0 54 46
> >  0 0 0 4157744 1059788  0  0  0  0  0  0  0  0  0  0  0 9682  104 36941  0 54 46
> >  0 0 0 4157744 1059788  0  0  0  0  0  0  0  0  0  0  0 9769  105 37208  0 54 46
> >  0 0 0 4157728 1059772  0  1  0  0  0  0  0  0  0  0  0 9741  159 37104  0 54 46
> >  0 0 0 4157728 1059772  0  0  0  0  0  0  0  0  0  0  0 9695  127 36931  0 54 46
> >  0 0 0 4157744 1059788  0  0  0  0  0  0  0  0  0  0  0 9762  105 37188  0 54 46
> >  0 0 0 4157744 1059788  0  0  0  0  0  0  0  0  0  0  0 9723  102 37058  0 54 46
> >  0 0 0 4157744 1059788  0  0  0  0  0  0  0  0  0  0  0 9774  105 37263  0 54 46
> >
> > Mike
> >
> >
> > On Thu, 2011-10-20 at 11:02 -0700, Rennie Allen wrote:
> >
> >> Sched is the scheduler itself. How long did you let this run? If only
> >> for a couple of seconds, then that number is high, but not ridiculous for
> >> a loaded system, so I think that this output rules out a high context
> >> switch rate.
> >>
> >> Try this command to see if some process is making an excessive number of
> >> syscalls:
> >>
> >> dtrace -n 'syscall:::entry { @[execname]=count()}'
> >>
> >> If not, then I'd try looking at interrupts...
> >>
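> >> (For the interrupt side, intrstat(1M) is probably the quickest first
> >> look; just a sketch, any interval works:
> >>
> >> intrstat 5
> >>
> >> If one device's %tim column stands out, that's the driver to chase.)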
> >>
> >> On 10/20/11 10:52 AM, "Gernot Wolf" <gw.inet at chello.at> wrote:
> >>
> >> >Yeah, I've been able to run these diagnostics on another OI box (at my
> >> >office, so much for OI not being used in production ;)), and noticed
> >> >that several values were quite different. I just don't have any idea
> >> >what these figures mean...
> >> >
> >> >Anyway, here are the results of the dtrace command (I executed the
> >> >command twice, hence two result sets):
> >> >
> >> >gernot@tintenfass:~# dtrace -n 'sched:::off-cpu { @[execname]=count()}'
> >> >dtrace: description 'sched:::off-cpu ' matched 3 probes
> >> >^C
> >> >
> >> > ipmgmtd 1
> >> > gconfd-2 2
> >> > gnome-settings-d 2
> >> > idmapd 2
> >> > inetd 2
> >> > miniserv.pl 2
> >> > netcfgd 2
> >> > nscd 2
> >> > ospm-applet 2
> >> > ssh-agent 2
> >> > sshd 2
> >> > svc.startd 2
> >> > intrd 3
> >> > afpd 4
> >> > mdnsd 4
> >> > gnome-power-mana 5
> >> > clock-applet 7
> >> > sendmail 7
> >> > xscreensaver 7
> >> > fmd 9
> >> > fsflush 11
> >> > ntpd 11
> >> > updatemanagernot 13
> >> > isapython2.6 14
> >> > devfsadm 20
> >> > gnome-terminal 20
> >> > dtrace 23
> >> > mixer_applet2 25
> >> > smbd 39
> >> > nwam-manager 60
> >> > svc.configd 79
> >> > Xorg 100
> >> > sched 394078
> >> >
> >> >gernot@tintenfass:~# dtrace -n 'sched:::off-cpu { @[execname]=count()}'
> >> >dtrace: description 'sched:::off-cpu ' matched 3 probes
> >> >^C
> >> >
> >> > automountd 1
> >> > ipmgmtd 1
> >> > idmapd 2
> >> > in.routed 2
> >> > init 2
> >> > miniserv.pl 2
> >> > netcfgd 2
> >> > ssh-agent 2
> >> > sshd 2
> >> > svc.startd 2
> >> > fmd 3
> >> > hald 3
> >> > inetd 3
> >> > intrd 3
> >> > hald-addon-acpi 4
> >> > nscd 4
> >> > gnome-power-mana 5
> >> > sendmail 5
> >> > mdnsd 6
> >> > devfsadm 8
> >> > xscreensaver 9
> >> > fsflush 10
> >> > ntpd 14
> >> > updatemanagernot 16
> >> > mixer_applet2 21
> >> > isapython2.6 22
> >> > dtrace 24
> >> > gnome-terminal 24
> >> > smbd 39
> >> > nwam-manager 58
> >> > zpool-rpool 65
> >> > svc.configd 79
> >> > Xorg 82
> >> > sched 369939
> >> >
> >> >So, quite obviously there is one executable standing out here, "sched".
> >> >Now, what do these figures mean?
> >> >
> >> >Regards,
> >> >Gernot Wolf
> >> >
> >> >
> >> >Am 20.10.11 19:22, schrieb Michael Stapleton:
> >> >> Hi Gernot,
> >> >>
> >> >> You have a high context switch rate.
> >> >>
> >> >> try
> >> >> #dtrace -n 'sched:::off-cpu { @[execname]=count()}'
> >> >>
> >> >> Run it for a few seconds to see if you can get the name of an executable.
> >> >>
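> >> >> If the top name turns out to be sched, the work is in kernel threads
> >> >> rather than in a user process. In that case the same probe with a
> >> >> predicate shows what those threads were doing when they went off CPU;
> >> >> just a sketch, untested:
> >> >>
> >> >> #dtrace -n 'sched:::off-cpu /pid == 0/ { @[stack()] = count(); }'
> >> >>
> >> >> Kernel threads all run in pid 0, so the most common stacks, printed
> >> >> last, should point at the subsystem or driver doing all the switching.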
> >> >> Mike
> >> >> On Thu, 2011-10-20 at 18:44 +0200, Gernot Wolf wrote:
> >> >>
> >> >>> Hello all,
> >> >>>
> >> >>> I have a machine here at my home running OpenIndiana oi_151a, which
> >> >>> serves as a NAS on my home network. The original install was OpenSolaris
> >> >>> 2009.06, which was later upgraded to snv_134b, and recently to oi_151a.
> >> >>>
> >> >>> So far this OSOL (now OI) box has performed excellently, with one major
> >> >>> exception: sometimes, after a reboot, the CPU load was about 50-60%
> >> >>> although the system was doing nothing. Until recently, another reboot
> >> >>> solved the issue.
> >> >>>
> >> >>> This no longer works. The system now always has a CPU load of 50-60%
> >> >>> when idle (and higher, of course, when there is actually some work to
> >> >>> do).
> >> >>>
> >> >>> I've already googled the symptoms. This didn't turn up much useful
> >> >>> info, and the few things I found didn't apply to my problem. The most
> >> >>> notable was a problem that could be solved by disabling cpupm in
> >> >>> /etc/power.conf, but trying that had no effect on my system.
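> >> >>>
> >> >>> (In case I got the workaround wrong: what I did was add the single line
> >> >>>
> >> >>> cpupm disable
> >> >>>
> >> >>> to /etc/power.conf and then run pmconfig to make the change take effect.)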
> >> >>>
> >> >>> So I'm finally out of my depth. I have to admit that my knowledge of
> >> >>> Unix is superficial at best, so I decided to try looking for help here.
> >> >>>
> >> >>> I've run several diagnostic commands like top, powertop, lockstat etc.
> >> >>> and attached the results to this email (I've zipped the results of
> >> >>> kstat because they were >1 MB).
> >> >>>
> >> >>> One important thing is that when I boot the oi_151a live DVD instead
> >> >>> of booting into the installed system, I also get the high CPU load. I
> >> >>> mention this because I have installed several things on my OI box like
> >> >>> vsftpd, svn, netstat etc. I first thought that this problem might be
> >> >>> caused by some of this extra stuff, but seeing the same symptoms when
> >> >>> booting the live DVD ruled that out (I think).
> >> >>>
> >> >>> The machine is a custom-built medium tower:
> >> >>> S-775 Intel DG965WHMKR ATX mainboard
> >> >>> Intel Core 2 Duo E4300 CPU 1.8GHz
> >> >>> 1x IDE DVD recorder
> >> >>> 1x IDE HD 200GB (serves as system drive)
> >> >>> 6x SATA II 1.5TB HD (configured as a ZFS raidz2 array)
> >> >>>
> >> >>> I have to solve this problem. Although the system runs fine and
> >> >>> absolutely serves its purpose, having the CPU at 50-60% load constantly
> >> >>> is a waste of energy and surely a rather unhealthy stress on the
> >> >>> hardware.
> >> >>>
> >> >>> Anyone any ideas...?
> >> >>>
> >> >>> Regards,
> >> >>> Gernot Wolf