[OpenIndiana-discuss] Problem with high cpu load (oi_151a)

Michael Stapleton michael.stapleton at techsologic.com
Thu Oct 20 19:00:20 UTC 2011


+1

Mike

On Thu, 2011-10-20 at 11:47 -0700, Rennie Allen wrote:

> I'd like to see a run of the script I sent earlier.  I don't trust
> intrstat (not for any particular reason, other than that I have never used
> it)...
> 
> 
> On 10/20/11 11:33 AM, "Michael Stapleton"
> <michael.stapleton at techsologic.com> wrote:
> 
> >Don't know. I don't like to troubleshoot by guessing if I can avoid it;
> >I'd rather follow the evidence to capture the culprit, using what we know
> >to discover what we do not know.
> >
> >We know the CS rate in vmstat is high, we know sys time is high, we know
> >the syscall rate is low, and we know it is not a user process; therefore
> >it is in the kernel. Likely a driver.
> >
> >So what kernel code is running the most?
> >
> >What's causing that code to run?
> >
> >Does that code belong to a driver?
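> >
> >One way to answer the first question would be to sample kernel stacks with
> >the profile provider; a rough, untested sketch (let it run ~10s, then ^C):
> >
> >#dtrace -n 'profile-997 /arg0/ { @[stack()] = count(); }'
> >
> >arg0 is nonzero only when the sample lands in kernel code, so the most
> >frequent stacks should point at the code, and hence the module, involved.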
> >
> >
> >Mike
> >
> >
> >
> >On Thu, 2011-10-20 at 20:25 +0200, Michael Schuster wrote:
> >
> >> Hi,
> >> 
> >> just found this:
> >> http://download.oracle.com/docs/cd/E19253-01/820-5245/ghgoc/index.html
> >> 
> >> does it help?
> >> 
> >> On Thu, Oct 20, 2011 at 20:23, Michael Stapleton
> >> <michael.stapleton at techsologic.com> wrote:
> >> > My understanding is that it is not supposed to be a loaded system. We
> >> > want to know what the load is.
> >> >
> >> >
> >> > gernot@tintenfass:~# intrstat 30
> >> >
> >> >      device |      cpu0 %tim      cpu1 %tim
> >> > -------------+------------------------------
> >> >    e1000g#0 |         1  0,0         0  0,0
> >> >      ehci#0 |         0  0,0         4  0,0
> >> >      ehci#1 |         3  0,0         0  0,0
> >> >   hci1394#0 |         0  0,0         2  0,0
> >> >     i8042#1 |         0  0,0         4  0,0
> >> >      i915#1 |         0  0,0         2  0,0
> >> >   pci-ide#0 |        15  0,1         0  0,0
> >> >      uhci#0 |         0  0,0         2  0,0
> >> >      uhci#1 |         0  0,0         0  0,0
> >> >      uhci#2 |         3  0,0         0  0,0
> >> >      uhci#3 |         0  0,0         2  0,0
> >> >      uhci#4 |         0  0,0         4  0,0
> >> >
> >> >      device |      cpu0 %tim      cpu1 %tim
> >> > -------------+------------------------------
> >> >    e1000g#0 |         1  0,0         0  0,0
> >> >      ehci#0 |         0  0,0         3  0,0
> >> >      ehci#1 |         3  0,0         0  0,0
> >> >   hci1394#0 |         0  0,0         1  0,0
> >> >     i8042#1 |         0  0,0         6  0,0
> >> >      i915#1 |         0  0,0         1  0,0
> >> >   pci-ide#0 |         3  0,0         0  0,0
> >> >      uhci#0 |         0  0,0         1  0,0
> >> >      uhci#1 |         0  0,0         0  0,0
> >> >      uhci#2 |         3  0,0         0  0,0
> >> >      uhci#3 |         0  0,0         1  0,0
> >> >      uhci#4 |         0  0,0         3  0,0
> >> >
> >> > gernot@tintenfass:~# vmstat 5 10
> >> >  kthr      memory            page            disk          faults      cpu
> >> >  r b w   swap  free  re  mf pi po fr de sr cd s0 s1 s2   in   sy   cs us sy id
> >> >  0 0 0 4243840 1145720 1  6  0  0  0  0  2  0  1  1  1 9767  121 37073  0 54 46
> >> >  0 0 0 4157824 1059796 4 11  0  0  0  0  0  0  0  0  0 9752  119 37132  0 54 46
> >> >  0 0 0 4157736 1059752 0  0  0  0  0  0  0  0  0  0  0 9769  113 37194  0 54 46
> >> >  0 0 0 4157744 1059788 0  0  0  0  0  0  0  0  0  0  0 9682  104 36941  0 54 46
> >> >  0 0 0 4157744 1059788 0  0  0  0  0  0  0  0  0  0  0 9769  105 37208  0 54 46
> >> >  0 0 0 4157728 1059772 0  1  0  0  0  0  0  0  0  0  0 9741  159 37104  0 54 46
> >> >  0 0 0 4157728 1059772 0  0  0  0  0  0  0  0  0  0  0 9695  127 36931  0 54 46
> >> >  0 0 0 4157744 1059788 0  0  0  0  0  0  0  0  0  0  0 9762  105 37188  0 54 46
> >> >  0 0 0 4157744 1059788 0  0  0  0  0  0  0  0  0  0  0 9723  102 37058  0 54 46
> >> >  0 0 0 4157744 1059788 0  0  0  0  0  0  0  0  0  0  0 9774  105 37263  0 54 46
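> >> >
> >> > So ~37000 context switches/s and ~9700 interrupts/s against only ~120
> >> > syscalls/s. To see what keeps waking threads that often, a rough,
> >> > untested sketch:
> >> >
> >> > # dtrace -n 'sched:::wakeup { @[stack()] = count(); }'
> >> >
> >> > The most frequent stacks should show the kernel code issuing the
> >> > wakeups.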
> >> >
> >> > Mike
> >> >
> >> >
> >> > On Thu, 2011-10-20 at 11:02 -0700, Rennie Allen wrote:
> >> >
> >> >> Sched is the scheduler itself.  How long did you let this run?  If
> >> >> only for a couple of seconds, then that number is high, but not
> >> >> ridiculous for a loaded system, so I think that this output rules out
> >> >> a high context switch rate.
> >> >>
> >> >> Try this command to see if some process is making an excessive number
> >> >> of syscalls:
> >> >>
> >> >> dtrace -n 'syscall:::entry { @[execname]=count()}'
> >> >>
> >> >> If not, then I'd try looking at interrupts...
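> >> >>
> >> >> (Since I don't trust intrstat, an untested sketch that counts
> >> >> interrupt activity by kernel stack, assuming the sdt interrupt-start
> >> >> probe is available on this build:
> >> >>
> >> >> dtrace -n 'sdt:::interrupt-start { @[stack()] = count(); }'
> >> >>
> >> >> the busiest stacks should name the driver taking them.)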
> >> >>
> >> >>
> >> >> On 10/20/11 10:52 AM, "Gernot Wolf" <gw.inet at chello.at> wrote:
> >> >>
> >> >> >Yeah, I've been able to run these diagnostics on another OI box (at my
> >> >> >office, so much for OI not being used in production ;)), and noticed
> >> >> >that several values were quite different. I just don't have any idea
> >> >> >of the meaning of these figures...
> >> >> >
> >> >> >Anyway, here are the results of the dtrace command (I executed the
> >> >> >command twice, hence two result sets):
> >> >> >
> >> >> >gernot@tintenfass:~# dtrace -n 'sched:::off-cpu { @[execname]=count()}'
> >> >> >dtrace: description 'sched:::off-cpu ' matched 3 probes
> >> >> >^C
> >> >> >
> >> >> >  ipmgmtd                1
> >> >> >  gconfd-2               2
> >> >> >  gnome-settings-d       2
> >> >> >  idmapd                 2
> >> >> >  inetd                  2
> >> >> >  miniserv.pl            2
> >> >> >  netcfgd                2
> >> >> >  nscd                   2
> >> >> >  ospm-applet            2
> >> >> >  ssh-agent              2
> >> >> >  sshd                   2
> >> >> >  svc.startd             2
> >> >> >  intrd                  3
> >> >> >  afpd                   4
> >> >> >  mdnsd                  4
> >> >> >  gnome-power-mana       5
> >> >> >  clock-applet           7
> >> >> >  sendmail               7
> >> >> >  xscreensaver           7
> >> >> >  fmd                    9
> >> >> >  fsflush               11
> >> >> >  ntpd                  11
> >> >> >  updatemanagernot      13
> >> >> >  isapython2.6          14
> >> >> >  devfsadm              20
> >> >> >  gnome-terminal        20
> >> >> >  dtrace                23
> >> >> >  mixer_applet2         25
> >> >> >  smbd                  39
> >> >> >  nwam-manager          60
> >> >> >  svc.configd           79
> >> >> >  Xorg                 100
> >> >> >  sched             394078
> >> >> >
> >> >> >gernot@tintenfass:~# dtrace -n 'sched:::off-cpu { @[execname]=count()}'
> >> >> >dtrace: description 'sched:::off-cpu ' matched 3 probes
> >> >> >^C
> >> >> >
> >> >> >  automountd             1
> >> >> >  ipmgmtd                1
> >> >> >  idmapd                 2
> >> >> >  in.routed              2
> >> >> >  init                   2
> >> >> >  miniserv.pl            2
> >> >> >  netcfgd                2
> >> >> >  ssh-agent              2
> >> >> >  sshd                   2
> >> >> >  svc.startd             2
> >> >> >  fmd                    3
> >> >> >  hald                   3
> >> >> >  inetd                  3
> >> >> >  intrd                  3
> >> >> >  hald-addon-acpi        4
> >> >> >  nscd                   4
> >> >> >  gnome-power-mana       5
> >> >> >  sendmail               5
> >> >> >  mdnsd                  6
> >> >> >  devfsadm               8
> >> >> >  xscreensaver           9
> >> >> >  fsflush               10
> >> >> >  ntpd                  14
> >> >> >  updatemanagernot      16
> >> >> >  mixer_applet2         21
> >> >> >  isapython2.6          22
> >> >> >  dtrace                24
> >> >> >  gnome-terminal        24
> >> >> >  smbd                  39
> >> >> >  nwam-manager          58
> >> >> >  zpool-rpool           65
> >> >> >  svc.configd           79
> >> >> >  Xorg                  82
> >> >> >  sched             369939
> >> >> >
> >> >> >So, quite obviously there is one executable standing out here,
> >> >> >"sched". Now, what's the meaning of these figures?
> >> >> >
> >> >> >Regards,
> >> >> >Gernot Wolf
> >> >> >
> >> >> >
> >> >> >On 20.10.11 19:22, Michael Stapleton wrote:
> >> >> >> Hi Gernot,
> >> >> >>
> >> >> >> You have a high context switch rate.
> >> >> >>
> >> >> >> Try
> >> >> >> #dtrace -n 'sched:::off-cpu { @[execname]=count()}'
> >> >> >>
> >> >> >> for a few seconds, to see if you can get the name of an executable.
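> >> >> >>
> >> >> >> If one name dominates, the same probe keyed by stack() instead of
> >> >> >> execname should show where in the kernel it goes to sleep
> >> >> >> (untested sketch, self-terminating after 10s):
> >> >> >>
> >> >> >> #dtrace -n 'sched:::off-cpu { @[stack()] = count(); } tick-10s { exit(0); }'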
> >> >> >>
> >> >> >> Mike
> >> >> >> On Thu, 2011-10-20 at 18:44 +0200, Gernot Wolf wrote:
> >> >> >>
> >> >> >>> Hello all,
> >> >> >>>
> >> >> >>> I have a machine here at my home running OpenIndiana oi_151a, which
> >> >> >>> serves as a NAS on my home network. The original install was
> >> >> >>> OpenSolaris 2009.06, which was later upgraded to snv_134b, and
> >> >> >>> recently to oi_151a.
> >> >> >>>
> >> >> >>> So far this OSOL (now OI) box has performed excellently, with one
> >> >> >>> major exception: sometimes, after a reboot, the CPU load was about
> >> >> >>> 50-60% although the system was doing nothing. Until recently,
> >> >> >>> another reboot solved the issue.
> >> >> >>>
> >> >> >>> This no longer works: the system now always has a CPU load of
> >> >> >>> 50-60% when idle (and higher, of course, when there is actually
> >> >> >>> some work to do).
> >> >> >>>
> >> >> >>> I've already googled the symptoms. This didn't turn up much useful
> >> >> >>> info, and the few things I found didn't apply to my problem. Most
> >> >> >>> notable was a problem that could be solved by disabling cpupm in
> >> >> >>> /etc/power.conf, but trying that didn't have any effect on my
> >> >> >>> system.
> >> >> >>>
> >> >> >>> So I'm finally out of my depth. I have to admit that my knowledge
> >> >> >>> of Unix is superficial at best, so I decided to try looking for
> >> >> >>> help here.
> >> >> >>>
> >> >> >>> I've run several diagnostic commands like top, powertop, lockstat
> >> >> >>> etc. and attached the results to this email (I've zipped the
> >> >> >>> results of kstat because they were >1MB).
> >> >> >>>
> >> >> >>> One important thing: when I boot into the oi_151a live DVD instead
> >> >> >>> of booting into the installed system, I also get the high CPU load.
> >> >> >>> I mention this because I have installed several things on my OI box
> >> >> >>> like vsftpd, svn, netstat etc. I first thought that this problem
> >> >> >>> might be caused by some of this extra stuff, but seeing the same
> >> >> >>> behavior when booting the live DVD ruled that out (I think).
> >> >> >>>
> >> >> >>> The machine is a custom-built medium tower:
> >> >> >>> S-775 Intel DG965WHMKR ATX mainboard
> >> >> >>> Intel Core 2 Duo E4300 CPU 1.8GHz
> >> >> >>> 1x IDE DVD recorder
> >> >> >>> 1x IDE HD 200GB (serves as system drive)
> >> >> >>> 6x SATA II 1.5TB HD (configured as zfs raidz2 array)
> >> >> >>>
> >> >> >>> I have to solve this problem. Although the system runs fine and
> >> >> >>> absolutely serves its purpose, having the CPU at 50-60% load
> >> >> >>> constantly is a waste of energy and surely puts unhealthy stress
> >> >> >>> on the hardware.
> >> >> >>>
> >> >> >>> Anyone any ideas...?
> >> >> >>>
> >> >> >>> Regards,
> >> >> >>> Gernot Wolf
> 
> _______________________________________________
> OpenIndiana-discuss mailing list
> OpenIndiana-discuss at openindiana.org
> http://openindiana.org/mailman/listinfo/openindiana-discuss



