[OpenIndiana-discuss] Problem with high cpu load (oi_151a)

Rennie Allen rennieallen at gmail.com
Thu Oct 20 18:47:44 UTC 2011


I'd like to see a run of the script I sent earlier.  I don't trust
intrstat (not for any particular reason, other than that I have never used
it)...


On 10/20/11 11:33 AM, "Michael Stapleton"
<michael.stapleton at techsologic.com> wrote:

>Don't know. I don't like to troubleshoot by guessing if I can avoid it. I'd
>rather follow the evidence to capture the culprit: use what we know to
>discover what we do not know.
>
>We know the CS rate in vmstat is high, we know sys time is high, we know the
>syscall rate is low, and we know it is not a user process, therefore it is
>kernel code. Likely a driver.
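>
>(For a rough sense of scale from the vmstat output below: roughly 37,000
>context switches per second against only ~100-160 syscalls per second, so
>almost none of that switching is being driven by user-level system calls.)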
>
>So what kernel code is running the most?
>
>What's causing that code to run?
>
>Does that code belong to a driver?
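>
>(A sketch of one way to answer that, untested here; it uses the standard
>DTrace profile provider and samples only while on CPU in the kernel, so
>treat it as a suggestion rather than something already run on this box:)
>
>  dtrace -n 'profile-997 /arg0/ { @[stack()] = count(); }
>             tick-10s { trunc(@, 10); exit(0); }'
>
>The kernel stacks that dominate the output should name the module, and
>therefore the driver if there is one, that the time is going to.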
>
>
>Mike
>
>
>
>On Thu, 2011-10-20 at 20:25 +0200, Michael Schuster wrote:
>
>> Hi,
>> 
>> just found this:
>> http://download.oracle.com/docs/cd/E19253-01/820-5245/ghgoc/index.html
>> 
>> does it help?
>> 
>> On Thu, Oct 20, 2011 at 20:23, Michael Stapleton
>> <michael.stapleton at techsologic.com> wrote:
>> > My understanding is that it is not supposed to be a loaded system. We
>> > want to know what the load is.
>> >
>> >
>> > gernot at tintenfass:~# intrstat 30
>> >
>> >      device |      cpu0 %tim      cpu1 %tim
>> > -------------+------------------------------
>> >    e1000g#0 |         1  0,0         0  0,0
>> >      ehci#0 |         0  0,0         4  0,0
>> >      ehci#1 |         3  0,0         0  0,0
>> >   hci1394#0 |         0  0,0         2  0,0
>> >     i8042#1 |         0  0,0         4  0,0
>> >      i915#1 |         0  0,0         2  0,0
>> >   pci-ide#0 |        15  0,1         0  0,0
>> >      uhci#0 |         0  0,0         2  0,0
>> >      uhci#1 |         0  0,0         0  0,0
>> >      uhci#2 |         3  0,0         0  0,0
>> >      uhci#3 |         0  0,0         2  0,0
>> >      uhci#4 |         0  0,0         4  0,0
>> >
>> >      device |      cpu0 %tim      cpu1 %tim
>> > -------------+------------------------------
>> >    e1000g#0 |         1  0,0         0  0,0
>> >      ehci#0 |         0  0,0         3  0,0
>> >      ehci#1 |         3  0,0         0  0,0
>> >   hci1394#0 |         0  0,0         1  0,0
>> >     i8042#1 |         0  0,0         6  0,0
>> >      i915#1 |         0  0,0         1  0,0
>> >   pci-ide#0 |         3  0,0         0  0,0
>> >      uhci#0 |         0  0,0         1  0,0
>> >      uhci#1 |         0  0,0         0  0,0
>> >      uhci#2 |         3  0,0         0  0,0
>> >      uhci#3 |         0  0,0         1  0,0
>> >      uhci#4 |         0  0,0         3  0,0
>> >
>> > gernot at tintenfass:~# vmstat 5 10
>> >  kthr      memory            page            disk          faults      cpu
>> >  r b w   swap  free  re  mf pi po fr de sr cd s0 s1 s2   in   sy   cs us sy id
>> >  0 0 0 4243840 1145720 1  6  0  0  0  0  2  0  1  1  1 9767  121 37073  0 54 46
>> >  0 0 0 4157824 1059796 4 11  0  0  0  0  0  0  0  0  0 9752  119 37132  0 54 46
>> >  0 0 0 4157736 1059752 0  0  0  0  0  0  0  0  0  0  0 9769  113 37194  0 54 46
>> >  0 0 0 4157744 1059788 0  0  0  0  0  0  0  0  0  0  0 9682  104 36941  0 54 46
>> >  0 0 0 4157744 1059788 0  0  0  0  0  0  0  0  0  0  0 9769  105 37208  0 54 46
>> >  0 0 0 4157728 1059772 0  1  0  0  0  0  0  0  0  0  0 9741  159 37104  0 54 46
>> >  0 0 0 4157728 1059772 0  0  0  0  0  0  0  0  0  0  0 9695  127 36931  0 54 46
>> >  0 0 0 4157744 1059788 0  0  0  0  0  0  0  0  0  0  0 9762  105 37188  0 54 46
>> >  0 0 0 4157744 1059788 0  0  0  0  0  0  0  0  0  0  0 9723  102 37058  0 54 46
>> >  0 0 0 4157744 1059788 0  0  0  0  0  0  0  0  0  0  0 9774  105 37263  0 54 46
>> >
>> > Mike
>> >
>> >
>> > On Thu, 2011-10-20 at 11:02 -0700, Rennie Allen wrote:
>> >
>> >> Sched is the scheduler itself.  How long did you let this run?  If only
>> >> for a couple of seconds, then that number is high, but not ridiculous
>> >> for a loaded system, so I think that this output rules out a high
>> >> context switch rate.
>> >>
>> >> Try this command to see if some process is making an excessive number
>> >> of syscalls:
>> >>
>> >> dtrace -n 'syscall:::entry { @[execname]=count()}'
>> >>
>> >> If not, then I'd try looking at interrupts...
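>> >>
>> >> (Rough sketch for the interrupt side, assuming the sdt interrupt-start
>> >> probe that intrstat itself is built on; untested here:)
>> >>
>> >>   dtrace -n 'sdt:::interrupt-start { @[cpu] = count(); }
>> >>              tick-10s { exit(0); }'
>> >>
>> >> That only counts interrupts per CPU over ten seconds; intrstat gives
>> >> the per-device breakdown.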
>> >>
>> >>
>> >> On 10/20/11 10:52 AM, "Gernot Wolf" <gw.inet at chello.at> wrote:
>> >>
>> >> >Yeah, I've been able to run these diagnostics on another OI box (at my
>> >> >office, so much for OI not being used in production ;)), and noticed
>> >> >that there were several values that were quite different. I just don't
>> >> >have any idea what these figures mean...
>> >> >
>> >> >Anyway, here are the results of the dtrace command (I executed the
>> >> >command twice, hence two result sets):
>> >> >
>> >> >gernot at tintenfass:~# dtrace -n 'sched:::off-cpu { @[execname]=count()}'
>> >> >dtrace: description 'sched:::off-cpu ' matched 3 probes
>> >> >^C
>> >> >
>> >> >  ipmgmtd                    1
>> >> >  gconfd-2                   2
>> >> >  gnome-settings-d           2
>> >> >  idmapd                     2
>> >> >  inetd                      2
>> >> >  miniserv.pl                2
>> >> >  netcfgd                    2
>> >> >  nscd                       2
>> >> >  ospm-applet                2
>> >> >  ssh-agent                  2
>> >> >  sshd                       2
>> >> >  svc.startd                 2
>> >> >  intrd                      3
>> >> >  afpd                       4
>> >> >  mdnsd                      4
>> >> >  gnome-power-mana           5
>> >> >  clock-applet               7
>> >> >  sendmail                   7
>> >> >  xscreensaver               7
>> >> >  fmd                        9
>> >> >  fsflush                   11
>> >> >  ntpd                      11
>> >> >  updatemanagernot          13
>> >> >  isapython2.6              14
>> >> >  devfsadm                  20
>> >> >  gnome-terminal            20
>> >> >  dtrace                    23
>> >> >  mixer_applet2             25
>> >> >  smbd                      39
>> >> >  nwam-manager              60
>> >> >  svc.configd               79
>> >> >  Xorg                     100
>> >> >  sched                 394078
>> >> >
>> >> >gernot at tintenfass:~# dtrace -n 'sched:::off-cpu { @[execname]=count()}'
>> >> >dtrace: description 'sched:::off-cpu ' matched 3 probes
>> >> >^C
>> >> >
>> >> >  automountd                 1
>> >> >  ipmgmtd                    1
>> >> >  idmapd                     2
>> >> >  in.routed                  2
>> >> >  init                       2
>> >> >  miniserv.pl                2
>> >> >  netcfgd                    2
>> >> >  ssh-agent                  2
>> >> >  sshd                       2
>> >> >  svc.startd                 2
>> >> >  fmd                        3
>> >> >  hald                       3
>> >> >  inetd                      3
>> >> >  intrd                      3
>> >> >  hald-addon-acpi            4
>> >> >  nscd                       4
>> >> >  gnome-power-mana           5
>> >> >  sendmail                   5
>> >> >  mdnsd                      6
>> >> >  devfsadm                   8
>> >> >  xscreensaver               9
>> >> >  fsflush                   10
>> >> >  ntpd                      14
>> >> >  updatemanagernot          16
>> >> >  mixer_applet2             21
>> >> >  isapython2.6              22
>> >> >  dtrace                    24
>> >> >  gnome-terminal            24
>> >> >  smbd                      39
>> >> >  nwam-manager              58
>> >> >  zpool-rpool               65
>> >> >  svc.configd               79
>> >> >  Xorg                      82
>> >> >  sched                 369939
>> >> >
>> >> >So, quite obviously there is one executable standing out here: "sched".
>> >> >Now what do these figures mean?
>> >> >
>> >> >Regards,
>> >> >Gernot Wolf
>> >> >
>> >> >
>> >> >On 20.10.11 19:22, Michael Stapleton wrote:
>> >> >> Hi Gernot,
>> >> >>
>> >> >> You have a high context switch rate.
>> >> >>
>> >> >> try
>> >> >> #dtrace -n 'sched:::off-cpu { @[execname]=count()}'
>> >> >>
>> >> >> Run it for a few seconds to see if you can get the name of an
>> >> >> executable.
>> >> >>
>> >> >> Mike
>> >> >> On Thu, 2011-10-20 at 18:44 +0200, Gernot Wolf wrote:
>> >> >>
>> >> >>> Hello all,
>> >> >>>
>> >> >>> I have a machine here at my home running OpenIndiana oi_151a, which
>> >> >>> serves as a NAS on my home network. The original install was
>> >> >>> OpenSolaris 2009.06, which was later upgraded to snv_134b, and
>> >> >>> recently to oi_151a.
>> >> >>>
>> >> >>> So far this OSOL (now OI) box has performed excellently, with one
>> >> >>> major exception: sometimes, after a reboot, the cpu load was about
>> >> >>> 50-60%, although the system was doing nothing. Until recently,
>> >> >>> another reboot solved the issue.
>> >> >>>
>> >> >>> This no longer works. The system now always has a cpu load of
>> >> >>> 50-60% when idle (and higher, of course, when there is actually
>> >> >>> some work to do).
>> >> >>>
>> >> >>> I've already googled the symptoms. This didn't turn up very much
>> >> >>> useful info, and the few things I found didn't apply to my problem.
>> >> >>> Most notable was a problem that could be solved by disabling cpupm
>> >> >>> in /etc/power.conf, but trying that didn't have any effect on my
>> >> >>> system.
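>> >> >>>
>> >> >>> (For reference, that workaround amounts to something like the
>> >> >>> following line in /etc/power.conf, applied afterwards with pmconfig;
>> >> >>> I'm quoting it from memory, so treat it as a sketch:)
>> >> >>>
>> >> >>>   cpupm disable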
>> >> >>>
>> >> >>> So I'm finally out of my depth. I have to admit that my knowledge
>> >> >>> of Unix is superficial at best, so I decided to try looking for
>> >> >>> help here.
>> >> >>>
>> >> >>> I've run several diagnostic commands like top, powertop, lockstat
>> >> >>> etc. and attached the results to this email (I've zipped the
>> >> >>> results of kstat because they were >1MB).
>> >> >>>
>> >> >>> One important thing is that when I boot into the oi_151a live dvd
>> >> >>> instead of booting into the installed system, I also get the high
>> >> >>> cpu load. I mention this because I have installed several things on
>> >> >>> my OI box like vsftpd, svn, netstat etc. I first thought that this
>> >> >>> problem might be caused by some of this extra stuff, but seeing the
>> >> >>> same symptom when booting the live dvd ruled that out (I think).
>> >> >>>
>> >> >>> The machine is a custom build medium tower:
>> >> >>> S-775 Intel DG965WHMKR ATX mainboard
>> >> >>> Intel Core 2 Duo E4300 CPU 1.8GHz
>> >> >>> 1x IDE DVD recorder
>> >> >>> 1x IDE HD 200GB (serves as system drive)
>> >> >>> 6x SATA II 1.5TB HD (configured as zfs raidz2 array)
>> >> >>>
>> >> >>> I have to solve this problem. Although the system runs fine and
>> >> >>> absolutely serves its purpose, having the cpu at 50-60% load
>> >> >>> constantly is a waste of energy and surely puts unhealthy stress on
>> >> >>> the hardware.
>> >> >>>
>> >> >>> Anyone any ideas...?
>> >> >>>
>> >> >>> Regards,
>> >> >>> Gernot Wolf
>> >> >>
>> >> >>
>> >> >>
>> >> >
>> >>
>> >>
>> >>
>> >
>> >
>> >
>> 
>> 
>> 
>
>
>_______________________________________________
>OpenIndiana-discuss mailing list
>OpenIndiana-discuss at openindiana.org
>http://openindiana.org/mailman/listinfo/openindiana-discuss




