[OpenIndiana-discuss] Problem with high cpu load (oi_151a)
Rennie Allen
rennieallen at gmail.com
Thu Oct 20 18:47:44 UTC 2011
I'd like to see a run of the script I sent earlier. I don't trust
intrstat (not for any particular reason, other than that I have never used
it)...
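
On the question quoted below ("what kernel code is running the most?"), one common approach is to sample kernel stacks with DTrace's profile provider. The following is only a minimal sketch and is not the script mentioned above; the wrapper name, the 997 Hz sampling rate and the 10-second window are assumptions picked for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper; the name is illustrative and not from this thread.
# profile-997 fires on every CPU 997 times per second. arg0 is the kernel
# program counter and is non-zero only when the CPU was running kernel code
# at the moment of the sample, so the predicate keeps kernel samples only.
# Samples are counted per kernel stack; tick-10s ends the trace after 10 s.
# Needs root (or equivalent DTrace privileges) to run.
kernel_hot_stacks() {
    dtrace -n 'profile-997 /arg0/ { @[stack()] = count(); } tick-10s { exit(0); }'
}

# Usage (as root): kernel_hot_stacks
```

The most frequent stacks are printed last. The idle loop also runs in the kernel, so on a machine that is about half idle, idle frames would be expected to account for a large share of the samples; the interesting stacks are the ones that are not idle.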
On 10/20/11 11:33 AM, "Michael Stapleton"
<michael.stapleton at techsologic.com> wrote:
>Don't know. I prefer not to troubleshoot by guesswork. I'd rather
>follow the evidence to catch the culprit: use what we know to discover
>what we do not know.
>
>We know the context switch (cs) rate in vmstat is high, we know sys time
>is high, and we know the syscall rate is low, so it is not a user
>process; therefore it is the kernel. Likely a driver.
>
>So what kernel code is running the most?
>
>What's causing that code to run?
>
>Does that code belong to a driver?
>
>
>Mike
>
>
>
>On Thu, 2011-10-20 at 20:25 +0200, Michael Schuster wrote:
>
>> Hi,
>>
>> just found this:
>> http://download.oracle.com/docs/cd/E19253-01/820-5245/ghgoc/index.html
>>
>> does it help?
>>
>> On Thu, Oct 20, 2011 at 20:23, Michael Stapleton
>> <michael.stapleton at techsologic.com> wrote:
>> > My understanding is that it is not supposed to be a loaded system. We
>> > want to know what the load is.
>> >
>> >
>> > gernot at tintenfass:~# intrstat 30
>> >
>> >       device |      cpu0 %tim      cpu1 %tim
>> > -------------+------------------------------
>> >     e1000g#0 |         1  0,0         0  0,0
>> >       ehci#0 |         0  0,0         4  0,0
>> >       ehci#1 |         3  0,0         0  0,0
>> >    hci1394#0 |         0  0,0         2  0,0
>> >      i8042#1 |         0  0,0         4  0,0
>> >       i915#1 |         0  0,0         2  0,0
>> >    pci-ide#0 |        15  0,1         0  0,0
>> >       uhci#0 |         0  0,0         2  0,0
>> >       uhci#1 |         0  0,0         0  0,0
>> >       uhci#2 |         3  0,0         0  0,0
>> >       uhci#3 |         0  0,0         2  0,0
>> >       uhci#4 |         0  0,0         4  0,0
>> >
>> >       device |      cpu0 %tim      cpu1 %tim
>> > -------------+------------------------------
>> >     e1000g#0 |         1  0,0         0  0,0
>> >       ehci#0 |         0  0,0         3  0,0
>> >       ehci#1 |         3  0,0         0  0,0
>> >    hci1394#0 |         0  0,0         1  0,0
>> >      i8042#1 |         0  0,0         6  0,0
>> >       i915#1 |         0  0,0         1  0,0
>> >    pci-ide#0 |         3  0,0         0  0,0
>> >       uhci#0 |         0  0,0         1  0,0
>> >       uhci#1 |         0  0,0         0  0,0
>> >       uhci#2 |         3  0,0         0  0,0
>> >       uhci#3 |         0  0,0         1  0,0
>> >       uhci#4 |         0  0,0         3  0,0
>> >
>> > gernot at tintenfass:~# vmstat 5 10
>> > kthr      memory              page            disk         faults       cpu
>> > r b w    swap    free re mf pi po fr de sr cd s0 s1 s2   in  sy    cs us sy id
>> > 0 0 0 4243840 1145720  1  6  0  0  0  0  2  0  1  1  1 9767 121 37073  0 54 46
>> > 0 0 0 4157824 1059796  4 11  0  0  0  0  0  0  0  0  0 9752 119 37132  0 54 46
>> > 0 0 0 4157736 1059752  0  0  0  0  0  0  0  0  0  0  0 9769 113 37194  0 54 46
>> > 0 0 0 4157744 1059788  0  0  0  0  0  0  0  0  0  0  0 9682 104 36941  0 54 46
>> > 0 0 0 4157744 1059788  0  0  0  0  0  0  0  0  0  0  0 9769 105 37208  0 54 46
>> > 0 0 0 4157728 1059772  0  1  0  0  0  0  0  0  0  0  0 9741 159 37104  0 54 46
>> > 0 0 0 4157728 1059772  0  0  0  0  0  0  0  0  0  0  0 9695 127 36931  0 54 46
>> > 0 0 0 4157744 1059788  0  0  0  0  0  0  0  0  0  0  0 9762 105 37188  0 54 46
>> > 0 0 0 4157744 1059788  0  0  0  0  0  0  0  0  0  0  0 9723 102 37058  0 54 46
>> > 0 0 0 4157744 1059788  0  0  0  0  0  0  0  0  0  0  0 9774 105 37263  0 54 46
>> >
>> > Mike
>> >
>> >
>> > On Thu, 2011-10-20 at 11:02 -0700, Rennie Allen wrote:
>> >
>> >> Sched is the scheduler itself. How long did you let this run? If only
>> >> for a couple of seconds, then that number is high, but not ridiculous
>> >> for a loaded system, so I think that this output rules out a high
>> >> context switch rate.
>> >>
>> >> Try this command to see if some process is making an excessive number
>> >> of syscalls:
>> >>
>> >> dtrace -n 'syscall:::entry { @[execname]=count()}'
>> >>
>> >> If not, then I'd try looking at interrupts...
>> >>
>> >>
>> >> On 10/20/11 10:52 AM, "Gernot Wolf" <gw.inet at chello.at> wrote:
>> >>
>> >> >Yeah, I've been able to run these diagnostics on another OI box (at my
>> >> >office, so much for OI not being used in production ;)), and noticed
>> >> >that there were several values that were quite different. I just don't
>> >> >have any idea what these figures mean...
>> >> >
>> >> >Anyway, here are the results of the dtrace command (I executed the
>> >> >command twice, hence two result sets):
>> >> >
>> >> >gernot at tintenfass:~# dtrace -n 'sched:::off-cpu { @[execname]=count()}'
>> >> >dtrace: description 'sched:::off-cpu ' matched 3 probes
>> >> >^C
>> >> >
>> >> >  ipmgmtd               1
>> >> >  gconfd-2              2
>> >> >  gnome-settings-d      2
>> >> >  idmapd                2
>> >> >  inetd                 2
>> >> >  miniserv.pl           2
>> >> >  netcfgd               2
>> >> >  nscd                  2
>> >> >  ospm-applet           2
>> >> >  ssh-agent             2
>> >> >  sshd                  2
>> >> >  svc.startd            2
>> >> >  intrd                 3
>> >> >  afpd                  4
>> >> >  mdnsd                 4
>> >> >  gnome-power-mana      5
>> >> >  clock-applet          7
>> >> >  sendmail              7
>> >> >  xscreensaver          7
>> >> >  fmd                   9
>> >> >  fsflush              11
>> >> >  ntpd                 11
>> >> >  updatemanagernot     13
>> >> >  isapython2.6         14
>> >> >  devfsadm             20
>> >> >  gnome-terminal       20
>> >> >  dtrace               23
>> >> >  mixer_applet2        25
>> >> >  smbd                 39
>> >> >  nwam-manager         60
>> >> >  svc.configd          79
>> >> >  Xorg                100
>> >> >  sched            394078
>> >> >
>> >> >gernot at tintenfass:~# dtrace -n 'sched:::off-cpu { @[execname]=count()}'
>> >> >dtrace: description 'sched:::off-cpu ' matched 3 probes
>> >> >^C
>> >> >
>> >> >  automountd            1
>> >> >  ipmgmtd               1
>> >> >  idmapd                2
>> >> >  in.routed             2
>> >> >  init                  2
>> >> >  miniserv.pl           2
>> >> >  netcfgd               2
>> >> >  ssh-agent             2
>> >> >  sshd                  2
>> >> >  svc.startd            2
>> >> >  fmd                   3
>> >> >  hald                  3
>> >> >  inetd                 3
>> >> >  intrd                 3
>> >> >  hald-addon-acpi       4
>> >> >  nscd                  4
>> >> >  gnome-power-mana      5
>> >> >  sendmail              5
>> >> >  mdnsd                 6
>> >> >  devfsadm              8
>> >> >  xscreensaver          9
>> >> >  fsflush              10
>> >> >  ntpd                 14
>> >> >  updatemanagernot     16
>> >> >  mixer_applet2        21
>> >> >  isapython2.6         22
>> >> >  dtrace               24
>> >> >  gnome-terminal       24
>> >> >  smbd                 39
>> >> >  nwam-manager         58
>> >> >  zpool-rpool          65
>> >> >  svc.configd          79
>> >> >  Xorg                 82
>> >> >  sched            369939
>> >> >
>> >> >So, quite obviously there is one executable standing out here, "sched".
>> >> >Now what's the meaning of these figures?
>> >> >
>> >> >Regards,
>> >> >Gernot Wolf
>> >> >
>> >> >
>> >> >On 20.10.11 19:22, Michael Stapleton wrote:
>> >> >> Hi Gernot,
>> >> >>
>> >> >> You have a high context switch rate.
>> >> >>
>> >> >> Try
>> >> >> #dtrace -n 'sched:::off-cpu { @[execname]=count()}'
>> >> >>
>> >> >> for a few seconds to see if you can get the name of an executable.
>> >> >>
>> >> >> Mike
>> >> >> On Thu, 2011-10-20 at 18:44 +0200, Gernot Wolf wrote:
>> >> >>
>> >> >>> Hello all,
>> >> >>>
>> >> >>> I have a machine here at my home running OpenIndiana oi_151a,
>> >> >>> which serves as a NAS on my home network. The original install
>> >> >>> was OpenSolaris 2009.6 which was later upgraded to snv_134b, and
>> >> >>> recently to oi_151a.
>> >> >>>
>> >> >>> So far this OSOL (now OI) box has performed excellently, with one
>> >> >>> major exception: Sometimes, after a reboot, the cpu load was
>> >> >>> about 50-60%, although the system was doing nothing. Until
>> >> >>> recently, another reboot solved the issue.
>> >> >>>
>> >> >>> This does not work any longer. The system now always has a cpu
>> >> >>> load of 50-60% when idle (and higher of course when there is
>> >> >>> actually some work to do).
>> >> >>>
>> >> >>> I've already googled the symptoms. This didn't turn up much
>> >> >>> useful info, and the few things I found didn't apply to my
>> >> >>> problem. Most notable was a problem which could be solved by
>> >> >>> disabling cpupm in /etc/power.conf, but trying that didn't show
>> >> >>> any effect on my system.
>> >> >>>
>> >> >>> So I'm finally out of my depth. I have to admit that my knowledge
>> >> >>> of Unix is superficial at best, so I decided to try looking for
>> >> >>> help here.
>> >> >>>
>> >> >>> I've run several diagnostic commands like top, powertop, lockstat
>> >> >>> etc. and attached the results to this email (I've zipped the
>> >> >>> results of kstat because they were >1MB).
>> >> >>>
>> >> >>> One important thing is that when I boot into the oi_151a live dvd
>> >> >>> instead of booting into the installed system, I also get the high
>> >> >>> cpu load. I mention this because I have installed several things
>> >> >>> on my OI box like vsftpd, svn, netstat etc. I first thought that
>> >> >>> this problem might be caused by some of this extra stuff, but
>> >> >>> getting the same load when booting the live dvd ruled that out
>> >> >>> (I think).
>> >> >>>
>> >> >>> The machine is a custom-built medium tower:
>> >> >>> S-775 Intel DG965WHMKR ATX mainboard
>> >> >>> Intel Core 2 Duo E4300 CPU 1.8GHz
>> >> >>> 1x IDE DVD recorder
>> >> >>> 1x IDE HD 200GB (serves as system drive)
>> >> >>> 6x SATA II 1.5TB HD (configured as zfs raidz2 array)
>> >> >>>
>> >> >>> I have to solve this problem. Although the system runs fine and
>> >> >>> absolutely serves its purpose, having the cpu at 50-60% load
>> >> >>> constantly is a waste of energy and surely a rather unhealthy
>> >> >>> stress on the hardware.
>> >> >>>
>> >> >>> Anyone any ideas...?
>> >> >>>
>> >> >>> Regards,
>> >> >>> Gernot Wolf
>> >> >>> _______________________________________________
>> >> >>> OpenIndiana-discuss mailing list
>> >> >>> OpenIndiana-discuss at openindiana.org
>> >> >>> http://openindiana.org/mailman/listinfo/openindiana-discuss
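
A rough cross-check of the figures quoted in this thread, prompted by the question of how long the dtrace runs lasted: dividing the sched count from each run by the mean cs rate from the vmstat samples gives an estimated tracing time. This is only a sketch. It assumes that each sched:::off-cpu firing corresponds to roughly one context switch as counted in vmstat's cs column, and that the rate was steady while tracing. The helper names are hypothetical.

```shell
#!/bin/sh
# cs column copied from the ten vmstat samples quoted in this thread.
cs_samples='37073 37132 37194 36941 37208 37104 36931 37188 37058 37263'

# Hypothetical helper: mean of the whitespace-separated numbers on stdin.
# LC_ALL=C keeps the decimal point a "." regardless of the user's locale.
mean() {
    tr -s ' ' '\n' |
        LC_ALL=C awk 'NF { sum += $1; n++ } END { if (n) printf "%.1f\n", sum / n }'
}

# Hypothetical helper: seconds needed to accumulate $1 events at $2 events/s.
# Assumption: one off-cpu firing per context switch, at a steady rate.
trace_seconds() {
    LC_ALL=C awk -v count="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", count / rate }'
}

cs_rate=$(printf '%s\n' "$cs_samples" | mean)
echo "mean cs/s:  $cs_rate"
echo "first run:  $(trace_seconds 394078 "$cs_rate") s"
echo "second run: $(trace_seconds 369939 "$cs_rate") s"
```

Under those assumptions both runs come out at roughly ten seconds, so the sched counts are about what the vmstat cs rate predicts for a trace of that length rather than a separate anomaly.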