[OpenIndiana-discuss] Problem with high cpu load (oi_151a)
Michael Stapleton
michael.stapleton at techsologic.com
Thu Oct 20 18:33:07 UTC 2011
Don't know. I don't like to troubleshoot by guessing if I can avoid it.
I'd rather follow the evidence to catch the culprit: use what we know to
discover what we do not know.
We know the context-switch (cs) rate in vmstat is high, we know sys time
is high, we know the syscall rate is low, and we know it is not a user
process, therefore it is the kernel. Likely a driver.
So what kernel code is running the most?
What's causing that code to run?
Does that code belong to a driver?
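A kernel profile answers the first question directly. A quick sketch (the
997Hz sampling rate and the 10 second cutoff are just convenient defaults,
nothing magic):

#dtrace -n 'profile-997 /arg0/ { @[func(arg0)] = count(); } tick-10s { exit(0); }'

The /arg0/ predicate keeps only samples that land in kernel code, and
func(arg0) resolves each sampled program counter to module`function, so
the hottest entries print last and name the module they belong to.
Swapping func(arg0) for stack() then answers the second question: what is
calling that code.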
Mike
On Thu, 2011-10-20 at 20:25 +0200, Michael Schuster wrote:
> Hi,
>
> just found this:
> http://download.oracle.com/docs/cd/E19253-01/820-5245/ghgoc/index.html
>
> does it help?
>
> On Thu, Oct 20, 2011 at 20:23, Michael Stapleton
> <michael.stapleton at techsologic.com> wrote:
> > My understanding is that it is not supposed to be a loaded system. We
> > want to know what the load is.
> >
> >
> > gernot@tintenfass:~# intrstat 30
> >
> > device | cpu0 %tim cpu1 %tim
> > -------------+------------------------------
> > e1000g#0 | 1 0,0 0 0,0
> > ehci#0 | 0 0,0 4 0,0
> > ehci#1 | 3 0,0 0 0,0
> > hci1394#0 | 0 0,0 2 0,0
> > i8042#1 | 0 0,0 4 0,0
> > i915#1 | 0 0,0 2 0,0
> > pci-ide#0 | 15 0,1 0 0,0
> > uhci#0 | 0 0,0 2 0,0
> > uhci#1 | 0 0,0 0 0,0
> > uhci#2 | 3 0,0 0 0,0
> > uhci#3 | 0 0,0 2 0,0
> > uhci#4 | 0 0,0 4 0,0
> >
> > device | cpu0 %tim cpu1 %tim
> > -------------+------------------------------
> > e1000g#0 | 1 0,0 0 0,0
> > ehci#0 | 0 0,0 3 0,0
> > ehci#1 | 3 0,0 0 0,0
> > hci1394#0 | 0 0,0 1 0,0
> > i8042#1 | 0 0,0 6 0,0
> > i915#1 | 0 0,0 1 0,0
> > pci-ide#0 | 3 0,0 0 0,0
> > uhci#0 | 0 0,0 1 0,0
> > uhci#1 | 0 0,0 0 0,0
> > uhci#2 | 3 0,0 0 0,0
> > uhci#3 | 0 0,0 1 0,0
> > uhci#4 | 0 0,0 3 0,0
> >
> > gernot@tintenfass:~# vmstat 5 10
> >  kthr      memory            page            disk          faults      cpu
> >  r b w   swap  free  re  mf pi po fr de sr cd s0 s1 s2   in   sy   cs us sy id
> >  0 0 0 4243840 1145720  1  6  0  0  0  0  2  0  1  1  1 9767  121 37073  0 54 46
> >  0 0 0 4157824 1059796  4 11  0  0  0  0  0  0  0  0  0 9752  119 37132  0 54 46
> >  0 0 0 4157736 1059752  0  0  0  0  0  0  0  0  0  0  0 9769  113 37194  0 54 46
> >  0 0 0 4157744 1059788  0  0  0  0  0  0  0  0  0  0  0 9682  104 36941  0 54 46
> >  0 0 0 4157744 1059788  0  0  0  0  0  0  0  0  0  0  0 9769  105 37208  0 54 46
> >  0 0 0 4157728 1059772  0  1  0  0  0  0  0  0  0  0  0 9741  159 37104  0 54 46
> >  0 0 0 4157728 1059772  0  0  0  0  0  0  0  0  0  0  0 9695  127 36931  0 54 46
> >  0 0 0 4157744 1059788  0  0  0  0  0  0  0  0  0  0  0 9762  105 37188  0 54 46
> >  0 0 0 4157744 1059788  0  0  0  0  0  0  0  0  0  0  0 9723  102 37058  0 54 46
> >  0 0 0 4157744 1059788  0  0  0  0  0  0  0  0  0  0  0 9774  105 37263  0 54 46
> >
> > Mike
> >
> >
> > On Thu, 2011-10-20 at 11:02 -0700, Rennie Allen wrote:
> >
> >> Sched is the scheduler itself. How long did you let this run? If only
> >> for a couple of seconds, then that number is high, but not ridiculous for
> >> a loaded system, so I think that this output rules out a high context
> >> switch rate.
> >>
> >> Try this command to see if some process is making an excessive number of
> >> syscalls:
> >>
> >> dtrace -n 'syscall:::entry { @[execname]=count()}'
> >>
> >> If not, then I'd try looking at interrupts...
> >>
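> >> (For the interrupt side, intrstat(1M) is probably the quickest first
> >> look; just a sketch, any interval works:
> >>
> >> intrstat 5
> >>
> >> If one device's %tim column stands out, that's the driver to chase.)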
> >>
> >> On 10/20/11 10:52 AM, "Gernot Wolf" <gw.inet at chello.at> wrote:
> >>
> >> >Yeah, I've been able to run these diagnostics on another OI box (at my
> >> >office, so much for OI not being used in production ;)), and noticed
> >> >that several values were quite different. I just don't have any idea
> >> >what these figures mean...
> >> >
> >> >Anyway, here are the results of the dtrace command (I executed the
> >> >command twice, hence two result sets):
> >> >
> >> >gernot@tintenfass:~# dtrace -n 'sched:::off-cpu { @[execname]=count()}'
> >> >dtrace: description 'sched:::off-cpu ' matched 3 probes
> >> >^C
> >> >
> >> > ipmgmtd 1
> >> > gconfd-2 2
> >> > gnome-settings-d 2
> >> > idmapd 2
> >> > inetd 2
> >> > miniserv.pl 2
> >> > netcfgd 2
> >> > nscd 2
> >> > ospm-applet 2
> >> > ssh-agent 2
> >> > sshd 2
> >> > svc.startd 2
> >> > intrd 3
> >> > afpd 4
> >> > mdnsd 4
> >> > gnome-power-mana 5
> >> > clock-applet 7
> >> > sendmail 7
> >> > xscreensaver 7
> >> > fmd 9
> >> > fsflush 11
> >> > ntpd 11
> >> > updatemanagernot 13
> >> > isapython2.6 14
> >> > devfsadm 20
> >> > gnome-terminal 20
> >> > dtrace 23
> >> > mixer_applet2 25
> >> > smbd 39
> >> > nwam-manager 60
> >> > svc.configd 79
> >> > Xorg 100
> >> > sched 394078
> >> >
> >> >gernot@tintenfass:~# dtrace -n 'sched:::off-cpu { @[execname]=count()}'
> >> >dtrace: description 'sched:::off-cpu ' matched 3 probes
> >> >^C
> >> >
> >> > automountd 1
> >> > ipmgmtd 1
> >> > idmapd 2
> >> > in.routed 2
> >> > init 2
> >> > miniserv.pl 2
> >> > netcfgd 2
> >> > ssh-agent 2
> >> > sshd 2
> >> > svc.startd 2
> >> > fmd 3
> >> > hald 3
> >> > inetd 3
> >> > intrd 3
> >> > hald-addon-acpi 4
> >> > nscd 4
> >> > gnome-power-mana 5
> >> > sendmail 5
> >> > mdnsd 6
> >> > devfsadm 8
> >> > xscreensaver 9
> >> > fsflush 10
> >> > ntpd 14
> >> > updatemanagernot 16
> >> > mixer_applet2 21
> >> > isapython2.6 22
> >> > dtrace 24
> >> > gnome-terminal 24
> >> > smbd 39
> >> > nwam-manager 58
> >> > zpool-rpool 65
> >> > svc.configd 79
> >> > Xorg 82
> >> > sched 369939
> >> >
> >> >So, quite obviously there is one executable standing out here, "sched".
> >> >Now, what do these figures mean?
> >> >
> >> >Regards,
> >> >Gernot Wolf
> >> >
> >> >
> >> >Am 20.10.11 19:22, schrieb Michael Stapleton:
> >> >> Hi Gernot,
> >> >>
> >> >> You have a high context switch rate.
> >> >>
> >> >> try
> >> >> #dtrace -n 'sched:::off-cpu { @[execname]=count()}'
> >> >>
> >> >> Run it for a few seconds to see if you can get the name of an executable.
> >> >>
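> >> >> If the top name turns out to be sched, the work is in kernel threads
> >> >> rather than in a user process. In that case the same probe with a
> >> >> predicate shows what those threads were doing when they went off CPU;
> >> >> just a sketch, untested:
> >> >>
> >> >> #dtrace -n 'sched:::off-cpu /pid == 0/ { @[stack()] = count(); }'
> >> >>
> >> >> Kernel threads all run in pid 0, so the most common stacks, printed
> >> >> last, should point at the subsystem or driver doing all the switching.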
> >> >> Mike
> >> >> On Thu, 2011-10-20 at 18:44 +0200, Gernot Wolf wrote:
> >> >>
> >> >>> Hello all,
> >> >>>
> >> >>> I have a machine here at my home running OpenIndiana oi_151a, which
> >> >>> serves as a NAS on my home network. The original install was OpenSolaris
> >> >>> 2009.06, which was later upgraded to snv_134b, and recently to oi_151a.
> >> >>>
> >> >>> So far this OSOL (now OI) box has performed excellently, with one major
> >> >>> exception: sometimes, after a reboot, the CPU load was about 50-60%
> >> >>> although the system was doing nothing. Until recently, another reboot
> >> >>> solved the issue.
> >> >>>
> >> >>> This no longer works. The system now always has a CPU load of 50-60%
> >> >>> when idle (and higher, of course, when there is actually some work to
> >> >>> do).
> >> >>>
> >> >>> I've already googled the symptoms. This didn't turn up much useful
> >> >>> info, and the few things I found didn't apply to my problem. The most
> >> >>> notable was a problem that could be solved by disabling cpupm in
> >> >>> /etc/power.conf, but trying that had no effect on my system.
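> >> >>>
> >> >>> (In case I got the workaround wrong: what I did was add the single line
> >> >>>
> >> >>> cpupm disable
> >> >>>
> >> >>> to /etc/power.conf and then run pmconfig to make the change take effect.)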
> >> >>>
> >> >>> So I'm finally out of my depth. I have to admit that my knowledge of
> >> >>> Unix is superficial at best, so I decided to try looking for help here.
> >> >>>
> >> >>> I've run several diagnostic commands like top, powertop, lockstat etc.
> >> >>> and attached the results to this email (I've zipped the results of
> >> >>> kstat because they were >1 MB).
> >> >>>
> >> >>> One important thing is that when I boot the oi_151a live DVD instead
> >> >>> of booting into the installed system, I also get the high CPU load. I
> >> >>> mention this because I have installed several things on my OI box like
> >> >>> vsftpd, svn, netstat etc. I first thought that this problem might be
> >> >>> caused by some of this extra stuff, but seeing the same symptoms when
> >> >>> booting the live DVD ruled that out (I think).
> >> >>>
> >> >>> The machine is a custom-built medium tower:
> >> >>> S-775 Intel DG965WHMKR ATX mainboard
> >> >>> Intel Core 2 Duo E4300 CPU 1.8GHz
> >> >>> 1x IDE DVD recorder
> >> >>> 1x IDE HD 200GB (serves as system drive)
> >> >>> 6x SATA II 1.5TB HD (configured as a ZFS raidz2 array)
> >> >>>
> >> >>> I have to solve this problem. Although the system runs fine and
> >> >>> absolutely serves its purpose, having the CPU at 50-60% load constantly
> >> >>> is a waste of energy and surely a rather unhealthy stress on the
> >> >>> hardware.
> >> >>>
> >> >>> Anyone any ideas...?
> >> >>>
> >> >>> Regards,
> >> >>> Gernot Wolf