[OpenIndiana-discuss] Slowly increasing kernel load with oi_151a

Mirko Kaffka mk at mkaffka.de
Fri Dec 23 00:08:03 UTC 2011


I recently got a new machine and installed oi_151a from scratch
(core i7-2600, intel motherboard DH67BL, 16GB RAM).
After a few days uptime I noticed a constant system load of about 10%
although the desktop was idle and I had not started anything that caused
a permanent load. There was almost no I/O activity, just a few reads
and writes every few seconds. vmstat showed 0-1% user time but
10-13% system time. prstat -v reported well below 1% (mostly 0%) user and
system time for every process.
Over the following days the load increased further. When I took 7 CPU
cores off-line, the remaining core showed about 80% sys load. Where
does it come from?
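
For reference, off-lining the cores is done with psradm, something like
this (the ids 1-7 are simply what psrinfo reports on this box):

~ # psrinfo                      # list logical CPUs and their state
~ # psradm -f 1 2 3 4 5 6 7      # take everything except cpu0 off-line
~ # psradm -n 1 2 3 4 5 6 7      # bring them back on-line later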

When I switch from multi-user to single-user mode, the load persists.
When I reboot, everything is fine for a while (0-1% sys load), but the load
slowly starts increasing again. So I have to reboot the machine about
every 2 days, which is very unpleasant.
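
For reference, the switch can be done via the SMF milestones, e.g.:

~ # svcadm milestone milestone/single-user:default   # drop to single-user
~ # svcadm milestone all                             # back to multi-user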

I tried to analyze the issue using intrstat, lockstat, etc., but have not
gotten very far.

All of the following commands were run in single-user mode with only one
CPU core on-line. (I hope it's OK to post the output here.)

~ # vmstat 5
  kthr      memory            page            disk          faults      cpu
  r b w   swap  free  re  mf pi po fr de sr s1 s2 s3 s4   in   sy   cs us sy id
  0 0 0 9070220 2859260 2  4  0  0  0  0  0  5 -1  1 14  397  265  258  0  4 95
  0 0 0 10392120 4142932 24 64 0 0  0  0  0  0  0  0  1  508   99  227  0 21 79
  0 0 0 10392120 4142960 0 0  0  0  0  0  0  0  0  0 13  511   60  229  0 21 79
  0 0 0 10392124 4142964 0 0  0  0  0  0  0  0  0  0 13  509   59  226  0 21 79

~ # ps -ef
      UID   PID  PPID   C    STIME TTY         TIME CMD
     root     0     0   0   Dec 20 ?           0:01 sched
     root     4     0   0   Dec 20 ?           0:00 kcfpoold
     root     6     0   0   Dec 20 ?           2:34 zpool-rpool
     root     1     0   0   Dec 20 ?           0:00 /sbin/init
     root     2     0   0   Dec 20 ?           0:00 pageout
     root     3     0   0   Dec 20 ?           8:54 fsflush
     root    10     1   0   Dec 20 ?           0:03 /lib/svc/bin/svc.startd
     root    12     1   0   Dec 20 ?           0:08 /lib/svc/bin/svc.configd
   netadm    50     1   0   Dec 20 ?           0:00 /lib/inet/ipmgmtd
    dladm    46     1   0   Dec 20 ?           0:00 /sbin/dlmgmtd
     root   167     0   0   Dec 20 ?           1:50 zpool-tank
     root   232     1   0   Dec 20 ?           0:00 /usr/lib/sysevent/syseventd
     root  9518    10   0 21:01:56 console     0:00 -bash
     root   262     1   0   Dec 20 ?           0:02 devfsadmd
     root   276     1   0   Dec 20 ?           0:00 /usr/lib/power/powerd
     root 10708  9518   0 21:04:53 console     0:00 ps -ef
     root  3222     1   0   Dec 20 ?           0:00 -bash


~ # intrstat
       device |      cpu0 %tim      cpu1 %tim      cpu2 %tim      cpu3 %tim
-------------+------------------------------------------------------------
     e1000g#1 |         1  0.0         0  0.0         0  0.0         0  0.0
       ehci#0 |         1  0.0         0  0.0         0  0.0         0  0.0
       ehci#1 |         1  0.0         0  0.0         0  0.0         0  0.0
       rtls#0 |         1  0.0         0  0.0         0  0.0         0  0.0
(cpu4..7 are all 0.0%)


~ # prstat -v
    PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/NLWP
  10711 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   1   0 254   0 prstat/1
   3222 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 bash/1
[cut]
Total: 14 processes, 366 lwps, load averages: 0.02, 0.32, 0.67
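
If it helps, I can also collect per-thread microstate accounting with
something like

~ # prstat -mLc 5 5

but since the load appears to be pure kernel time, I doubt it will show
much more than the -v output above.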


~ # lockstat -kIW -D20 sleep 30

Profiling interrupt: 2913 events in 30.028 seconds (97 events/sec)

Count indv cuml rcnt     nsec Hottest CPU+PIL        Caller
-------------------------------------------------------------------------------
  2878  99%  99% 0.00      293 cpu[0]                 acpi_cpu_cstate
    12   0%  99% 0.00      224 cpu[0]                 fsflush
    10   0% 100% 0.00      266 cpu[0]                 i86_mwait
[cut]
-------------------------------------------------------------------------------

Is the high count on acpi_cpu_cstate normal?
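
If it is useful, I could also count how often that function is entered,
assuming an fbt probe for it exists on this kernel, with something like:

~ # dtrace -n 'fbt::acpi_cpu_cstate:entry { @[cpu] = count(); } tick-10sec { exit(0); }'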

The hotkernel script from the DTrace toolkit eventually froze my system.
After a reboot, hotkernel ran flawlessly.
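
As far as I understand it, what hotkernel samples should be roughly
equivalent to a one-liner like this (my approximation, not the exact
script):

~ # dtrace -n 'profile-997 /arg0/ { @[func(arg0)] = count(); } tick-30sec { exit(0); }'

so I could rerun that and post the output if the full script is too risky.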

How can I further analyze this?

Thanks,
Mirko



