[Xnv-team] [OpenIndiana Distribution - Bug #1625] Xorg hang (100% CPU), nvidia-related
illumos project
devnull at illumos.org
Thu Mar 22 19:05:34 UTC 2012
Issue #1625 has been updated by Marion Hakanson.
Ken, thanks for looking into this. I understand your reasons for closing this item -- heck, I can't even reproduce it on demand, yet it does continue to happen on a regular basis on at least one of my two desktop machines.
As was hinted at above, I'm wondering whether this is in fact not Xorg- or nvidia-specific, but rather a case of lost interrupts peculiar to my hardware. On both of these systems the nvidia driver shares an IRQ with other onboard devices (keyboard, mouse, disk, or ehci drivers). In fact, the IRQ assignments on my older machine seem to have changed when I went to oi151a2 (prestable): the nvidia driver no longer shares with ehci and pci-ide, and I haven't had this problem on that machine since.
If I could get some guidance on troubleshooting or diagnosing a system in the state where drivers are no longer receiving interrupts, I believe I could make some progress with this problem.
Could this issue be re-classified appropriately, and kept open until I give up on it?
Dell Optiplex 980:
<pre>
# echo "::interrupts -d" | mdb -k
IRQ Vect IPL Bus Trg Type CPU Share APIC/INT# Driver Name(s)
1 0x41 5 ISA Edg Fixed 4 1 0x0/0x1 i8042#0
9 0x80 9 PCI Lvl Fixed 1 1 0x0/0x9 acpi_wrapper_isr
11 0xd1 14 PCI Lvl Fixed 2 1 0x0/0xb hpet_isr
12 0x42 5 ISA Edg Fixed 5 1 0x0/0xc i8042#0
16 0x82 9 PCI Lvl Fixed 7 2 0x0/0x10 ehci#0, nvidia#0
23 0x83 9 PCI Lvl Fixed 0 1 0x0/0x17 ehci#1
24 0x40 5 PCI Edg MSI 3 1 - ahci#0
25 0x81 7 PCI Edg MSI 6 1 - pcieb#0
26 0x60 6 PCI Edg MSI 1 1 - e1000g#0
32 0x20 2 Edg IPI all 1 - cmi_cmci_trap
160 0xa0 0 Edg IPI all 0 - poke_cpu
208 0xd0 14 Edg IPI all 1 - kcpc_hw_overflow_intr
209 0xd3 14 Edg IPI all 1 - cbe_fire
210 0xd4 14 Edg IPI all 1 - cbe_fire
240 0xe0 15 Edg IPI all 1 - xc_serv
241 0xe1 15 Edg IPI all 1 - apic_error_intr
#
</pre>
Old Pentium-4 machine:
<pre>
# echo "::interrupts -d" | mdb -k
IRQ Vect IPL Bus Trg Type CPU Share APIC/INT# Driver Name(s)
1 0x41 5 ISA Edg Fixed 0 1 0x0/0x1 i8042#0
4 0xb0 12 ISA Edg Fixed 0 1 0x0/0x4 asy#0
6 0x44 5 ISA Edg Fixed 0 1 0x0/0x6 fdc#0
7 0x45 5 ISA Edg Fixed 0 1 0x0/0x7 ecpp#0
9 0x80 9 PCI Lvl Fixed 0 1 0x0/0x9 acpi_wrapper_isr
12 0x42 5 ISA Edg Fixed 0 1 0x0/0xc i8042#0
15 0x43 5 ISA Edg Fixed 0 1 0x0/0xf ata#1
16 0x81 9 PCI Lvl Fixed 0 3 0x0/0x10 uhci#3, uhci#0, nvidia#0
18 0x84 9 PCI Lvl Fixed 0 4 0x0/0x12 uhci#2, e1000g#0, pci-ide#1, pci-ide#1
19 0x83 9 PCI Lvl Fixed 0 1 0x0/0x13 uhci#1
22 0x40 5 PCI Lvl Fixed 0 1 0x0/0x16 pci-ide#2
23 0x82 9 PCI Lvl Fixed 0 1 0x0/0x17 ehci#0
160 0xa0 0 Edg IPI all 0 - poke_cpu
208 0xd0 14 Edg IPI all 1 - kcpc_hw_overflow_intr
209 0xd1 14 Edg IPI all 1 - cbe_fire
210 0xd3 14 Edg IPI all 1 - cbe_fire
240 0xe0 15 Edg IPI all 1 - xc_serv
241 0xe1 15 Edg IPI all 1 - apic_error_intr
#
</pre>
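For anyone wanting to check their own machine for the same kind of sharing, here is a rough sketch of picking out the IRQs that nvidia shares from "::interrupts -d" output like the above. It assumes the whitespace-separated layout shown in these listings and that driver instances always look like name#N; the function name shared_irqs and the SAMPLE text are made up for illustration and are not part of mdb.

```python
import re

# Assumption: driver instances are printed as name#instance, e.g. "ehci#0".
DRIVER_RE = re.compile(r"[A-Za-z][\w-]*#\d+")


def shared_irqs(interrupts_text, driver="nvidia#0"):
    """Return {irq: [other drivers]} for every IRQ that `driver` shares."""
    shared = {}
    for line in interrupts_text.splitlines():
        fields = line.split()
        # Skip the header, shell prompts, and anything not starting with an IRQ number.
        if not fields or not fields[0].isdigit():
            continue
        names = DRIVER_RE.findall(line)
        if driver in names:
            others = [n for n in names if n != driver]
            if others:
                shared[int(fields[0])] = others
    return shared


# A few lines copied from the Optiplex 980 listing, used only as sample input.
SAMPLE = """\
IRQ Vect IPL Bus Trg Type CPU Share APIC/INT# Driver Name(s)
1 0x41 5 ISA Edg Fixed 4 1 0x0/0x1 i8042#0
16 0x82 9 PCI Lvl Fixed 7 2 0x0/0x10 ehci#0, nvidia#0
23 0x83 9 PCI Lvl Fixed 0 1 0x0/0x17 ehci#1
24 0x40 5 PCI Edg MSI 3 1 - ahci#0
"""

if __name__ == "__main__":
    for irq, others in shared_irqs(SAMPLE).items():
        print(f"IRQ {irq}: nvidia#0 shares with {', '.join(others)}")
```

To run it against a live system, one would save the mdb output to a file and pass its contents to shared_irqs() in place of SAMPLE.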
----------------------------------------
Bug #1625: Xorg hang (100% CPU), nvidia-related
https://www.illumos.org/issues/1625
Author: Marion Hakanson
Status: Closed
Priority: Low
Assignee: OI XNV
Category: XNV (X Window System)
Target version: oi_151_stable
Difficulty: Hard
Tags: nvidia
This Xorg hang is happening on two of my systems. Both run OI-151a now, and both have the nVidia driver version 280.13. The problem also happened under OI-148 with the same nVidia driver, and with several previous driver revisions (I download them directly from the nVidia site). One machine is 64-bit Intel Core i7 w/8GB RAM, GeForce 7300GT; the other is 32-bit Intel P4 w/2GB RAM, GeForce 6200.
When the problem occurs, mouse and keyboard input are ignored, but one can still log in remotely. The Xorg process appears to be using 100% CPU; more precisely, "prstat -m" shows it as 33% USR, 33% LCK, 33% SLP. Restarting gdm is usually not sufficient: I think the nVidia card or driver is left in an unhappy state, with some blocks of black & white bars on the screen after Xorg exits, so I usually just reboot the system.
Also, the Xorg.0.log shows entries like the ones below at the end, prior to killing the Xorg process:
(WW) Oct 04 10:24:40 NVIDIA: WAIT (2, 6, 0x8000, 0x0000c290, 0x0000ca08)
(WW) Oct 04 10:24:47 NVIDIA: WAIT (1, 6, 0x8000, 0x0000c290, 0x0000ca08)
. . .
At the same time, this shows up in /var/adm/messages:
Oct 4 10:24:44 kyklops nvidia: [ID 702911 kern.notice] NVRM: Xid (0000:01:00): 16, Head 00000000 Count 00fdd77b
Oct 4 10:24:44 kyklops nvidia: [ID 702911 kern.notice] NVRM: Xid (0000:01:00): 16, Head 00000001 Count 00fdd74c
Oct 4 10:24:52 kyklops nvidia: [ID 702911 kern.notice] NVRM: Xid (0000:01:00): 16, Head 00000000 Count 00fdd77c
Oct 4 10:24:52 kyklops nvidia: [ID 702911 kern.notice] NVRM: Xid (0000:01:00): 16, Head 00000001 Count 00fdd74d
Oct 4 10:24:53 kyklops nvidia: [ID 702911 kern.notice] NVRM: Xid (0000:01:00): 8, Channel 0000001e
Oct 4 10:25:02 kyklops nvidia: [ID 702911 kern.notice] NVRM: Xid (0000:01:00): 8, Channel 00000020
Oct 4 10:25:02 kyklops nvidia: [ID 702911 kern.notice] NVRM: Xid (0000:01:00): 16, Head 00000000 Count 00fdd77e
Oct 4 10:25:02 kyklops nvidia: [ID 702911 kern.notice] NVRM: Xid (0000:01:00): 16, Head 00000001 Count 00fdd74f
Note that the problem is difficult to replicate on demand. It seems to happen at random times, and probably at or near the time of a window focus change. It does not appear to be necessary to be viewing flash video or a web page, or using any particular X application. I do have "compiz" window-manager effects enabled.
Let me know what kind of debug information I can collect, and how.
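As one possible way to summarize the NVRM messages quoted above, here is a minimal sketch that tallies Xid codes per device from /var/adm/messages-style text. It assumes the line format shown in this report ("NVRM: Xid (device): code, ..."); the function name count_xids and the SAMPLE text are hypothetical, and the sketch makes no attempt to interpret what each Xid code means.

```python
import re
from collections import Counter

# Assumption: lines look like "... NVRM: Xid (0000:01:00): 16, Head ..." as in this report.
XID_RE = re.compile(r"NVRM: Xid \(([0-9a-fA-F:.]+)\): (\d+),")


def count_xids(log_text):
    """Count occurrences of each (device, Xid code) pair in the log text."""
    counts = Counter()
    for match in XID_RE.finditer(log_text):
        device, code = match.group(1), int(match.group(2))
        counts[(device, code)] += 1
    return counts


# Three lines copied from the report above, used only as sample input.
SAMPLE = """\
Oct 4 10:24:44 kyklops nvidia: [ID 702911 kern.notice] NVRM: Xid (0000:01:00): 16, Head 00000000 Count 00fdd77b
Oct 4 10:24:44 kyklops nvidia: [ID 702911 kern.notice] NVRM: Xid (0000:01:00): 16, Head 00000001 Count 00fdd74c
Oct 4 10:24:53 kyklops nvidia: [ID 702911 kern.notice] NVRM: Xid (0000:01:00): 8, Channel 0000001e
"""

if __name__ == "__main__":
    for (device, code), n in sorted(count_xids(SAMPLE).items()):
        print(f"device {device}: Xid {code} seen {n} time(s)")
```

Comparing such tallies between a hung session and a normal one might help show whether the Xid messages only appear around the hang.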