[OpenIndiana-discuss] system hang

gonczi at comcast.net gonczi at comcast.net
Wed Mar 30 23:04:23 UTC 2011



Hi Ben, 

The first thing I usually try when a hang happens is loading the kernel 
debugger (before the hang happens, of course). 

First, make sure you shut off the graphical console (svcadm disable gdm). 
This is a critical step; otherwise the mdb window pops open in hyperspace 
and you will not be able to access it, leaving you with the unpleasant option 
of pulling the plug to restart the machine. 

Next, you have two choices: either edit the boot stanza, or just 
run mdb -K from one of your login sessions. 

The boot stanza can be edited temporarily (the changes are not saved between 
reboots) by pressing "e" in the boot menu 
while the cursor is on the kernel that you want to boot. 
Here, you would replace ",console=graphics" with " -k -d" 
(and probably delete the "splashimage" line). 
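
As an illustration, the edit looks roughly like this. The exact kernel$ line 
is an assumption based on what a default menu entry usually contains; yours 
may differ, so only change the console part. 

```
Before:
kernel$ /platform/i86pc/kernel/$ISADIR/unix -B $ZFS-BOOTFS,console=graphics

After:
kernel$ /platform/i86pc/kernel/$ISADIR/unix -B $ZFS-BOOTFS -k -d
```

The -k flag loads the kernel debugger at boot, and -d makes the system stop 
in the debugger early in boot, so expect a debugger prompt where you type :c 
to continue. 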

If the system is able to come up, and you are just debugging some 
predictable / reproducible hang, the mdb -K method is much easier. 

Note that it is an uppercase K, and do verify that your console is in text mode. 
You need to be near the console (ILOM is OK). 

When you type mdb -K, the console drops into the debugger. 
At this point, the machine is at a breakpoint, so you need to type ":c" 
(i.e. "colon c") on your console to continue and let the machine run. 

Given that you managed to load the debugger, you should be able to break 
into mdb at will, by pressing a magic key combo on the console. 
On SPARC, I recall it is Ctrl-] 

On Intel, try 
F1-A, or 
Ctrl-Alt-D (as in the letter D), or 
Shift-Break 

Try all of the above, to see which one triggers the debugger for you. 
shift-break usually works for me. 

If you are desperate and cannot find a key combo that works, 
another possibility is to set up the system for NMI-triggered mdb. 

Most motherboards have an NMI pin (see motherboard docs). 
If you short this to ground, the mobo generates an NMI 
(a non-maskable interrupt). 

It is common to have a GND (ground) pin right 
next to this, so effectively you just momentarily connect the 2 pins. 


You will need the following line in /etc/system to hook up the NMI to 
trigger the debugger breakpoint: 

set pcplusmp:apic_kmdb_on_nmi=1 
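
/etc/system is only read at boot, so the setting takes effect after the next 
reboot. As a quick sanity check before rebooting, something like the 
following can confirm that the line is present and not commented out. This is 
only a sketch: has_kmdb_nmi is a made-up helper name, and it relies on the 
fact that comment lines in /etc/system start with "*". 

```shell
# Sketch: succeed if the given file (default /etc/system) contains an
# active "set pcplusmp:apic_kmdb_on_nmi=1" line.
# has_kmdb_nmi is a hypothetical helper, not a system command.
# Comment lines in /etc/system begin with "*", so anchoring the match
# at "set" skips commented-out copies of the setting.
has_kmdb_nmi() {
    grep -q '^[[:space:]]*set[[:space:]]\{1,\}pcplusmp:apic_kmdb_on_nmi[[:space:]]*=[[:space:]]*1[[:space:]]*$' "${1:-/etc/system}"
}
```

Usage: has_kmdb_nmi && echo "NMI hook is configured" 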

It would also be useful to verify that the machine is configured 
to save crash dumps (see man dumpadm). 
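
For example, running dumpadm with no arguments (as root) prints the current 
configuration, and the line to look for is "Savecore enabled". Here is a 
small sketch that checks for it. savecore_enabled is a hypothetical helper, 
the DUMPADM variable exists only so the command can be swapped out, and the 
exact output wording is from memory, so treat the pattern as an assumption. 

```shell
# Sketch: succeed if dumpadm reports that savecore runs on reboot.
# savecore_enabled and the DUMPADM override are hypothetical names.
# Assumes the output contains a line like "Savecore enabled: yes".
savecore_enabled() {
    "${DUMPADM:-dumpadm}" | grep -q 'Savecore enabled:[[:space:]]*yes'
}
```

If it turns out to be disabled, dumpadm -y switches it back on. 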


Once your system is set up, get it to hang, and then break into the debugger, 
and poke around. You may want to intentionally crash the machine at this point, 
just to generate a crash dump. 

It can be done a number of ways; an easy one is writing 
0 into the (r)ip register and typing :c, 
e.g.: 

0>rip 
:c 

It is just easier to work on a crash dump than on a live system. 
E.g., generate a ::threadlist -v piped to a file, then pull that up in your favorite editor 
to see what all the threads are doing. 
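
A rough sketch of how the thread list can be saved from a shell after the 
reboot. It assumes savecore has already written unix.N and vmcore.N into the 
crash directory; if you only find a compressed vmdump.N there, 
savecore -vf vmdump.N should expand it first. save_threadlist and the MDB 
override are names I made up for the example. 

```shell
# Sketch: feed "::threadlist -v" to mdb on a saved crash dump and
# capture the output in a text file.
#   $1 = crash directory, $2 = dump number, $3 = output file
# save_threadlist and the MDB variable are hypothetical names.
save_threadlist() {
    echo '::threadlist -v' |
        "${MDB:-mdb}" "$1/unix.$2" "$1/vmcore.$2" > "$3"
}
```

Usage (the directory name is just an example; dumpadm shows the real one): 
save_threadlist /var/crash/myhost 0 /tmp/threads.txt 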

The ::status command will, of course, indicate a null pointer dereference crash; 
do not be thrown by that, since you know you intentionally caused it. 

best wishes 

Steve 

