[OpenIndiana-discuss] Hyperthreading causing kernelpanic on oi_148

Gertjan Oude Lohuis gertjan at oudelohuis.nl
Tue Jan 18 16:11:19 UTC 2011


Hi!

First of all, I would like to thank all developers of OpenIndiana, 
OpenSolaris, Nexenta and other free Solaris distributions for their 
continued effort. Great work!


I've been testing with OpenIndiana (oi_148) on some new toys, and have 
experienced a few nasty kernel panics. They occurred during a pretty 
simple benchmark with filebench.

All I had to do was something like:

$ /usr/benchmarks/filebench/bin/go_filebench
  > load fileserver
  > set $dir=/tank/filebench/
  > set $nfiles=50000
  > run 120

After creating all files, at the moment filebench said "Starting 50 
filereaderthread threads", the kernel would panic with a message "kernel 
heap corruption detected" and reboot. I'll attach a screenshot.

During and after the reboot, a few things could happen:

1. The controller (an IBM M5025, a rebranded LSI 9260) would forget 
about some of its disks and complain about it. Some of the disks were 
suddenly marked as 'Foreign' by the controller but could not be 
imported, resulting in a degraded (and sometimes corrupt) zpool.
2. OpenIndiana would reboot, panic again, repeat that cycle a number of 
times, and then continue its work.
3. OpenIndiana would just reboot and continue its work.

I have not been able to find a pattern in which of these outcomes occurs.

Eventually, I turned off HyperThreading on a hunch, because we've had 
other crashes (with snv_134) that were caused by HT. Since then, no 
panics or other crashes have occurred.

The hardware:
* IBM X3650-M3 with two Xeon E5620 processors
* 24 GB RAM
* internal M1015-controller (LSI 9240) with two OS-disks and SSDs for cache
* M5025-controller (LSI 9260) which controls the enclosure(s)
* one EXP2524-enclosure with 24 SAS-disks. Each disk is configured as a 
RAID-0 device to mimic JBOD. Since the M5025 can't do JBOD, it will be 
replaced by an LSI 9200-8e.

At the moment, I'm testing and configuring two of these machines with 
identical hardware. Both have the same problem, which makes a single 
faulty component unlikely (firmware bugs excepted, of course).

The zpool configuration did not really matter. Some configurations I've 
tried:

* stripe of 24 disks
* stripe of 12 mirrors
* stripe of 3 raidz2 (each 8 disks)
* no log devices were configured yet
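
For reference, the three layouts above could be expressed roughly as 
the zpool create commands printed by the sketch below. It is a dry run 
that only echoes the commands and never calls zpool itself. The pool 
name tank is taken from the filebench path; the device names 
(c9t0d0 through c9t23d0) and the layout helper are hypothetical 
placeholders, not the names used on these machines.

```shell
#!/bin/sh
# Dry run: print (do not execute) a zpool create command per layout.
# DISKS holds 24 hypothetical device names; substitute the real ones.
DISKS=""
i=0
while [ "$i" -lt 24 ]; do
    DISKS="$DISKS c9t${i}d0"
    i=$((i + 1))
done

# layout TYPE WIDTH DISK...
# Groups the disks into vdevs of WIDTH disks, each preceded by TYPE.
# An empty TYPE yields a plain stripe with no vdev keyword.
layout() {
    vtype=$1
    width=$2
    shift 2
    out="zpool create tank"
    n=0
    for d in "$@"; do
        if [ -n "$vtype" ] && [ $((n % width)) -eq 0 ]; then
            out="$out $vtype"
        fi
        out="$out $d"
        n=$((n + 1))
    done
    echo "$out"
}

layout ""     1 $DISKS   # stripe of 24 disks
layout mirror 2 $DISKS   # stripe of 12 mirrors
layout raidz2 8 $DISKS   # stripe of 3 raidz2 vdevs, 8 disks each
```

Printing the commands instead of running them makes it possible to 
check the grouping before any pool is actually created.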

I've made sure that the controllers run the latest firmware, and I 
replaced the mr_sas driver with the one provided by LSI. Alas, this did 
not prevent the machine from crashing.


Eventually, disabling HyperThreading in the BIOS solved everything 
(except for my headache, but that's something else).


* Is a buggy HT implementation a known issue with Solaris? I've seen 
more than one panic/crash/bug caused by HyperThreading.
* Can I do anything else to debug this, for the good of mankind? I'm in 
a bit of a hurry to get this machine into production, but I'd be happy 
to run some tests or provide more information.
* Any other thoughts?


Regards,

Gertjan Oude Lohuis
Byte Internet (http://www.byte.nl)


-------------- next part --------------
A non-text attachment was scrubbed...
Name: kernelcrash.jpg
Type: image/jpeg
Size: 64236 bytes
Desc: not available
URL: <http://openindiana.org/pipermail/openindiana-discuss/attachments/20110118/f256437b/attachment-0002.jpg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: missing_vds.jpg
Type: image/jpeg
Size: 30168 bytes
Desc: not available
URL: <http://openindiana.org/pipermail/openindiana-discuss/attachments/20110118/f256437b/attachment-0003.jpg>

