[OpenIndiana-discuss] SATA device errors, possibly due to IRQ conflict

Daniel cs2dsb at gmail.com
Thu Sep 29 16:53:12 UTC 2011


>
> Since your BIOS doesn't reliably detect drives on reboot, I would start by
> looking for firmware upgrades.
>
> SMC likes to say don't upgrade your firmware unless you have a problem; I
> bet you meet that requirement.
>
> j
>

Thanks, I'm in the process of trying to do this but I'm having some trouble
getting it to recognize any of the bootable disks I've created. I'll dig out
a USB stick and try that since I'm having no joy with the virtual CD/floppy
options.

> In your BIOS you should have a setting for "Interrupt 19 Capture" and, I
> believe, the default setting is "Enable".  Change it to "Disable".  This
> will disable your ability to boot off your SAS controller but you don't do
> that right now anyway.
>
> Good luck!
>
> -Russ
>

I've done this but it gives the same errors.
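
For comparing the number of shared-IRQ notices before and after a BIOS
change, a rough sketch along these lines might help. It assumes the log
lines look like the NOTICE quoted in my original message below; the
function name is made up for illustration.

```python
import re

# Matches the kernel notice quoted in the original report, e.g.
# "NOTICE: IRQ19 is being shared by drivers with different interrupt levels."
NOTICE_RE = re.compile(r"NOTICE: IRQ(\d+) is being shared by drivers")


def count_shared_irq_notices(lines):
    """Return {irq_number: count} for shared-IRQ notices found in log lines."""
    counts = {}
    for line in lines:
        match = NOTICE_RE.search(line)
        if match:
            irq = int(match.group(1))
            counts[irq] = counts.get(irq, 0) + 1
    return counts
```

Passing it an open file object for /var/adm/messages (or a copy of it)
should give a per-IRQ count that can be compared between boots.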


> After some further digging, there appears to be a similar issue on the
> FreeBSD side of things with the Tylersburg chipset (found on the X8ST3-F).
>
> http://lists.freebsd.org/pipermail/freebsd-current/2009-July/009946.html
>
> The USB devices and the SATA devices all contend with IRQ19 (as seen by
> uhci and pci-ide all piled together).  Might it be possible to switch your
> SATA mode to AHCI rather than IDE?  That will use a different driver and
> subsequently might use a different interrupt.
>
> -Russ


I believe it is possible to change the SATA port mode in the BIOS, but from
memory, when I did this previously it prevented the OS (OpenSolaris at the
time) from booting because the rpool physical path changed, and I had to go
in with a Live disk and modify something to make it work. I will see if I
can dig out my notes or find an article on this.
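
If the switch does work, one way to check whether the SATA driver has
actually moved off IRQ 19 would be to re-run `echo ::interrupts -d |
pfexec mdb -k` and look at which IRQs are still shared. A minimal sketch
for picking those rows out of the output, assuming the column layout
matches the listing in my original message below (the function name is
hypothetical, and each row must be on a single un-wrapped line):

```python
def shared_irqs(interrupts_text):
    """Map IRQ number -> driver names for PCI rows whose Share count is > 1."""
    result = {}
    for line in interrupts_text.splitlines():
        fields = line.split()
        # Data rows start with a numeric IRQ; this skips the header and blanks.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        # IPI rows have no Bus column, so their fields are shifted; skip them.
        if fields[3] != "PCI":
            continue
        irq, share = int(fields[0]), int(fields[7])
        if share > 1:
            # Everything after the APIC/INT# column is the driver list.
            drivers = " ".join(fields[9:]).split(",")
            result[irq] = [d.strip() for d in drivers if d.strip()]
    return result
```

Any IRQ in the result that still lists a disk driver alongside uhci
entries would suggest the sharing is still in place.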

Thanks for the advice so far guys. I'll let you know when I make some
progress.

Cheers,

Daniel

On 29 September 2011 17:36, Russell Hansen <russhan at new-swankton.net> wrote:

> After some further digging, there appears to be a similar issue on the
> FreeBSD side of things with the Tylersburg chipset (found on the X8ST3-F).
>
> http://lists.freebsd.org/pipermail/freebsd-current/2009-July/009946.html
>
> The USB devices and the SATA devices all contend with IRQ19 (as seen by
> uhci and pci-ide all piled together).  Might it be possible to switch your
> SATA mode to AHCI rather than IDE?  That will use a different driver and
> subsequently might use a different interrupt.
>
> -Russ
>
> ________________________________
>
> From: Russell Hansen [mailto:russhan at new-swankton.net]
> Sent: Thu 9/29/2011 8:59 AM
> To: Discussion list for OpenIndiana
> Subject: Re: [OpenIndiana-discuss] SATA device errors,possibly due to IRQ
> conflict
>
>
>
> In your BIOS you should have a setting for "Interrupt 19 Capture" and, I
> believe, the default setting is "Enable".  Change it to "Disable".  This
> will disable your ability to boot off your SAS controller but you don't do
> that right now anyway.
>
> Good luck!
>
> -Russ
>
> ________________________________
>
> From: Daniel [mailto:cs2dsb at gmail.com]
> Sent: Thu 9/29/2011 2:31 AM
> To: openindiana-discuss at openindiana.org
> Subject: [OpenIndiana-discuss] SATA device errors,possibly due to IRQ
> conflict
>
>
>
> Hi,
>
> I've got a server running OpenIndiana 148 on a Supermicro X8ST3-F that has
> been working perfectly for months right up until I added some more storage.
>
> The board has 6 * SATA ports and 8 * SAS ports. Previously all the drives
> in my storage pool were attached to the 8 SAS ports and only my rpool
> drive was using one of the SATA ports.
>
> Now that I have added another 4 drives I've had to connect them to the SATA
> ports - this is when the system started to become unstable.
>
> I have had periods of very heavy usage that have caused no problems
> whatsoever (for example, I copied 4 TB of data onto the pool, most of
> which would have had to go onto the new drives, then did several scrubs
> over the next few days). The system seems perfectly happy to sustain
> 350 MB/s+ of reads or writes (or a bit of both) for hours on end with no
> errors at all. At other times, typically overnight or early morning when
> it's just ticking over with < 500 KB/s of read/write, it will fall apart.
>
> There are three kinds of failure I'm experiencing, seemingly randomly:
>
> 1. Errors about failed reads/writes on 2 or 4 SATA drives in
>    /var/adm/messages, and system I/O hung. The system has to have the
>    power cut to recover: ssh won't connect, and I can't get past the
>    username prompt on the terminal. No ZFS errors reported.
> 2. Errors about failed reads/writes, system I/O NOT hung, ZFS reporting
>    faulted drives (2 or 4) and hundreds of thousands of errors. In this
>    scenario, the machine can be rebooted cleanly BUT the failed drives
>    don't get detected by the BIOS. Usually a full power down, a 30 second
>    wait, and a power up will allow the drives to be detected again. When
>    it powers back up ZFS will report lots of errors but sort itself out
>    after a resilver - I haven't actually had any permanent data loss yet;
>    ZFS has always recovered.
> 3. No errors at all in either /var/adm/messages or zpool status, but
>    hung I/O.
>
>
> I've swapped the drive connections around to prove it isn't the new disks
> that are at fault, and this has confirmed that it's whichever devices are
> connected to the SATA controller that have the problem.
>
> When I rebooted the machine after the latest failure I checked
> /var/adm/messages and there are thousands of messages (9995 in total,
> though that may span several reboots) identical to the following:
>
> "[ID 954099 kern.info] NOTICE: IRQ19 is being shared by drivers with
> different interrupt levels."
>
> In case it's useful:
>
> cs2dsb at chronos:~$ echo ::interrupts -d | pfexec mdb -k
> IRQ  Vect IPL Bus    Trg Type   CPU Share APIC/INT# Driver Name(s)
> 9    0x80 9   PCI    Lvl Fixed  1   1     0x0/0x9   acpi_wrapper_isr
> 11   0xd1 14  PCI    Lvl Fixed  2   1     0x0/0xb   hpet_isr
> 16   0x84 9   PCI    Lvl Fixed  7   1     0x0/0x10  uhci#0
> 18   0x82 9   PCI    Lvl Fixed  5   2     0x0/0x12  uhci#5, ehci#0
> 19   0x86 9   PCI    Lvl Fixed  3   6     0x0/0x13  uhci#4, uhci#2, pci-ide#0, pci-ide#1, pci-ide#1, pci-ide#0
> 21   0x85 9   PCI    Lvl Fixed  0   1     0x0/0x15  uhci#1
> 23   0x83 9   PCI    Lvl Fixed  6   2     0x0/0x17  uhci#3, ehci#1
> 24   0x81 7   PCI    Edg MSI    4   1     -         pcieb#4
> 25   0x60 6   PCI    Edg MSI    1   1     -         e1000g#0
> 26   0x61 6   PCI    Edg MSI    2   1     -         e1000g#1
> 27   0x40 5   PCI    Edg MSI    3   1     -         mpt#0
> 32   0x20 2          Edg IPI    all 1     -         cmi_cmci_trap
> 160  0xa0 0          Edg IPI    all 0     -         poke_cpu
> 208  0xd0 14         Edg IPI    all 1     -         kcpc_hw_overflow_intr
> 209  0xd3 14         Edg IPI    all 1     -         cbe_fire
> 210  0xd4 14         Edg IPI    all 1     -         cbe_fire
> 240  0xe0 15         Edg IPI    all 1     -         xc_serv
> 241  0xe1 15         Edg IPI    all 1     -         apic_error_intr
>
> cs2dsb at chronos:~$ echo ::interrupts | pfexec mdb -k
> IRQ  Vect IPL Bus    Trg Type   CPU Share APIC/INT# ISR(s)
> 9    0x80 9   PCI    Lvl Fixed  1   1     0x0/0x9   acpi_wrapper_isr
> 11   0xd1 14  PCI    Lvl Fixed  2   1     0x0/0xb   hpet_isr
> 16   0x84 9   PCI    Lvl Fixed  7   1     0x0/0x10  uhci_intr
> 18   0x82 9   PCI    Lvl Fixed  5   2     0x0/0x12  uhci_intr, ehci_intr
> 19   0x86 9   PCI    Lvl Fixed  3   6     0x0/0x13  uhci_intr, uhci_intr, ata_intr, ata_intr, ata_intr, ata_intr
> 21   0x85 9   PCI    Lvl Fixed  0   1     0x0/0x15  uhci_intr
> 23   0x83 9   PCI    Lvl Fixed  6   2     0x0/0x17  uhci_intr, ehci_intr
> 24   0x81 7   PCI    Edg MSI    4   1     -         pcieb_intr_handler
> 25   0x60 6   PCI    Edg MSI    1   1     -         e1000g_intr_pciexpress
> 26   0x61 6   PCI    Edg MSI    2   1     -         e1000g_intr_pciexpress
> 27   0x40 5   PCI    Edg MSI    3   1     -         mpt_intr
> 32   0x20 2          Edg IPI    all 1     -         cmi_cmci_trap
> 160  0xa0 0          Edg IPI    all 0     -         poke_cpu
> 208  0xd0 14         Edg IPI    all 1     -         kcpc_hw_overflow_intr
> 209  0xd3 14         Edg IPI    all 1     -         cbe_fire
> 210  0xd4 14         Edg IPI    all 1     -         cbe_fire
> 240  0xe0 15         Edg IPI    all 1     -         xc_serv
> 241  0xe1 15         Edg IPI    all 1     -         apic_error_intr
>
>
> So, basically two questions:
>
> 1. How do I fix this IRQ issue so that I don't get those warnings during
> boot up?
> 2. Is this likely to be the cause of the drive problems described above?
>
> Any advice would be much appreciated.
>
> Thanks,
>
> Daniel
> _______________________________________________
> OpenIndiana-discuss mailing list
> OpenIndiana-discuss at openindiana.org
> http://openindiana.org/mailman/listinfo/openindiana-discuss
>

