[OpenIndiana-discuss] COMSTAR qlt dropping link and resetting
Adrian Carpenter
tac12 at wbic.cam.ac.uk
Thu May 31 16:37:26 UTC 2012
A quick update for those who might be following this thread: I started collecting zilstat output, and I have found that about once every four days a transaction group (txg) takes over half an hour:
TIME                     txg   N-Bytes  N-Bytes/s  N-Max-Rate    B-Bytes  B-Bytes/s  B-Max-Rate   ops  <=4kB  4-32kB  >=32kB
..
..
2012 May 31 15:21:36  475232   2044232      60124      390888   16531456     486219     2985984   175      0       0     175
2012 May 31 15:22:39  475233   2762416      43847      293064   19734528     313246     2244608   266      0      10     256
2012 May 31 16:00:06  475234  29059896      12927     3198840  148652032      66126    12713984  1825      0     181    1644
2012 May 31 16:08:05  475235   2544016       5311      657384   13819904      28851     3575808   182      0       2     180
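The stall is easiest to see as the gap between consecutive zilstat rows. The following is a minimal sketch, not part of the original post, that computes those gaps from the timestamps above. The 30-minute threshold and the function names are assumptions chosen for illustration.

```python
from datetime import datetime

# Timestamps and txg numbers copied from the zilstat rows above.
ROWS = [
    ("2012 May 31 15:21:36", 475232),
    ("2012 May 31 15:22:39", 475233),
    ("2012 May 31 16:00:06", 475234),
    ("2012 May 31 16:08:05", 475235),
]

# Assumption: any gap longer than 30 minutes counts as a stall.
STALL_THRESHOLD_S = 30 * 60


def txg_gaps(rows):
    """Return (txg, seconds since the previous row) for every row after the first."""
    times = [datetime.strptime(stamp, "%Y %b %d %H:%M:%S") for stamp, _ in rows]
    return [
        (rows[i][1], int((times[i] - times[i - 1]).total_seconds()))
        for i in range(1, len(rows))
    ]


def stalled(rows, threshold_s=STALL_THRESHOLD_S):
    """Return only the (txg, gap) pairs whose gap exceeds the threshold."""
    return [(txg, gap) for txg, gap in txg_gaps(rows) if gap > threshold_s]


if __name__ == "__main__":
    for txg, gap in txg_gaps(ROWS):
        flag = "  <-- stall" if gap > STALL_THRESHOLD_S else ""
        print(f"txg {txg}: {gap} s since previous row{flag}")
```

For these rows the gap before txg 475234 is 2247 seconds, roughly 37 minutes, while the neighbouring gaps are 63 and 479 seconds.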
At 15:32 the Xen pool master tried resetting the Fibre Channel HBAs; however, since the volume was presumably still blocked, the pool master became very unhappy.
I then see the following in dmesg:
May 31 16:00:08 hagrid fct: [ID 132490 kern.notice] NOTICE: qlt1,0 LINK UP, portid 20300, topology Fabric Pt-to-Pt,speed 8G
May 31 16:00:09 hagrid fct: [ID 132490 kern.notice] NOTICE: qlt0,0 LINK UP, portid 10400, topology Fabric Pt-to-Pt,speed 8G
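To check the point raised in the quoted reply below (whether a LINK UP is preceded by a LINK DOWN), the fct notices can be scanned per port. This is a rough sketch that assumes log lines shaped exactly like the two above. No LINK DOWN line appears in this thread, so the DOWN form matched here is an assumption, and the function names are hypothetical.

```python
import re

# Matches fct link notices of the form shown above.
# Assumption: a link-down notice has the same shape with "LINK DOWN".
LINK_RE = re.compile(
    r"^(?P<stamp>\w{3} +\d+ \d\d:\d\d:\d\d) (?P<host>\S+) fct: .*NOTICE: "
    r"(?P<port>qlt\d+,\d+) LINK (?P<state>UP|DOWN)"
)


def link_events(lines):
    """Return (timestamp, port, state) for every line that is a link notice."""
    events = []
    for line in lines:
        match = LINK_RE.match(line)
        if match:
            events.append((match["stamp"], match["port"], match["state"]))
    return events


def ups_without_down(events):
    """Return ports that logged LINK UP with no LINK DOWN seen just before it."""
    last_state = {}
    flagged = []
    for _, port, state in events:
        if state == "UP" and last_state.get(port) != "DOWN":
            flagged.append(port)
        last_state[port] = state
    return flagged


SAMPLE = [
    "May 31 16:00:08 hagrid fct: [ID 132490 kern.notice] NOTICE: qlt1,0 LINK UP, portid 20300, topology Fabric Pt-to-Pt,speed 8G",
    "May 31 16:00:09 hagrid fct: [ID 132490 kern.notice] NOTICE: qlt0,0 LINK UP, portid 10400, topology Fabric Pt-to-Pt,speed 8G",
]

if __name__ == "__main__":
    print(ups_without_down(link_events(SAMPLE)))
```

Run over a longer window of the log, this would show whether any LINK DOWN notice was recorded at all before the ports came back.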
I've just taken delivery of some SSDs and will add them as mirrored ZIL (log) devices; hopefully this will help.
On 21 May 2012, at 16:47, Mike La Spina wrote:
> Hi Adrian,
>
> The SanBoxes? - Nexsan nothing in their logs
> OK
>
> Dmesg? :
>> May 17 17:33:47 hagrid fct: [ID 132490 kern.notice] NOTICE: qlt1,0 LINK UP, portid 20300, topology Fabric Pt-to-Pt,speed 8G
>> May 17 17:33:48 hagrid fct: [ID 132490 kern.notice] NOTICE: qlt0,0 LINK UP, portid 10400, topology Fabric Pt-to-Pt,speed 8G
>
> A LINK UP notice on its own would normally only occur at init. I would
> say that is all it is; otherwise you would see a LINK DOWN before the
> LINK UP, which would mean an event on the fabric is your root issue.
>
> Stmf service? Nothing at all in the logs
> OK
>
> Are you running snapshots? Yes, I am running the auto snapshot service;
> in addition I'm running an hourly script that snapshots the volume and
> sends it over ssh to another machine.
>
> I suspect an issue here. The snapshot service runs on fixed time
> intervals, e.g. 15 min, 1 hour, 24 hours, 1 month. If you're also adding
> a snapshot that runs hourly to do a ZFS send/recv, they will overlap. The
> overlap may cause excessive blocking of stmf sbd access and result in a
> timeout for the Xen host initiators. I suggest you use the existing
> auto-snapshot hourly snaps and simply send them over to the remote host
> or file system using a script at 15 minutes after the hour.
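The suggestion above could be sketched as follows: pick the two newest hourly auto-snapshots and build an incremental send from them, instead of taking an extra snapshot. This is only an illustration. The `zfs-auto-snap_hourly-` prefix, the dataset and host names, and the assumption that the remote side already holds the previous hourly snapshot are all hypothetical and need checking against the real system. The sketch only builds the command string and does not run it.

```python
import shlex

# Assumption: auto-snapshots are named <dataset>@zfs-auto-snap_hourly-YYYY-MM-DD-HHhMM.
HOURLY_TAG = "zfs-auto-snap_hourly-"


def hourly_snapshots(snapshot_names, dataset):
    """Return the dataset's hourly auto-snapshots, oldest first."""
    prefix = f"{dataset}@{HOURLY_TAG}"
    # The assumed timestamp suffix sorts chronologically as plain text.
    return sorted(name for name in snapshot_names if name.startswith(prefix))


def incremental_send_command(snapshot_names, dataset, remote_host, remote_dataset):
    """Build a 'zfs send -i ... | ssh ... zfs receive' command, or None if too few snapshots."""
    snaps = hourly_snapshots(snapshot_names, dataset)
    if len(snaps) < 2:
        return None
    previous, latest = snaps[-2], snaps[-1]
    send = f"zfs send -i {shlex.quote(previous)} {shlex.quote(latest)}"
    recv = f"zfs receive -F {shlex.quote(remote_dataset)}"
    return f"{send} | ssh {shlex.quote(remote_host)} {shlex.quote(recv)}"


if __name__ == "__main__":
    # Hypothetical names; in practice the list would come from
    # `zfs list -H -t snapshot -o name`.
    names = [
        "tank/xenvol@zfs-auto-snap_hourly-2012-05-31-15h00",
        "tank/xenvol@zfs-auto-snap_hourly-2012-05-31-16h00",
        "tank/xenvol@zfs-auto-snap_daily-2012-05-31-00h00",
    ]
    print(incremental_send_command(names, "tank/xenvol", "backuphost", "backup/xenvol"))
```

Scheduled from cron with a `15 * * * *` entry, a script along these lines would run 15 minutes after the hour, after the hourly auto-snapshot has been taken.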
>
> Dedup?
>
> Off - OK
>
> Compression?
>
> Lzjb - OK
>
>
> IRQ sharing?
> echo ::interrupts | mdb -k
>
> IRQ  Vect  IPL  Bus  Trg  Type   CPU  Share  APIC/INT#  ISR(s)
> 1    0x41  5    ISA  Edg  Fixed  3    1      0x0/0x1    i8042_intr
> 3    0xb1  12   ISA  Edg  Fixed  39   1      0x0/0x3    asyintr
> 4    0xb0  12   ISA  Edg  Fixed  38   1      0x0/0x4    asyintr
> 5    0xb2  12   ISA  Edg  Fixed  40   1      0x0/0x5    asyintr
> 9    0x80  9    PCI  Lvl  Fixed  1    1      0x0/0x9    acpi_wrapper_isr
> 12   0x42  5    ISA  Edg  Fixed  4    1      0x0/0xc    i8042_intr
> 16   0x83  9    PCI  Lvl  Fixed  7    2      0x0/0x10   ohci_intr, ohci_intr
> 17   0x81  9    PCI  Lvl  Fixed  5    1      0x0/0x11   ehci_intr
> 18   0x84  9    PCI  Lvl  Fixed  8    3      0x0/0x12   ohci_intr, ohci_intr, ohci_intr
> 19   0x82  9    PCI  Lvl  Fixed  6    1      0x0/0x13   ehci_intr
> 22   0x40  5    PCI  Lvl  Fixed  2    2      0x0/0x16   ata_intr, ata_intr
> 88   0x43  5    PCI  Edg  MSI-X  9    1      -          ql_isr_aif
> 89   0x44  5    PCI  Edg  MSI-X  10   1      -          ql_isr_aif
> 90   0x45  5    PCI  Edg  MSI-X  11   1      -          ql_isr_aif
> 91   0x46  5    PCI  Edg  MSI-X  12   1      -          ql_isr_aif
> 92   0x60  6    PCI  Edg  MSI-X  13   1      -          igb_intr_tx_other
> 93   0x61  6    PCI  Edg  MSI-X  14   1      -          igb_intr_rx
> 94   0x62  6    PCI  Edg  MSI-X  15   1      -          igb_intr_tx_other
> 95   0x63  6    PCI  Edg  MSI-X  16   1      -          igb_intr_rx
> 96   0x64  6    PCI  Edg  MSI-X  36   1      -          igb_intr_tx_other
> 97   0x65  6    PCI  Edg  MSI-X  37   1      -          igb_intr_rx
> 98   0x66  6    PCI  Edg  MSI-X  41   1      -          igb_intr_tx_other
> 99   0x67  6    PCI  Edg  MSI-X  42   1      -          igb_intr_rx
> 100  0x68  6    PCI  Edg  MSI-X  43   1      -          igb_intr_tx_other
> 101  0x69  6    PCI  Edg  MSI-X  44   1      -          igb_intr_rx
> 102  0x6a  6    PCI  Edg  MSI-X  45   1      -          igb_intr_tx_other
> 103  0x6b  6    PCI  Edg  MSI-X  46   1      -          igb_intr_rx
> 104  0x47  5    PCI  Edg  MSI    30   1      -          qlt_isr
> 105  0x48  5    PCI  Edg  MSI    31   1      -          qlt_isr
> 160  0xa0  0         Edg  IPI    all  0      -          poke_cpu
> 208  0xd0  14        Edg  IPI    all  1      -          kcpc_hw_overflow_intr
> 209  0xd1  14        Edg  IPI    all  1      -          cbe_fire
> 210  0xd3  14        Edg  IPI    all  1      -          cbe_fire
> 240  0xe0  15        Edg  IPI    all  1      -          xc_serv
> 241  0xe1  15        Edg  IPI    all  1      -          apic_error_intr
>
> OK
>
>
> Dr T Adrian Carpenter
> Reader in Imaging Sciences
> Wolfson Brain Imaging Centre
>
>
> _______________________________________________
> OpenIndiana-discuss mailing list
> OpenIndiana-discuss at openindiana.org
> http://openindiana.org/mailman/listinfo/openindiana-discuss