[OpenIndiana-discuss] COMSTAR qlt dropping link and resetting
Adrian Carpenter
tac12 at wbic.cam.ac.uk
Thu Jun 7 15:17:04 UTC 2012
Well, this has happened again, exactly a week later, and at the same time of day too.
So the SSD ZIL devices didn't do the trick.
I think I am going to turn off the ZFS auto-snapshot service.
Jun 7 15:50:22 hagrid fct: [ID 132490 kern.notice] NOTICE: qlt1,0 LINK UP, portid 20300, topology Fabric Pt-to-Pt,speed 8G
Jun 7 15:50:23 hagrid fct: [ID 132490 kern.notice] NOTICE: qlt0,0 LINK UP, portid 10400, topology Fabric Pt-to-Pt,speed 8G
On 31 May 2012, at 17:37, Adrian Carpenter wrote:
> A quick update for those who might be following this thread: I started collecting zilstat output, and what I have found is that about once every four days a transaction group (txg) takes over half an hour:
>
> TIME                     txg   N-Bytes  N-Bytes/s  N-Max-Rate    B-Bytes  B-Bytes/s  B-Max-Rate   ops  <=4kB  4-32kB  >=32kB
> ..
> ..
> 2012 May 31 15:21:36  475232   2044232      60124      390888   16531456     486219     2985984   175      0       0     175
> 2012 May 31 15:22:39  475233   2762416      43847      293064   19734528     313246     2244608   266      0      10     256
> 2012 May 31 16:00:06  475234  29059896      12927     3198840  148652032      66126    12713984  1825      0     181    1644
> 2012 May 31 16:08:05  475235   2544016       5311      657384   13819904      28851     3575808   182      0       2     180
>
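For anyone who wants to spot these stalls without eyeballing the output, here is a minimal Python sketch that flags rows where the time since the previous row exceeds a threshold. It assumes each row's timestamp marks when that txg was reported and that rows are in time order. The function names and the 5-minute threshold are illustrative choices, not part of zilstat.

```python
from datetime import datetime, timedelta

# Sample rows copied from the zilstat output in this thread.
ZILSTAT_ROWS = """\
TIME txg N-Bytes N-Bytes/s N-Max-Rate B-Bytes B-Bytes/s B-Max-Rate ops <=4kB 4-32kB >=32kB
2012 May 31 15:21:36 475232 2044232 60124 390888 16531456 486219 2985984 175 0 0 175
2012 May 31 15:22:39 475233 2762416 43847 293064 19734528 313246 2244608 266 0 10 256
2012 May 31 16:00:06 475234 29059896 12927 3198840 148652032 66126 12713984 1825 0 181 1644
2012 May 31 16:08:05 475235 2544016 5311 657384 13819904 28851 3575808 182 0 2 180
"""


def parse_rows(text):
    """Yield (timestamp, txg) per data row; header and '..' lines are skipped."""
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue
        try:
            # The first four fields form the timestamp, the fifth is the txg number.
            stamp = datetime.strptime(" ".join(fields[:4]), "%Y %b %d %H:%M:%S")
            txg = int(fields[4])
        except ValueError:
            continue  # not a data row (e.g. the column header)
        yield stamp, txg


def slow_txgs(text, threshold=timedelta(minutes=5)):
    """Return (txg, gap) pairs where the gap since the previous row exceeds threshold."""
    rows = list(parse_rows(text))
    return [
        (txg, stamp - prev_stamp)
        for (prev_stamp, _), (stamp, txg) in zip(rows, rows[1:])
        if stamp - prev_stamp > threshold
    ]


if __name__ == "__main__":
    for txg, gap in slow_txgs(ZILSTAT_ROWS):
        print(f"txg {txg}: {gap} since previous row")
```

Raising the threshold to 30 minutes leaves only txg 475234, which is the stall that lines up with the HBA reset.
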
> At 15:32 the Xen pool master tried resetting the Fibre Channel HBAs; however, since the volume was presumably still blocked, the pool master became very unhappy.
>
> I then see the following in dmesg:
>
> May 31 16:00:08 hagrid fct: [ID 132490 kern.notice] NOTICE: qlt1,0 LINK UP, portid 20300, topology Fabric Pt-to-Pt,speed 8G
> May 31 16:00:09 hagrid fct: [ID 132490 kern.notice] NOTICE: qlt0,0 LINK UP, portid 10400, topology Fabric Pt-to-Pt,speed 8G
>
> I've just taken delivery of some SSDs and will add them as mirrored ZIL log devices; hopefully this will help.
>
> On 21 May 2012, at 16:47, Mike La Spina wrote:
>
>> Hi Adrian,
>>
>> The SanBoxes? - Nexsan nothing in their logs
>> OK
>>
>> Dmesg? :
>>> May 17 17:33:47 hagrid fct: [ID 132490 kern.notice] NOTICE: qlt1,0 LINK UP, portid 20300, topology Fabric Pt-to-Pt,speed 8G
>>> May 17 17:33:48 hagrid fct: [ID 132490 kern.notice] NOTICE: qlt0,0 LINK UP, portid 10400, topology Fabric Pt-to-Pt,speed 8G
>>
>> A LINKUP notice on its own would normally only occur on init, so I would
>> say it's just that. Otherwise you would have a LINKDOWN before the
>> LINKUP, which would mean an event on the fabric is your root issue.
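The LINKUP-without-LINKDOWN check can be sketched in Python as below. The two sample lines are from the dmesg output in this thread. The regular expression assumes a link-down notice would name the port the same way and contain the words "LINK DOWN", which is a guess since no such line appears here. The function name is illustrative.

```python
import re

# Two lines taken from the dmesg output in this thread.
LOG = """\
May 17 17:33:47 hagrid fct: [ID 132490 kern.notice] NOTICE: qlt1,0 LINK UP, portid 20300, topology Fabric Pt-to-Pt,speed 8G
May 17 17:33:48 hagrid fct: [ID 132490 kern.notice] NOTICE: qlt0,0 LINK UP, portid 10400, topology Fabric Pt-to-Pt,speed 8G
"""

# ASSUMPTION: a link-down notice uses the same "NOTICE: <port> LINK DOWN" shape.
EVENT = re.compile(r"NOTICE: (qlt\d+,\d+) LINK (UP|DOWN)")


def unpaired_link_ups(text):
    """Return ports whose LINK UP was not directly preceded by a LINK DOWN on that port."""
    last_state = {}
    unpaired = []
    for line in text.splitlines():
        match = EVENT.search(line)
        if not match:
            continue
        port, state = match.groups()
        if state == "UP" and last_state.get(port) != "DOWN":
            unpaired.append(port)
        last_state[port] = state
    return unpaired


if __name__ == "__main__":
    print(unpaired_link_ups(LOG))
```

Both ports in the sample come back as unpaired, which fits the reading that the driver re-initialised the ports rather than the fabric dropping the link.
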
>>
>> Stmf service? Nothing at all in the logs
>> OK
>>
>> Are you running snapshots? Yes, I am running the auto-snapshot service; in
>> addition I'm running a script (hourly) that snapshots the volume and
>> sends it over ssh to another machine.
>>
>> I suspect an issue here. The snapshot service runs at fixed time
>> intervals, e.g. 15 minutes, 1 hour, 24 hours, 1 month. If you're also
>> adding a snapshot that runs hourly to do a ZFS send/recv, they will
>> overlap. The overlap may cause excessive blocking of stmf sbd access and
>> result in a timeout for the Xen host initiators. I suggest you use the
>> existing hourly auto-snapshots and simply send them over to the remote
>> host or file system using a script at 15 minutes after the hour.
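A rough Python sketch of that suggestion: pick the two newest hourly auto-snapshots and build an incremental send command from them, instead of taking an extra snapshot. Everything specific here is an assumption: the dataset names `tank/xenvol` and `backup/xenvol`, the host `backuphost`, and the `zfs-auto-snap_hourly-YYYY-MM-DD-HHhMM` naming (check what `zfs list -t snapshot` shows on your system). The sketch only builds the command string and does not run it.

```python
import shlex

# Hypothetical snapshot names, as `zfs list -H -o name -t snapshot` might print them.
SNAPSHOTS = [
    "tank/xenvol@zfs-auto-snap_frequent-2012-05-31-16h15",
    "tank/xenvol@zfs-auto-snap_hourly-2012-05-31-16h00",
    "tank/xenvol@zfs-auto-snap_hourly-2012-05-31-14h00",
    "tank/xenvol@zfs-auto-snap_hourly-2012-05-31-15h00",
]


def latest_hourly_pair(snapshot_names, marker="zfs-auto-snap_hourly"):
    """Return (previous, newest) hourly snapshots, or None if fewer than two exist.

    ASSUMPTION: all names belong to one dataset and carry zero-padded
    timestamps, so a plain string sort is also a chronological sort.
    """
    hourly = sorted(n for n in snapshot_names if marker in n.split("@", 1)[-1])
    if len(hourly) < 2:
        return None
    return hourly[-2], hourly[-1]


def incremental_send_command(prev_snap, new_snap, remote_host, remote_dataset):
    """Build a `zfs send -i ... | ssh ... zfs receive ...` pipeline as a string."""
    send = f"zfs send -i {shlex.quote(prev_snap)} {shlex.quote(new_snap)}"
    recv = f"zfs receive {shlex.quote(remote_dataset)}"
    return f"{send} | ssh {shlex.quote(remote_host)} {shlex.quote(recv)}"


if __name__ == "__main__":
    pair = latest_hourly_pair(SNAPSHOTS)
    if pair:
        print(incremental_send_command(*pair, "backuphost", "backup/xenvol"))
```

Run from cron at minute 15 (`15 * * * *`), this keeps the send clear of the top-of-the-hour snapshot. Whether that is enough to avoid the blocking is untested.
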
>>
>> Dedup?
>>
>> Off - OK
>>
>> Compression?
>>
>> Lzjb - OK
>>
>>
>> IRQ sharing?
>> echo ::interrupts | mdb -k
>>
>> IRQ Vect IPL Bus Trg Type CPU Share APIC/INT# ISR(s)
>> 1 0x41 5 ISA Edg Fixed 3 1 0x0/0x1 i8042_intr
>> 3 0xb1 12 ISA Edg Fixed 39 1 0x0/0x3 asyintr
>> 4 0xb0 12 ISA Edg Fixed 38 1 0x0/0x4 asyintr
>> 5 0xb2 12 ISA Edg Fixed 40 1 0x0/0x5 asyintr
>> 9 0x80 9 PCI Lvl Fixed 1 1 0x0/0x9 acpi_wrapper_isr
>> 12 0x42 5 ISA Edg Fixed 4 1 0x0/0xc i8042_intr
>> 16 0x83 9 PCI Lvl Fixed 7 2 0x0/0x10 ohci_intr, ohci_intr
>> 17 0x81 9 PCI Lvl Fixed 5 1 0x0/0x11 ehci_intr
>> 18 0x84 9 PCI Lvl Fixed 8 3 0x0/0x12 ohci_intr, ohci_intr, ohci_intr
>> 19 0x82 9 PCI Lvl Fixed 6 1 0x0/0x13 ehci_intr
>> 22 0x40 5 PCI Lvl Fixed 2 2 0x0/0x16 ata_intr, ata_intr
>> 88 0x43 5 PCI Edg MSI-X 9 1 - ql_isr_aif
>> 89 0x44 5 PCI Edg MSI-X 10 1 - ql_isr_aif
>> 90 0x45 5 PCI Edg MSI-X 11 1 - ql_isr_aif
>> 91 0x46 5 PCI Edg MSI-X 12 1 - ql_isr_aif
>> 92 0x60 6 PCI Edg MSI-X 13 1 - igb_intr_tx_other
>> 93 0x61 6 PCI Edg MSI-X 14 1 - igb_intr_rx
>> 94 0x62 6 PCI Edg MSI-X 15 1 - igb_intr_tx_other
>> 95 0x63 6 PCI Edg MSI-X 16 1 - igb_intr_rx
>> 96 0x64 6 PCI Edg MSI-X 36 1 - igb_intr_tx_other
>> 97 0x65 6 PCI Edg MSI-X 37 1 - igb_intr_rx
>> 98 0x66 6 PCI Edg MSI-X 41 1 - igb_intr_tx_other
>> 99 0x67 6 PCI Edg MSI-X 42 1 - igb_intr_rx
>> 100 0x68 6 PCI Edg MSI-X 43 1 - igb_intr_tx_other
>> 101 0x69 6 PCI Edg MSI-X 44 1 - igb_intr_rx
>> 102 0x6a 6 PCI Edg MSI-X 45 1 - igb_intr_tx_other
>> 103 0x6b 6 PCI Edg MSI-X 46 1 - igb_intr_rx
>> 104 0x47 5 PCI Edg MSI 30 1 - qlt_isr
>> 105 0x48 5 PCI Edg MSI 31 1 - qlt_isr
>> 160 0xa0 0 Edg IPI all 0 - poke_cpu
>> 208 0xd0 14 Edg IPI all 1 - kcpc_hw_overflow_intr
>> 209 0xd1 14 Edg IPI all 1 - cbe_fire
>> 210 0xd3 14 Edg IPI all 1 - cbe_fire
>> 240 0xe0 15 Edg IPI all 1 - xc_serv
>> 241 0xe1 15 Edg IPI all 1 - apic_error_intr
>>
>> OK
>>
>>
>> Dr T Adrian Carpenter
>> Reader in Imaging Sciences
>> Wolfson Brain Imaging Centre
>>
>>
>> _______________________________________________
>> OpenIndiana-discuss mailing list
>> OpenIndiana-discuss at openindiana.org
>> http://openindiana.org/mailman/listinfo/openindiana-discuss