[OpenIndiana-discuss] COMSTAR qlt dropping link and resetting

Adrian Carpenter tac12 at wbic.cam.ac.uk
Thu Jun 7 15:17:04 UTC 2012


Well, this has happened again, exactly a week later and at the same time too...

So the SSD ZILs didn't do the trick.

I think I am going to turn off the ZFS auto-snapshot service...

Jun  7 15:50:22 hagrid fct: [ID 132490 kern.notice] NOTICE: qlt1,0 LINK UP, portid 20300, topology Fabric Pt-to-Pt,speed 8G
Jun  7 15:50:23 hagrid fct: [ID 132490 kern.notice] NOTICE: qlt0,0 LINK UP, portid 10400, topology Fabric Pt-to-Pt,speed 8G


On 31 May 2012, at 17:37, Adrian Carpenter wrote:

> A quick update for those who might be following this thread: I started collecting zilstat output, and I have found that about once every four days a transaction group (txg) takes over half an hour to complete (txg 475234 below, 15:22:39 to 16:00:06):
> 
> TIME                        txg    N-Bytes  N-Bytes/s N-Max-Rate    B-Bytes  B-Bytes/s B-Max-Rate    ops  <=4kB 4-32kB >=32kB
> ..
> ..
> 2012 May 31 15:21:36     475232    2044232      60124     390888   16531456     486219    2985984    175      0      0    175
> 2012 May 31 15:22:39     475233    2762416      43847     293064   19734528     313246    2244608    266      0     10    256
> 2012 May 31 16:00:06     475234   29059896      12927    3198840  148652032      66126   12713984   1825      0    181   1644
> 2012 May 31 16:08:05     475235    2544016       5311     657384   13819904      28851    3575808    182      0      2    180
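Instead of reading the table by eye, you can flag these stalls with a short Python sketch that computes the gap between consecutive zilstat rows. The sketch assumes each row is printed when its txg completes, so the gap approximates how long that txg took. The 30-minute threshold and the `long_txgs` name are illustrative choices and are not part of zilstat.

```python
from datetime import datetime

# Rows copied from the zilstat output above; only TIME and txg are used.
ROWS = """\
2012 May 31 15:21:36     475232    2044232      60124     390888   16531456     486219    2985984    175      0      0    175
2012 May 31 15:22:39     475233    2762416      43847     293064   19734528     313246    2244608    266      0     10    256
2012 May 31 16:00:06     475234   29059896      12927    3198840  148652032      66126   12713984   1825      0    181   1644
2012 May 31 16:08:05     475235    2544016       5311     657384   13819904      28851    3575808    182      0      2    180
"""


def long_txgs(rows, threshold_s=1800):
    """Return (txg, seconds) for each row whose gap from the previous row
    exceeds threshold_s (default: 30 minutes)."""
    prev = None
    slow = []
    for row in rows.strip().splitlines():
        fields = row.split()
        # First four fields form the timestamp, e.g. "2012 May 31 15:21:36".
        ts = datetime.strptime(" ".join(fields[:4]), "%Y %b %d %H:%M:%S")
        txg = int(fields[4])
        if prev is not None:
            gap = (ts - prev).total_seconds()
            if gap > threshold_s:
                slow.append((txg, int(gap)))
        prev = ts
    return slow


print(long_txgs(ROWS))  # [(475234, 2247)], i.e. 37 min 27 s
```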
> 
> At 15:32 the Xen pool master tried resetting the Fibre Channel HBAs; however, since the volume was (I presume) still blocked, the pool master became very unhappy...
> 
> I then see the following in dmesg:
> 
> May 31 16:00:08 hagrid fct: [ID 132490 kern.notice] NOTICE: qlt1,0 LINK UP, portid 20300, topology Fabric Pt-to-Pt,speed 8G
> May 31 16:00:09 hagrid fct: [ID 132490 kern.notice] NOTICE: qlt0,0 LINK UP, portid 10400, topology Fabric Pt-to-Pt,speed 8G
> 
> I've just taken delivery of some SSDs and will add them as mirrored ZIL log devices; hopefully this will help.
>
> On 21 May 2012, at 16:47, Mike La Spina wrote:
> 
>> Hi Adrian,
>> 
>> The SanBoxes? - Nexsan: nothing in their logs.
>> OK
>> 
>> Dmesg?
>>> May 17 17:33:47 hagrid fct: [ID 132490 kern.notice] NOTICE: qlt1,0 LINK UP, portid 20300, topology Fabric Pt-to-Pt,speed 8G
>>> May 17 17:33:48 hagrid fct: [ID 132490 kern.notice] NOTICE: qlt0,0 LINK UP, portid 10400, topology Fabric Pt-to-Pt,speed 8G
>> 
>> A LINK UP notice on its own would normally only appear at init. I would
>> say that is all this is; otherwise you would see a LINK DOWN before the
>> LINK UP, which would mean an event on the fabric is your root issue.
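The reasoning above can be checked mechanically. The Python sketch below scans log lines and labels each LINK UP according to whether the same qlt port logged a LINK DOWN since its previous LINK UP. The exact wording of a link-down notice is an assumption: only the LINK UP form appears in this thread, so the pattern below guesses "NOTICE: <port> LINK DOWN". `classify_link_ups` is a hypothetical helper name.

```python
import re

# Matches fct notices such as "NOTICE: qlt1,0 LINK UP, portid 20300, ...".
# ASSUMPTION: a link-down event is logged as "NOTICE: <port> LINK DOWN";
# only the LINK UP form is confirmed by the logs in this thread.
LINK_RE = re.compile(r"NOTICE: (qlt\d+,\d+) LINK (UP|DOWN)")


def classify_link_ups(lines):
    """Return (port, kind) for every LINK UP notice. kind is 'fabric event'
    if that port logged LINK DOWN since its last LINK UP, else 'init/reset'."""
    down_pending = {}  # port -> True while a LINK DOWN awaits its LINK UP
    results = []
    for line in lines:
        m = LINK_RE.search(line)
        if not m:
            continue
        port, state = m.groups()
        if state == "DOWN":
            down_pending[port] = True
        else:
            kind = "fabric event" if down_pending.get(port) else "init/reset"
            results.append((port, kind))
            down_pending[port] = False
    return results


log = [
    "May 31 16:00:08 hagrid fct: [ID 132490 kern.notice] NOTICE: qlt1,0 LINK UP, portid 20300, topology Fabric Pt-to-Pt,speed 8G",
    "May 31 16:00:09 hagrid fct: [ID 132490 kern.notice] NOTICE: qlt0,0 LINK UP, portid 10400, topology Fabric Pt-to-Pt,speed 8G",
]
print(classify_link_ups(log))  # [('qlt1,0', 'init/reset'), ('qlt0,0', 'init/reset')]
```

Applied to the LINK UP pairs quoted in this thread, every notice comes out as "init/reset". That pattern matches a port reset, such as the one the Xen pool master attempted, rather than a fabric event.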
>> 
>> STMF service? Nothing at all in the logs.
>> OK
>> 
>> Are you running snapshots? Yes, I am running the auto-snapshot service; in
>> addition, I'm running an hourly script that snapshots the volume and
>> sends it over ssh to another machine.
>> 
>> I suspect an issue here. The snapshot service runs at fixed time
>> intervals (e.g. 15 minutes, 1 hour, 24 hours, 1 month); if you're also
>> taking an hourly snapshot of your own for a zfs send/recv, the two will
>> overlap. The overlap may cause excessive blocking of stmf sbd access and
>> result in a timeout for the Xen host initiators. I suggest you use the
>> existing hourly auto-snapshots and simply send them to the remote host or
>> file system with a script run at 15 minutes past the hour.
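The suggestion above could be scripted as follows. This Python sketch picks the two newest hourly auto-snapshots and builds an incremental `zfs send -i ... | ssh ... zfs receive -F ...` pipeline. It only builds the command string and does not run it. Several things are assumptions. The snapshot naming follows the time-slider convention `zfs-auto-snap_hourly-YYYY-MM-DD-HHhMM`, which sorts chronologically. The dataset `tank/xenvol`, host `backuphost` and target `backup/xenvol` are hypothetical placeholders. The remote side is assumed to already hold the older snapshot.

```python
import shlex

# HYPOTHETICAL names: replace with the real volume, backup host and target.
VOLUME = "tank/xenvol"
REMOTE = "backuphost"
TARGET = "backup/xenvol"
PREFIX = "zfs-auto-snap_hourly-"  # ASSUMED time-slider hourly naming


def incremental_send_cmd(snapshots, volume=VOLUME, remote=REMOTE, target=TARGET):
    """Build a shell pipeline that sends the newest hourly auto-snapshot
    incrementally from the one before it. Does not execute anything."""
    hourly = sorted(s for s in snapshots if s.startswith(PREFIX))
    if len(hourly) < 2:
        raise ValueError("need at least two hourly snapshots")
    prev, latest = hourly[-2], hourly[-1]
    send = "zfs send -i {} {}".format(
        shlex.quote(f"{volume}@{prev}"), shlex.quote(f"{volume}@{latest}"))
    recv = "ssh {} zfs receive -F {}".format(
        shlex.quote(remote), shlex.quote(target))
    return f"{send} | {recv}"


snaps = [
    "zfs-auto-snap_daily-2012-05-31-00h00",
    "zfs-auto-snap_hourly-2012-05-31-15h00",
    "zfs-auto-snap_hourly-2012-05-31-14h00",
    "zfs-auto-snap_frequent-2012-05-31-15h15",
]
print(incremental_send_cmd(snaps))
```

To avoid the overlap described above, such a script would be scheduled off the hour, for example with a crontab entry starting `15 * * * *`. It reuses snapshots the service has already taken instead of creating new ones. Note that `receive -F` rolls the target back to its most recent snapshot before receiving.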
>> 
>> Dedup?   
>> 
>> Off - OK
>> 
>> Compression?
>> 
>> Lzjb - OK
>> 
>> 
>> IRQ sharing?
>> echo ::interrupts | mdb -k
>> 
>> IRQ  Vect IPL Bus    Trg Type   CPU Share APIC/INT# ISR(s) 
>> 1    0x41 5   ISA    Edg Fixed  3   1     0x0/0x1   i8042_intr
>> 3    0xb1 12  ISA    Edg Fixed  39  1     0x0/0x3   asyintr
>> 4    0xb0 12  ISA    Edg Fixed  38  1     0x0/0x4   asyintr
>> 5    0xb2 12  ISA    Edg Fixed  40  1     0x0/0x5   asyintr
>> 9    0x80 9   PCI    Lvl Fixed  1   1     0x0/0x9   acpi_wrapper_isr
>> 12   0x42 5   ISA    Edg Fixed  4   1     0x0/0xc   i8042_intr
>> 16   0x83 9   PCI    Lvl Fixed  7   2     0x0/0x10  ohci_intr, ohci_intr
>> 17   0x81 9   PCI    Lvl Fixed  5   1     0x0/0x11  ehci_intr
>> 18   0x84 9   PCI    Lvl Fixed  8   3     0x0/0x12  ohci_intr, ohci_intr, ohci_intr
>> 19   0x82 9   PCI    Lvl Fixed  6   1     0x0/0x13  ehci_intr
>> 22   0x40 5   PCI    Lvl Fixed  2   2     0x0/0x16  ata_intr, ata_intr
>> 88   0x43 5   PCI    Edg MSI-X  9   1     -         ql_isr_aif
>> 89   0x44 5   PCI    Edg MSI-X  10  1     -         ql_isr_aif
>> 90   0x45 5   PCI    Edg MSI-X  11  1     -         ql_isr_aif
>> 91   0x46 5   PCI    Edg MSI-X  12  1     -         ql_isr_aif
>> 92   0x60 6   PCI    Edg MSI-X  13  1     -         igb_intr_tx_other
>> 93   0x61 6   PCI    Edg MSI-X  14  1     -         igb_intr_rx
>> 94   0x62 6   PCI    Edg MSI-X  15  1     -         igb_intr_tx_other
>> 95   0x63 6   PCI    Edg MSI-X  16  1     -         igb_intr_rx
>> 96   0x64 6   PCI    Edg MSI-X  36  1     -         igb_intr_tx_other
>> 97   0x65 6   PCI    Edg MSI-X  37  1     -         igb_intr_rx
>> 98   0x66 6   PCI    Edg MSI-X  41  1     -         igb_intr_tx_other
>> 99   0x67 6   PCI    Edg MSI-X  42  1     -         igb_intr_rx
>> 100  0x68 6   PCI    Edg MSI-X  43  1     -         igb_intr_tx_other
>> 101  0x69 6   PCI    Edg MSI-X  44  1     -         igb_intr_rx
>> 102  0x6a 6   PCI    Edg MSI-X  45  1     -         igb_intr_tx_other
>> 103  0x6b 6   PCI    Edg MSI-X  46  1     -         igb_intr_rx
>> 104  0x47 5   PCI    Edg MSI    30  1     -         qlt_isr
>> 105  0x48 5   PCI    Edg MSI    31  1     -         qlt_isr
>> 160  0xa0 0          Edg IPI    all 0     -         poke_cpu
>> 208  0xd0 14         Edg IPI    all 1     -         kcpc_hw_overflow_intr
>> 209  0xd1 14         Edg IPI    all 1     -         cbe_fire
>> 210  0xd3 14         Edg IPI    all 1     -         cbe_fire
>> 240  0xe0 15         Edg IPI    all 1     -         xc_serv
>> 241  0xe1 15         Edg IPI    all 1     -         apic_error_intr
>> 
>> OK
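The IRQ-sharing check can also be automated. This Python sketch parses rows of `echo ::interrupts | mdb -k` output and reports vectors whose Share column is greater than 1. The column layout is taken from the table above, where the Bus column is blank for IPI rows. The regex and the `shared_irqs` helper are illustrative, not an mdb facility.

```python
import re

# One ::interrupts row: IRQ Vect IPL [Bus] Trg Type CPU Share APIC/INT# ISR(s).
# Bus is optional because IPI rows leave it blank.
ROW_RE = re.compile(
    r"^(\d+)\s+(0x[0-9a-f]+)\s+(\d+)\s+(?:(\w+)\s+)?(Edg|Lvl)\s+(\S+)\s+"
    r"(\S+)\s+(\d+)\s+(\S+)\s*(.*)$"
)


def shared_irqs(text):
    """Return (irq, share, isrs) for every vector shared by more than one ISR."""
    shared = []
    for line in text.splitlines():
        m = ROW_RE.match(line.strip())
        if m and int(m.group(8)) > 1:
            shared.append((int(m.group(1)), int(m.group(8)), m.group(10)))
    return shared


SAMPLE = """\
IRQ  Vect IPL Bus    Trg Type   CPU Share APIC/INT# ISR(s)
16   0x83 9   PCI    Lvl Fixed  7   2     0x0/0x10  ohci_intr, ohci_intr
18   0x84 9   PCI    Lvl Fixed  8   3     0x0/0x12  ohci_intr, ohci_intr, ohci_intr
22   0x40 5   PCI    Lvl Fixed  2   2     0x0/0x16  ata_intr, ata_intr
104  0x47 5   PCI    Edg MSI    30  1     -         qlt_isr
105  0x48 5   PCI    Edg MSI    31  1     -         qlt_isr
160  0xa0 0          Edg IPI    all 0     -         poke_cpu
"""
for irq, share, isrs in shared_irqs(SAMPLE):
    print(irq, share, isrs)
```

On the table above, only the USB (ohci) and ATA vectors are shared. Both qlt_isr vectors have a Share of 1, which is consistent with the "OK" verdict.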
>> 
>> 
>> Dr T Adrian Carpenter
>> Reader in Imaging Sciences
>> Wolfson Brain Imaging Centre
>> 
>> 
>> _______________________________________________
>> OpenIndiana-discuss mailing list
>> OpenIndiana-discuss at openindiana.org
>> http://openindiana.org/mailman/listinfo/openindiana-discuss
>> 



