[OpenIndiana-discuss] ZFS read speed(iSCSI)

Heinrich van Riel heinrich.vanriel at gmail.com
Fri Jun 7 22:36:22 UTC 2013


In the debug info I see thousands of the following events:

FROM STMF:0149225: abort_task_offline called for LPORT: lport abort timed out
FROM STMF:0149225: abort_task_offline called for LPORT: lport abort timed out
FROM STMF:0149225: abort_task_offline called for LPORT: lport abort timed out
FROM STMF:0149226: abort_task_offline called for LPORT: lport abort timed out
FROM STMF:0149226: abort_task_offline called for LPORT: lport abort timed out
FROM STMF:0149226: abort_task_offline called for LPORT: lport abort timed out
FROM STMF:0149227: abort_task_offline called for LPORT: lport abort timed out
FROM STMF:0149227: abort_task_offline called for LPORT: lport abort timed out
FROM STMF:0149227: abort_task_offline called for LPORT: lport abort timed out
emlxs1:0149228: port state change from 11 to 11
FROM STMF:0149228: abort_task_offline called for LPORT: lport abort timed out
FROM STMF:0149228: abort_task_offline called for LPORT: lport abort timed out
FROM STMF:0149228: abort_task_offline called for LPORT: lport abort timed out
:0149228: fct_port_shutdown: port-ffffff1157ff1278, fct_process_logo: unable to clean up I/O. iport-ffffff1157ff1378, icmd-ffffff1195463110
FROM STMF:0149229: abort_task_offline called for LPORT: lport abort timed out
FROM STMF:0149229: abort_task_offline called for LPORT: lport abort timed out
FROM STMF:0149229: abort_task_offline called for LPORT: lport abort timed out


And then the following appears as the port recovers:

emlxs1:0150128: port state change from 11 to 11
emlxs1:0150128: port state change from 11 to 0
emlxs1:0150128: port state change from 0 to 11
emlxs1:0150128: port state change from 11 to 0
:0150850: fct_port_initialize: port-ffffff1157ff1278, emlxs initialize
emlxs1:0150950: port state change from 0 to e
emlxs1:0150953: Posting sol ELS 3 (PLOGI) rp_id=fffffd lp_id=22000
emlxs1:0150953: Processing sol ELS 3 (PLOGI) rp_id=fffffd
emlxs1:0150953: Sol ELS 3 (PLOGI) completed with status 0, did/fffffd
emlxs1:0150953: Posting sol ELS 62 (SCR) rp_id=fffffd lp_id=22000
emlxs1:0150953: Processing sol ELS 62 (SCR) rp_id=fffffd
emlxs1:0150953: Sol ELS 62 (SCR) completed with status 0, did/fffffd
emlxs1:0151053: Posting sol ELS 3 (PLOGI) rp_id=fffffc lp_id=22000
emlxs1:0151053: Processing sol ELS 3 (PLOGI) rp_id=fffffc
emlxs1:0151053: Sol ELS 3 (PLOGI) completed with status 0, did/fffffc
emlxs1:0151054: Posting unsol ELS 3 (PLOGI) rp_id=fffc02 lp_id=22000
emlxs1:0151054: Processing unsol ELS 3 (PLOGI) rp_id=fffc02
emlxs1:0151054: Posting unsol ELS 20 (PRLI) rp_id=fffc02 lp_id=22000
emlxs1:0151054: Processing unsol ELS 20 (PRLI) rp_id=fffc02
emlxs1:0151055: Posting unsol ELS 5 (LOGO) rp_id=fffc02 lp_id=22000
emlxs1:0151055: Processing unsol ELS 5 (LOGO) rp_id=fffc02
emlxs1:0151146: Posting unsol ELS 3 (PLOGI) rp_id=21500 lp_id=22000
emlxs1:0151146: Processing unsol ELS 3 (PLOGI) rp_id=21500
emlxs1:0151146: Posting unsol ELS 20 (PRLI) rp_id=21500 lp_id=22000
emlxs1:0151146: Processing unsol ELS 20 (PRLI) rp_id=21500
emlxs1:0151146: Posting unsol ELS 3 (PLOGI) rp_id=21600 lp_id=22000
emlxs1:0151146: Processing unsol ELS 3 (PLOGI) rp_id=21600
emlxs1:0151146: Posting unsol ELS 20 (PRLI) rp_id=21600 lp_id=22000
emlxs1:0151146: Processing unsol ELS 20 (PRLI) rp_id=21600
emlxs1:0151338: Posting unsol ELS 3 (PLOGI) rp_id=21500 lp_id=22000
emlxs1:0151338: Processing unsol ELS 3 (PLOGI) rp_id=21500
emlxs1:0151338: Posting unsol ELS 20 (PRLI) rp_id=21500 lp_id=22000
emlxs1:0151338: Processing unsol ELS 20 (PRLI) rp_id=21500
emlxs1:0151338: Posting unsol ELS 3 (PLOGI) rp_id=21600 lp_id=22000
emlxs1:0151338: Processing unsol ELS 3 (PLOGI) rp_id=21600
emlxs1:0151338: Posting unsol ELS 20 (PRLI) rp_id=21600 lp_id=22000
emlxs1:0151338: Processing unsol ELS 20 (PRLI) rp_id=21600
emlxs1:0151428: Posting unsol ELS 3 (PLOGI) rp_id=21500 lp_id=22000
emlxs1:0151428: Processing unsol ELS 3 (PLOGI) rp_id=21500
emlxs1:0151428: port state change from e to 4
emlxs1:0151428: Posting unsol ELS 20 (PRLI) rp_id=21500 lp_id=22000
emlxs1:0151428: Processing unsol ELS 20 (PRLI) rp_id=21500
emlxs1:0151428: Posting unsol ELS 3 (PLOGI) rp_id=21600 lp_id=22000
emlxs1:0151428: Processing unsol ELS 3 (PLOGI) rp_id=21600
emlxs1:0151428: Posting unsol ELS 20 (PRLI) rp_id=21600 lp_id=22000
emlxs1:0151428: Processing unsol ELS 20 (PRLI) rp_id=21600

To be honest this does not tell me much, since I do not understand
COMSTAR at that depth. It would appear that the link fails, so it is either a
driver problem or a hardware issue. I will replace the LPe11002 with a brand
new, unopened one; if that does not help, I will give up on FC on OI.
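
For anyone wanting to pull the same trace, it can usually be dumped from the
running kernel with mdb; a minimal sketch, assuming the stmf_trace_buf symbol
is present in your build:

# Dump the in-kernel STMF/COMSTAR trace buffer (the "FROM STMF:..." lines above)
echo '*stmf_trace_buf/s' | mdb -k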




On Fri, Jun 7, 2013 at 4:54 PM, Heinrich van Riel <
heinrich.vanriel at gmail.com> wrote:

> I did find this in my inbox from 2009. I have been using FC with ZFS for
> quite some time and only recently retired an OI a5 install that was
> upgraded from OpenSolaris. It did not do really heavy-duty work, but I had a
> similar problem back then, when we were stuck on build 99 for quite some time.
>
> To Jean-Yves Chevallier at Emulex:
> Any comments on the future of Emulex with regards to the COMSTAR project?
> It seems I am not the only one that has problems using Emulex in later
> builds. For now I am stuck with build 99.
> As always, any feedback would be greatly appreciated, since we have to
> decide between sticking with OpenSolaris & COMSTAR and starting to migrate
> to another solution; we cannot stay on build 99 forever.
> What I am really trying to find out is whether there is a roadmap/decision
> to ultimately support only QLogic HBAs in target mode.
>
> Response:
>
>
> Sorry for the delay in answering you. I do have news for you.
> First off, the interface used by COMSTAR has changed in recent Nevada
> releases (NV120 and up, I believe). Since it is not a public interface,
> we had no prior indication of this.
> We know of a number of issues, some on our driver, some on the COMSTAR
> stack. Based on the information we have from you and other community
> members, we have addressed all these issues in our next driver version – we
> will know for sure after we run our DVT (driver verification testing) next
> week. Depending on progress, this driver will be part of NV 128 or else NV
> 130.
> I believe it is worth taking another look based on these upcoming builds,
> which I imagine might also include fixes to the rest of the COMSTAR stack.
>
> Best regards.
>
>
> I can confirm that this was fixed in 128; all I did was update from 99
> to 128 and there were no problems.
> It seems the same problem has now returned, and Emulex does not appear to
> be a good fit, since Sun mostly used QLogic.
>
> I guess it is back to iSCSI only for now.
>
>
>
> On Fri, Jun 7, 2013 at 4:40 PM, Heinrich van Riel <
> heinrich.vanriel at gmail.com> wrote:
>
>> I changed the settings. I do see it writing all the time now, but the
>> link still dies after a few minutes.
>>
>> Jun  7 16:30:57  emlxs: [ID 349649 kern.info] [ 5.0608]emlxs1: NOTICE:
>> 730: Link reset. (Disabling link...)
>> Jun  7 16:30:57 emlxs: [ID 349649 kern.info] [ 5.0333]emlxs1: NOTICE:
>> 710: Link down.
>> Jun  7 16:33:16 emlxs: [ID 349649 kern.info] [ 5.055D]emlxs1: NOTICE:
>> 720: Link up. (4Gb, fabric, target)
>> Jun  7 16:33:16 fct: [ID 132490 kern.notice] NOTICE: emlxs1 LINK UP,
>> portid 22000, topology Fabric Pt-to-Pt,speed 4G
>>
>>
>>
>>
>> On Fri, Jun 7, 2013 at 3:06 PM, Jim Klimov <jimklimov at cos.ru> wrote:
>>
>>> Comment below
>>>
>>>
>>> On 2013-06-07 20:42, Heinrich van Riel wrote:
>>>
>>>> One second apart, cloning a 150GB VM from a datastore on EMC to OI:
>>>>
>>>> alloc   free   read  write   read  write
>>>> -----  -----  -----  -----  -----  -----
>>>>  309G  54.2T     81     48   452K  1.34M
>>>>  309G  54.2T      0  8.17K      0   258M
>>>>  310G  54.2T      0  16.3K      0   510M
>>>>  310G  54.2T      0      0      0      0
>>>>  310G  54.2T      0      0      0      0
>>>>  310G  54.2T      0      0      0      0
>>>>  310G  54.2T      0  10.1K      0   320M
>>>>  311G  54.2T      0  26.1K      0   820M
>>>>  311G  54.2T      0      0      0      0
>>>>  311G  54.2T      0      0      0      0
>>>>  311G  54.2T      0      0      0      0
>>>>  311G  54.2T      0  10.6K      0   333M
>>>>  313G  54.2T      0  27.4K      0   860M
>>>>  313G  54.2T      0      0      0      0
>>>>  313G  54.2T      0      0      0      0
>>>>  313G  54.2T      0      0      0      0
>>>>  313G  54.2T      0  9.69K      0   305M
>>>>  314G  54.2T      0  10.8K      0   337M
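>>>>
>>>> (A listing like this can typically be produced with zpool iostat at a
>>>> one-second interval; "tank" below is a placeholder pool name.)
>>>>
>>>> # show per-second capacity/ops/bandwidth for the pool while the clone runs
>>>> zpool iostat tank 1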
>>>>
>>> ...
>>> Were it not for your complaints about link resets and "unusable"
>>> connections, I'd say this looks like normal behavior for async
>>> writes: they get cached up, and every 5 sec you have a transaction
>>> group (TXG) sync which flushes the writes from cache to disks.
>>>
>>> In fact, the picture still looks like that, and this is possibly the
>>> reason for the hiccups.
>>>
>>> The TXG sync can be an IO-intensive process which may block or
>>> delay many other system tasks; previously, when the interval
>>> defaulted to 30 sec, we got unusable SSH connections and temporarily
>>> "hung" disk requests on the storage server every half a minute when
>>> it was really busy (i.e. during the initial fill with data from older
>>> boxes). It cached up about 10 seconds' worth of writes, then spewed
>>> them out and could do nothing else. I don't think I ever saw network
>>> connections timing out or NICs reporting resets due to this, but I
>>> wouldn't be surprised if it were the cause in your case
>>> (i.e. disk IO threads preempting HBA/NIC threads for too long somehow,
>>> leaving the driver very confused about the state of its card).
>>>
>>> At the very least, TXG syncs can be tuned by two knobs: the time
>>> limit (5 sec default) and the size limit (when the cache is "this"
>>> full, begin the sync to disk). The latter is a realistic figure that
>>> can allow you to sync in shorter bursts, with fewer interruptions
>>> to smooth IO and process work.
>>>
>>> A somewhat related tunable is the number of requests that ZFS would
>>> queue up for a disk. Depending on its NCQ/TCQ abilities and random
>>> IO abilities (HDD vs. SSD), long or short queues may be preferable.
>>> See also: http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Device_I.2FO_Queue_Size_.28I.2FO_Concurrency.29
>>>
>>> These tunables can be set at runtime with "mdb -kw", as well as in
>>> the /etc/system file to survive reboots. One of our storage boxes
>>> has these example values in /etc/system:
>>>
>>> *# default: flush txg every 5sec (may be max 30sec, optimize
>>> *# for 5 sec writing)
>>> set zfs:zfs_txg_synctime = 5
>>>
>>> *# Spool to disk when the ZFS cache is 0x18000000 (384Mb) full
>>> set zfs:zfs_write_limit_override = 0x18000000
>>> *# ...for realtime changes use mdb.
>>> *# Example sets 0x18000000 (384Mb, 402653184 b):
>>> *# echo zfs_write_limit_override/W0t402653184 | mdb -kw
>>>
>>> *# ZFS queue depth per disk
>>> set zfs:zfs_vdev_max_pending = 3
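>>>
>>> To check what is currently in effect, the same mdb approach can read a
>>> value back (a sketch; /D prints the variable as a decimal integer):
>>>
>>> # current ZFS per-vdev queue depth
>>> echo zfs_vdev_max_pending/D | mdb -k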
>>>
>>> HTH,
>>> //Jim Klimov
>>>
>>>
>>>
>>
>>
>

