[OpenIndiana-discuss] ZFS read speed(iSCSI)

Sat Jun 8 00:38:05 UTC 2013

New card, different PCI-E slot (removed the other one) different FC switch
(same model with same code) older hba firmware (2.72a2)  = same result.

On the setting changes when it boots it complains about this option, does
not exist: szfs_txg_synctime
The changes still allowed for a constant write, but at a max of 100Mb/s so
not much better than iscsi over 1Gbe. I guess I would need to increase
write_limit_override. if i disable the settings again it shows 240MB/s with
bursts up to 300, both stats are from VMware's disk perf monitoring while
cloning the same VM.

All iSCSI luns remain active with no impact.
So I will conclude, I guess, it seems to be the problem that was there in
2009 from build ~100 to 128. When I search the error messages all posts
date back to 2009.

I will try one more thing to reinstall with 151a5 since a server that was
removed from the env was running this with no issues, but was using an
older emulex HBA, LP10000 PCIX
Looking at the notable changes in the release notes past a5 I do see
anything that changed that I would think would cause the behavior. Would
this just be a waste of time?

On Fri, Jun 7, 2013 at 6:36 PM, Heinrich van Riel <
heinrich.vanriel at gmail.com> wrote:

> In the debug info I see 1000's of the following events:
>
> FROM STMF:0149225: abort_task_offline called for LPORT: lport abort timed
> out
> FROM STMF:0149225: abort_task_offline called for LPORT: lport abort timed
> out
> FROM STMF:0149225: abort_task_offline called for LPORT: lport abort timed
> out
> FROM STMF:0149226: abort_task_offline called for LPORT: lport abort timed
> out
> FROM STMF:0149226: abort_task_offline called for LPORT: lport abort timed
> out
> FROM STMF:0149226: abort_task_offline called for LPORT: lport abort timed
> out
> FROM STMF:0149227: abort_task_offline called for LPORT: lport abort timed
> out
> FROM STMF:0149227: abort_task_offline called for LPORT: lport abort timed
> out
> FROM STMF:0149227: abort_task_offline called for LPORT: lport abort timed
> out
> emlxs1:0149228: port state change from 11 to 11
> FROM STMF:0149228: abort_task_offline called for LPORT: lport abort timed
> out
> FROM STMF:0149228: abort_task_offline called for LPORT: lport abort timed
> out
> FROM STMF:0149228: abort_task_offline called for LPORT: lport abort timed
> out
> :0149228: fct_port_shutdown: port-ffffff1157ff1278, fct_process_logo:
> unable to
> clean up I/O. iport-ffffff1157ff1378, icmd-ffffff1195463110
> FROM STMF:0149229: abort_task_offline called for LPORT: lport abort timed
> out
> FROM STMF:0149229: abort_task_offline called for LPORT: lport abort timed
> out
> FROM STMF:0149229: abort_task_offline called for LPORT: lport abort timed
> out
>
>
> And then the following as the port recovers.
>
> emlxs1:0150128: port state change from 11 to 11
> emlxs1:0150128: port state change from 11 to 0
> emlxs1:0150128: port state change from 0 to 11
> emlxs1:0150128: port state change from 11 to 0
> :0150850: fct_port_initialize: port-ffffff1157ff1278, emlxs initialize
> emlxs1:0150950: port state change from 0 to e
> emlxs1:0150953: Posting sol ELS 3 (PLOGI) rp_id=fffffd lp_id=22000
> emlxs1:0150953: Processing sol ELS 3 (PLOGI) rp_id=fffffd
> emlxs1:0150953: Sol ELS 3 (PLOGI) completed with status 0, did/fffffd
> emlxs1:0150953: Posting sol ELS 62 (SCR) rp_id=fffffd lp_id=22000
> emlxs1:0150953: Processing sol ELS 62 (SCR) rp_id=fffffd
> emlxs1:0150953: Sol ELS 62 (SCR) completed with status 0, did/fffffd
> emlxs1:0151053: Posting sol ELS 3 (PLOGI) rp_id=fffffc lp_id=22000
> emlxs1:0151053: Processing sol ELS 3 (PLOGI) rp_id=fffffc
> emlxs1:0151053: Sol ELS 3 (PLOGI) completed with status 0, did/fffffc
> emlxs1:0151054: Posting unsol ELS 3 (PLOGI) rp_id=fffc02 lp_id=22000
> emlxs1:0151054: Processing unsol ELS 3 (PLOGI) rp_id=fffc02
> emlxs1:0151054: Posting unsol ELS 20 (PRLI) rp_id=fffc02 lp_id=22000
> emlxs1:0151054: Processing unsol ELS 20 (PRLI) rp_id=fffc02
> emlxs1:0151055: Posting unsol ELS 5 (LOGO) rp_id=fffc02 lp_id=22000
> emlxs1:0151055: Processing unsol ELS 5 (LOGO) rp_id=fffc02
> emlxs1:0151146: Posting unsol ELS 3 (PLOGI) rp_id=21500 lp_id=22000
> emlxs1:0151146: Processing unsol ELS 3 (PLOGI) rp_id=21500
> emlxs1:0151146: Posting unsol ELS 20 (PRLI) rp_id=21500 lp_id=22000
>  emlxs1:0151146: Processing unsol ELS 20 (PRLI) rp_id=21500
> emlxs1:0151146: Posting unsol ELS 3 (PLOGI) rp_id=21600 lp_id=22000
> emlxs1:0151146: Processing unsol ELS 3 (PLOGI) rp_id=21600
> emlxs1:0151146: Posting unsol ELS 20 (PRLI) rp_id=21600 lp_id=22000
> emlxs1:0151146: Processing unsol ELS 20 (PRLI) rp_id=21600
> emlxs1:0151338: Posting unsol ELS 3 (PLOGI) rp_id=21500 lp_id=22000
> emlxs1:0151338: Processing unsol ELS 3 (PLOGI) rp_id=21500
> emlxs1:0151338: Posting unsol ELS 20 (PRLI) rp_id=21500 lp_id=22000
> emlxs1:0151338: Processing unsol ELS 20 (PRLI) rp_id=21500
> emlxs1:0151338: Posting unsol ELS 3 (PLOGI) rp_id=21600 lp_id=22000
>  emlxs1:0151338: Processing unsol ELS 3 (PLOGI) rp_id=21600
> emlxs1:0151338: Posting unsol ELS 20 (PRLI) rp_id=21600 lp_id=22000
> emlxs1:0151338: Processing unsol ELS 20 (PRLI) rp_id=21600
> emlxs1:0151428: Posting unsol ELS 3 (PLOGI) rp_id=21500 lp_id=22000
> emlxs1:0151428: Processing unsol ELS 3 (PLOGI) rp_id=21500
> emlxs1:0151428: port state change from e to 4
> emlxs1:0151428: Posting unsol ELS 20 (PRLI) rp_id=21500 lp_id=22000
> emlxs1:0151428: Processing unsol ELS 20 (PRLI) rp_id=21500
> emlxs1:0151428: Posting unsol ELS 3 (PLOGI) rp_id=21600 lp_id=22000
> emlxs1:0151428: Processing unsol ELS 3 (PLOGI) rp_id=21600
> emlxs1:0151428: Posting unsol ELS 20 (PRLI) rp_id=21600 lp_id=22000
> emlxs1:0151428: Processing unsol ELS 20 (PRLI) rp_id=21600
>
> To be honest it does not really tell me much since I do not understand
> comstar to these depths. It would appear that the link fails so either
> driver problem or hardware issue? I will replace the LPe11002 with a brand
> new unopened one and then  give up on FC on OI.
>
>
>
>
> On Fri, Jun 7, 2013 at 4:54 PM, Heinrich van Riel <
> heinrich.vanriel at gmail.com> wrote:
>
>> I did find this in my inbox from 2009, I have been using FC with ZFS for
>> quite sometime and only recently retired an install with OI a5 that was
>> upgraded from opensolaris. It did not do real heavy duty stuff, but I had a
>> similar problem where we were stuck on build 99 for quite some time.
>>
>> To  Jean-Yves Chevallier at emulex
>> Any comments on the future of Emulex with regards to the COMSTAR project?
>> It seems I am not the only one that have problems using Emulex in later
>> builds. For now I am stuck with build 99.
>> As always any feedback would be greatly appreciated since we have to make
>> a decision of sticking with Opensolaris & COMSTAR or start migrating to
>> another solution since we cannot stay on build 99 forever.
>> What I am really trying to find out is if there is a roadmap/decision to
>> ultimately only support Qlogic HBA’s in target mode.
>>
>> Response:
>>
>>
>> Sorry for the delay in answering you. I do have news for you.
>> First off, the interface used by COMSTAR has changed in recent Nevada
>> releases (NV120 and up I believe). Since it is not a public interface we
>> had no prior indication on this.
>> We know of a number of issues, some on our driver, some on the COMSTAR
>> stack. Based on the information we have from you and other community
>> members, we have addressed all these issues in our next driver version – we
>> will know for sure after we run our DVT (driver verification testing) next
>> week. Depending on progress, this driver will be part of NV 128 or else NV
>> 130.
>> I believe it is worth taking another look based on these upcoming builds,
>> which I imagine might also include fixes to the rest of the COMSTAR stack.
>>
>> Best regards.
>>
>>
>> I can confirm that this was fixed in 128 and all I did was update from 99
>> to 128 and there were no problems.
>> Seem like the same problem has now returned and emulex does not appear to
>> be a good fit since sun mostly used qlogic.
>>
>> guess it is back to iscsi only for now.
>>
>>
>>
>> On Fri, Jun 7, 2013 at 4:40 PM, Heinrich van Riel <
>> heinrich.vanriel at gmail.com> wrote:
>>
>>> I changed the settings. I do see it writing all the time now, but the
>>> link still dies after a a few min
>>>
>>> Jun  7 16:30:57  emlxs: [ID 349649 kern.info] [ 5.0608]emlxs1: NOTICE:
>>> 730: Link reset. (Disabling link...)
>>> Jun  7 16:30:57 emlxs: [ID 349649 kern.info] [ 5.0333]emlxs1: NOTICE:
>>> 710: Link down.
>>> Jun  7 16:33:16 emlxs: [ID 349649 kern.info] [ 5.055D]emlxs1: NOTICE:
>>> 720: Link up. (4Gb, fabric, target)
>>> Jun  7 16:33:16 fct: [ID 132490 kern.notice] NOTICE: emlxs1 LINK UP,
>>> portid 22000, topology Fabric Pt-to-Pt,speed 4G
>>>
>>>
>>>
>>>
>>> On Fri, Jun 7, 2013 at 3:06 PM, Jim Klimov <jimklimov at cos.ru> wrote:
>>>
>>>> Comment below
>>>>
>>>>
>>>> On 2013-06-07 20:42, Heinrich van Riel wrote:
>>>>
>>>>> One sec apart cloning 150GB vm from a datastore on EMC to OI.
>>>>>
>>>>> alloc free read write read write
>>>>> ----- ----- ----- ----- ----- -----
>>>>> 309G 54.2T 81 48 452K 1.34M
>>>>> 309G 54.2T 0 8.17K 0 258M
>>>>> 310G 54.2T 0 16.3K 0 510M
>>>>> 310G 54.2T 0 0 0 0
>>>>> 310G 54.2T 0 0 0 0
>>>>> 310G 54.2T 0 0 0 0
>>>>> 310G 54.2T 0 10.1K 0 320M
>>>>> 311G 54.2T 0 26.1K 0 820M
>>>>> 311G 54.2T 0 0 0 0
>>>>> 311G 54.2T 0 0 0 0
>>>>> 311G 54.2T 0 0 0 0
>>>>> 311G 54.2T 0 10.6K 0 333M
>>>>> 313G 54.2T 0 27.4K 0 860M
>>>>> 313G 54.2T 0 0 0 0
>>>>> 313G 54.2T 0 0 0 0
>>>>> 313G 54.2T 0 0 0 0
>>>>> 313G 54.2T 0 9.69K 0 305M
>>>>> 314G 54.2T 0 10.8K 0 337M
>>>>>
>>>> ...
>>>> Were it not for your complaints about link resets and "unusable"
>>>> connections, I'd say this looks like a normal behavior for async
>>>> writes: they get cached up, and every 5 sec you have a transaction
>>>> group (TXG) sync which flushes the writes from cache to disks.
>>>>
>>>> In fact, the picture still looks like that, and possibly is the
>>>> reason for hiccups.
>>>>
>>>> The TXG sync may be an IO intensive process, which may block or
>>>> delay many other system tasks; previously when the interval
>>>> defaulted to 30 sec we got unusable SSH connections and temporarily
>>>> "hung" disk requests on the storage server every half a minute when
>>>> it was really busy (i.e. initial filling up with data from older
>>>> boxes). It cached up about 10 seconds worth of writes, then spewed
>>>> them out and could do nothing else. I don't think I ever saw network
>>>> connections timing out or NICs reporting resets due to this, but I
>>>> wouldn't be surprised if this were the cause for your case, though
>>>> (i.e. disk IO threads preempting HBA/NIC threads for too long somehow,
>>>> making the driver very puzzled about staleness state of its card).
>>>>
>>>> At the very least, TXG syncs can be tuned by two knobs: the time
>>>> limit (5 sec default) and the size limit (when the cache is "this"
>>>> full, begin the sync to disk). The latter is a realistic figure that
>>>> can allow you to sync in shorter bursts - with less interruptions
>>>> to smooth IO and process work.
>>>>
>>>> A somewhat related tunable is the number of requests that ZFS would
>>>> queue up for a disk. Depending on its NCQ/TCQ abilities and random
>>>> IO abilities (HDD vs. SSD), long or short queues may be preferable.
>>>> See also: http://www.solarisinternals.**com/wiki/index.php/ZFS_Evil_**
>>>> Tuning_Guide#Device_I.2FO_**Queue_Size_.28I.2FO_**Concurrency.29<http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Device_I.2FO_Queue_Size_.28I.2FO_Concurrency.29>
>>>>
>>>> These tunables can be set at runtime with "mdb -K", as well as in
>>>> the /etc/system file to survive reboots. One of our storage boxes
>>>> has these example values in /etc/system:
>>>>
>>>> *# default: flush txg every 5sec (may be max 30sec, optimize
>>>> *# for 5 sec writing)
>>>> set zfs:zfs_txg_synctime = 5
>>>>
>>>> *# Spool to disk when the ZFS cache is 0x18000000 (384Mb) full
>>>> set zfs:zfs_write_limit_override = 0x18000000
>>>> *# ...for realtime changes use mdb.
>>>> *# Example sets 0x18000000 (384Mb, 402653184 b):
>>>> *# echo zfs_write_limit_override/**W0t402653184 | mdb -kw
>>>>
>>>> *# ZFS queue depth per disk
>>>> set zfs:zfs_vdev_max_pending = 3
>>>>
>>>> HTH,
>>>> //Jim Klimov
>>>>
>>>>
>>>> ______________________________**_________________
>>>> OpenIndiana-discuss mailing list
>>>> OpenIndiana-discuss@**openindiana.org<OpenIndiana-discuss at openindiana.org>
>>>> http://openindiana.org/**mailman/listinfo/openindiana-**discuss<http://openindiana.org/mailman/listinfo/openindiana-discuss>
>>>>
>>>
>>>
>>
>