[OpenIndiana-discuss] ZFS read speed(iSCSI)

Heinrich van Riel heinrich.vanriel at gmail.com
Mon Jun 10 23:36:41 UTC 2013


Switched to the QLogic adapter using Solaris 11.1. Problem resolved... well,
for now. It is not as fast as OI with the Emulex adapter; perhaps that is the older
pool/fs version, since I want to keep my options open for now. I am getting
around 200MB/s when cloning. At least backups can run for now. Getting a
license for 11.1 for one year; I will worry about it again after that.
I have never had problems with any device connected over FC like this, that is
usually the beauty of it, but it is expensive. The downside right now is that the
QLogic (qlt) card I have only has a single port.
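
For the record, the pool/fs versions I am keeping back can be checked with
something along these lines (the pool/dataset name is just a placeholder for mine):

  # show the on-disk pool and filesystem versions (pool name assumed)
  zpool get version tank
  zfs get version tank
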
thanks,



On Mon, Jun 10, 2013 at 2:46 PM, Heinrich van Riel <
heinrich.vanriel at gmail.com> wrote:

> Just want to provide an update here.
>
> Installed Solaris 11.1 and reconfigured everything. I went back to the Emulex card
> since it is dual port, to connect to both switches. Same problem; well,
> the link does not fail, but it is writing at 20k/s.
>
>
> I am really not sure what to do anymore, other than to accept that FC target is
> no longer an option, but I will post in the Oracle Solaris forum. Either this
> has been an issue for some time, or it is a hardware combination, or perhaps
> I am doing something seriously wrong.
>
>
>
>
>
> On Sat, Jun 8, 2013 at 6:57 PM, Heinrich van Riel <
> heinrich.vanriel at gmail.com> wrote:
>
>> I took a look at every server that I knew I could power down or that is
>> slated for removal in the future, and I found a QLogic adapter not in use.
>>
>> HBA Port WWN: 2100001b3280b
>>         Port Mode: Target
>>         Port ID: 12000
>>         OS Device Name: Not Applicable
>>         Manufacturer: QLogic Corp.
>>         Model: QLE2460
>>         Firmware Version: 5.2.1
>>         FCode/BIOS Version: N/A
>>         Serial Number: not available
>>         Driver Name: COMSTAR QLT
>>         Driver Version: 20100505-1.05
>>         Type: F-port
>>         State: online
>>         Supported Speeds: 1Gb 2Gb 4Gb
>>         Current Speed: 4Gb
>>         Node WWN: 2000001b3280b
>>
>>
>> The link does not go down, but it is useless; right from the start it is as slow as
>> the Emulex was after I made the xfer change.
>> So it is not a driver issue.
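>>
>> (The numbers below are one-second samples of pool activity during the clone;
>> they come from something along these lines, with the pool name assumed:)
>>
>>   # sample pool capacity, operations and bandwidth once per second (pool name assumed)
>>   zpool iostat tank 1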
>>
>>  alloc   free   read  write   read  write
>>  -----  -----  -----  -----  -----  -----
>>   681G  53.8T      5     12  29.9K  51.3K
>>   681G  53.8T      0      0      0      0
>>   681G  53.8T      0      0      0      0
>>   681G  53.8T      0      0      0      0
>>   681G  53.8T      0      0      0      0
>>   681G  53.8T      0     88      0   221K
>>   681G  53.8T      0      0      0      0
>>   681G  53.8T      0      0      0      0
>>   681G  53.8T      0      0      0      0
>>   681G  53.8T      0      0      0      0
>>   681G  53.8T      0    163      0   812K
>>   681G  53.8T      0      0      0      0
>>   681G  53.8T      0      0      0      0
>>   681G  53.8T      0      0      0      0
>>   681G  53.8T      0      0      0      0
>>   681G  53.8T      0    198      0  1.13M
>>   681G  53.8T      0      0      0      0
>>   681G  53.8T      0      0      0      0
>>   681G  53.8T      0      0      0      0
>>   681G  53.8T      0      0      0      0
>>   681G  53.8T      0     88      0   221K
>>   681G  53.8T      0      0      0      0
>>   681G  53.8T      0      0      0      0
>>   681G  53.8T      0      0      0      0
>>   681G  53.8T      0      0      0      0
>>   681G  53.8T      0    187      0  1.02M
>>   681G  53.8T      0      0      0      0
>>   681G  53.8T      0      0      0      0
>>   681G  53.8T      0      0      0      0
>>   681G  53.8T      0      0      0      0
>>
>> This is a clean install of a7 with nothing done other than NIC config in
>> LACP. I did not attempt a reinstall of a5 yet and probably won't either.
>> I don't know what to do anymore. I was going to try OmniOS, but there is no
>> way of knowing if it would work.
>>
>>
>> I will see if I can get approved for a Solaris license for one year; if
>> not, I am switching back to Windows Storage Spaces. I can't back up the current
>> lab on the EMC array to this node in any event, since there is no IP
>> connectivity and FC is a dream.
>>
>> I guess I am the only one trying to use it as an FC target, so these
>> problems go unnoticed.
>>
>>
>>
>> On Sat, Jun 8, 2013 at 4:55 PM, Heinrich van Riel <
>> heinrich.vanriel at gmail.com> wrote:
>>
>>> Changing max-xfer-size causes the link to stay up, and no problems are
>>> reported from stmf.
>>>
>>> #       Memory_model       max-xfer-size
>>> #     ----------------------------------------
>>> #       Small              131072 - 339968
>>> #       Medium             339969 - 688128
>>> #       Large              688129 - 1388544
>>> #
>>> # Range:  Min:131072   Max:1388544   Default:339968
>>> #
>>> max-xfer-size=339968;
>>>
>>> As soon as I changed it to 339969 there was no link loss, but I should
>>> be so lucky that it solves my problem: after a few minutes it would grind to a
>>> crawl, so much so that in VMware it takes well over a minute just to
>>> browse a folder; we are talking a few k/s.
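>>>
>>> (For reference, this is roughly how I am applying the change; the path is the
>>> stock emlxs driver config, and a reboot may be needed before a target-mode
>>> port that is already online picks it up:)
>>>
>>>   # set max-xfer-size in the emlxs driver config, e.g. max-xfer-size=339969;
>>>   vi /kernel/drv/emlxs.conf
>>>   # have the system re-read the driver properties, or reboot
>>>   update_drv emlxs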
>>>
>>> Setting it to the max causes the link to go down again, and stmf
>>> reports the following again:
>>> FROM STMF:0062568: abort_task_offline called for LPORT: lport abort
>>> timed out
>>>
>>> I also played around with the buffer settings.
>>>
>>> Any ideas?
>>> Thanks,
>>>
>>>
>>>
>>>  On Fri, Jun 7, 2013 at 8:38 PM, Heinrich van Riel <
>>> heinrich.vanriel at gmail.com> wrote:
>>>
>>>> New card, different PCI-E slot (removed the other one), different FC
>>>> switch (same model with same code), older HBA firmware (2.72a2) = same
>>>> result.
>>>>
>>>> On the setting changes: when it boots it complains that this option
>>>> does not exist: zfs_txg_synctime
>>>> The changes still allowed for a constant write, but at a max of 100MB/s,
>>>> so not much better than iSCSI over 1GbE. I guess I would need to increase
>>>> write_limit_override. If I disable the settings again it shows 240MB/s
>>>> with bursts up to 300; both stats are from VMware's disk performance monitoring
>>>> while cloning the same VM.
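>>>>
>>>> (If I go down that path, I assume it would be the same kind of runtime mdb
>>>> tweak Jim posted below, just with a bigger figure; the 1 GB value here is
>>>> purely an illustration:)
>>>>
>>>>   # illustration only: raise zfs_write_limit_override to 1 GB (1073741824 bytes) on a live system
>>>>   echo zfs_write_limit_override/W0t1073741824 | mdb -kw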
>>>>
>>>> All iSCSI LUNs remain active with no impact.
>>>> So I will conclude, I guess, that it seems to be the problem that was there
>>>> in 2009 from build ~100 to 128. When I search the error messages, all posts
>>>> date back to 2009.
>>>>
>>>> I will try one more thing: a reinstall with 151a5, since a server that
>>>> was removed from the environment was running this with no issues, but it was
>>>> using an older Emulex HBA, an LP10000 PCI-X.
>>>> Looking at the notable changes in the release notes past a5, I do not see
>>>> anything that changed that I would think would cause this behavior. Would
>>>> this just be a waste of time?
>>>>
>>>>
>>>>
>>>> On Fri, Jun 7, 2013 at 6:36 PM, Heinrich van Riel <
>>>> heinrich.vanriel at gmail.com> wrote:
>>>>
>>>>> In the debug info I see thousands of the following events:
>>>>>
>>>>> FROM STMF:0149225: abort_task_offline called for LPORT: lport abort
>>>>> timed out
>>>>> FROM STMF:0149225: abort_task_offline called for LPORT: lport abort
>>>>> timed out
>>>>> FROM STMF:0149225: abort_task_offline called for LPORT: lport abort
>>>>> timed out
>>>>> FROM STMF:0149226: abort_task_offline called for LPORT: lport abort
>>>>> timed out
>>>>> FROM STMF:0149226: abort_task_offline called for LPORT: lport abort
>>>>> timed out
>>>>> FROM STMF:0149226: abort_task_offline called for LPORT: lport abort
>>>>> timed out
>>>>> FROM STMF:0149227: abort_task_offline called for LPORT: lport abort
>>>>> timed out
>>>>> FROM STMF:0149227: abort_task_offline called for LPORT: lport abort
>>>>> timed out
>>>>> FROM STMF:0149227: abort_task_offline called for LPORT: lport abort
>>>>> timed out
>>>>> emlxs1:0149228: port state change from 11 to 11
>>>>> FROM STMF:0149228: abort_task_offline called for LPORT: lport abort
>>>>> timed out
>>>>> FROM STMF:0149228: abort_task_offline called for LPORT: lport abort
>>>>> timed out
>>>>> FROM STMF:0149228: abort_task_offline called for LPORT: lport abort
>>>>> timed out
>>>>> :0149228: fct_port_shutdown: port-ffffff1157ff1278, fct_process_logo:
>>>>> unable to
>>>>> clean up I/O. iport-ffffff1157ff1378, icmd-ffffff1195463110
>>>>> FROM STMF:0149229: abort_task_offline called for LPORT: lport abort
>>>>> timed out
>>>>> FROM STMF:0149229: abort_task_offline called for LPORT: lport abort
>>>>> timed out
>>>>> FROM STMF:0149229: abort_task_offline called for LPORT: lport abort
>>>>> timed out
>>>>>
>>>>>
>>>>> And then the following as the port recovers.
>>>>>
>>>>> emlxs1:0150128: port state change from 11 to 11
>>>>> emlxs1:0150128: port state change from 11 to 0
>>>>> emlxs1:0150128: port state change from 0 to 11
>>>>> emlxs1:0150128: port state change from 11 to 0
>>>>> :0150850: fct_port_initialize: port-ffffff1157ff1278, emlxs initialize
>>>>> emlxs1:0150950: port state change from 0 to e
>>>>> emlxs1:0150953: Posting sol ELS 3 (PLOGI) rp_id=fffffd lp_id=22000
>>>>> emlxs1:0150953: Processing sol ELS 3 (PLOGI) rp_id=fffffd
>>>>> emlxs1:0150953: Sol ELS 3 (PLOGI) completed with status 0, did/fffffd
>>>>> emlxs1:0150953: Posting sol ELS 62 (SCR) rp_id=fffffd lp_id=22000
>>>>> emlxs1:0150953: Processing sol ELS 62 (SCR) rp_id=fffffd
>>>>> emlxs1:0150953: Sol ELS 62 (SCR) completed with status 0, did/fffffd
>>>>> emlxs1:0151053: Posting sol ELS 3 (PLOGI) rp_id=fffffc lp_id=22000
>>>>> emlxs1:0151053: Processing sol ELS 3 (PLOGI) rp_id=fffffc
>>>>> emlxs1:0151053: Sol ELS 3 (PLOGI) completed with status 0, did/fffffc
>>>>> emlxs1:0151054: Posting unsol ELS 3 (PLOGI) rp_id=fffc02 lp_id=22000
>>>>> emlxs1:0151054: Processing unsol ELS 3 (PLOGI) rp_id=fffc02
>>>>> emlxs1:0151054: Posting unsol ELS 20 (PRLI) rp_id=fffc02 lp_id=22000
>>>>> emlxs1:0151054: Processing unsol ELS 20 (PRLI) rp_id=fffc02
>>>>> emlxs1:0151055: Posting unsol ELS 5 (LOGO) rp_id=fffc02 lp_id=22000
>>>>> emlxs1:0151055: Processing unsol ELS 5 (LOGO) rp_id=fffc02
>>>>> emlxs1:0151146: Posting unsol ELS 3 (PLOGI) rp_id=21500 lp_id=22000
>>>>> emlxs1:0151146: Processing unsol ELS 3 (PLOGI) rp_id=21500
>>>>> emlxs1:0151146: Posting unsol ELS 20 (PRLI) rp_id=21500 lp_id=22000
>>>>>  emlxs1:0151146: Processing unsol ELS 20 (PRLI) rp_id=21500
>>>>> emlxs1:0151146: Posting unsol ELS 3 (PLOGI) rp_id=21600 lp_id=22000
>>>>> emlxs1:0151146: Processing unsol ELS 3 (PLOGI) rp_id=21600
>>>>> emlxs1:0151146: Posting unsol ELS 20 (PRLI) rp_id=21600 lp_id=22000
>>>>> emlxs1:0151146: Processing unsol ELS 20 (PRLI) rp_id=21600
>>>>> emlxs1:0151338: Posting unsol ELS 3 (PLOGI) rp_id=21500 lp_id=22000
>>>>> emlxs1:0151338: Processing unsol ELS 3 (PLOGI) rp_id=21500
>>>>> emlxs1:0151338: Posting unsol ELS 20 (PRLI) rp_id=21500 lp_id=22000
>>>>> emlxs1:0151338: Processing unsol ELS 20 (PRLI) rp_id=21500
>>>>> emlxs1:0151338: Posting unsol ELS 3 (PLOGI) rp_id=21600 lp_id=22000
>>>>>  emlxs1:0151338: Processing unsol ELS 3 (PLOGI) rp_id=21600
>>>>> emlxs1:0151338: Posting unsol ELS 20 (PRLI) rp_id=21600 lp_id=22000
>>>>> emlxs1:0151338: Processing unsol ELS 20 (PRLI) rp_id=21600
>>>>> emlxs1:0151428: Posting unsol ELS 3 (PLOGI) rp_id=21500 lp_id=22000
>>>>> emlxs1:0151428: Processing unsol ELS 3 (PLOGI) rp_id=21500
>>>>> emlxs1:0151428: port state change from e to 4
>>>>> emlxs1:0151428: Posting unsol ELS 20 (PRLI) rp_id=21500 lp_id=22000
>>>>> emlxs1:0151428: Processing unsol ELS 20 (PRLI) rp_id=21500
>>>>> emlxs1:0151428: Posting unsol ELS 3 (PLOGI) rp_id=21600 lp_id=22000
>>>>> emlxs1:0151428: Processing unsol ELS 3 (PLOGI) rp_id=21600
>>>>> emlxs1:0151428: Posting unsol ELS 20 (PRLI) rp_id=21600 lp_id=22000
>>>>> emlxs1:0151428: Processing unsol ELS 20 (PRLI) rp_id=21600
>>>>>
>>>>> To be honest it does not really tell me much, since I do not understand
>>>>> COMSTAR to these depths. It would appear that the link fails, so is it a
>>>>> driver problem or a hardware issue? I will replace the LPe11002 with a brand
>>>>> new, unopened one and then, if that does not help, give up on FC on OI.
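>>>>>
>>>>> (In case it helps anyone else looking at the same thing: as far as I
>>>>> understand it, the trace above is the in-kernel STMF trace buffer, which
>>>>> can be dumped with something like this; the exact symbol may differ
>>>>> between builds:)
>>>>>
>>>>>   # dump the COMSTAR/STMF trace buffer from the running kernel (symbol name assumed)
>>>>>   echo '*stmf_trace_buf/s' | mdb -k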
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Jun 7, 2013 at 4:54 PM, Heinrich van Riel <
>>>>> heinrich.vanriel at gmail.com> wrote:
>>>>>
>>>>>> I did find this in my inbox from 2009. I have been using FC with ZFS
>>>>>> for quite some time and only recently retired an install with OI a5 that was
>>>>>> upgraded from OpenSolaris. It did not do real heavy-duty stuff, but I had a
>>>>>> similar problem where we were stuck on build 99 for quite some time.
>>>>>>
>>>>>> To Jean-Yves Chevallier at Emulex:
>>>>>> Any comments on the future of Emulex with regard to the COMSTAR
>>>>>> project?
>>>>>> It seems I am not the only one that has problems using Emulex in
>>>>>> later builds. For now I am stuck on build 99.
>>>>>> As always, any feedback would be greatly appreciated, since we have to
>>>>>> decide between sticking with OpenSolaris & COMSTAR or starting to migrate
>>>>>> to another solution, as we cannot stay on build 99 forever.
>>>>>> What I am really trying to find out is whether there is a roadmap/decision
>>>>>> to ultimately only support QLogic HBAs in target mode.
>>>>>>
>>>>>> Response:
>>>>>>
>>>>>>
>>>>>> Sorry for the delay in answering you. I do have news for you.
>>>>>> First off, the interface used by COMSTAR has changed in recent Nevada
>>>>>> releases (NV120 and up I believe). Since it is not a public interface we
>>>>>> had no prior indication on this.
>>>>>> We know of a number of issues, some on our driver, some on the
>>>>>> COMSTAR stack. Based on the information we have from you and other
>>>>>> community members, we have addressed all these issues in our next driver
>>>>>> version – we will know for sure after we run our DVT (driver verification
>>>>>> testing) next week. Depending on progress, this driver will be part of NV
>>>>>> 128 or else NV 130.
>>>>>> I believe it is worth taking another look based on these upcoming
>>>>>> builds, which I imagine might also include fixes to the rest of the COMSTAR
>>>>>> stack.
>>>>>>
>>>>>> Best regards.
>>>>>>
>>>>>>
>>>>>> I can confirm that this was fixed in 128; all I did was update
>>>>>> from 99 to 128 and there were no problems.
>>>>>> It seems like the same problem has now returned, and Emulex does not
>>>>>> appear to be a good fit, since Sun mostly used QLogic.
>>>>>>
>>>>>> Guess it is back to iSCSI only for now.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Jun 7, 2013 at 4:40 PM, Heinrich van Riel <
>>>>>> heinrich.vanriel at gmail.com> wrote:
>>>>>>
>>>>>>> I changed the settings. I do see it writing all the time now, but
>>>>>>> the link still dies after a few minutes.
>>>>>>>
>>>>>>> Jun  7 16:30:57  emlxs: [ID 349649 kern.info] [ 5.0608]emlxs1:
>>>>>>> NOTICE: 730: Link reset. (Disabling link...)
>>>>>>> Jun  7 16:30:57 emlxs: [ID 349649 kern.info] [ 5.0333]emlxs1:
>>>>>>> NOTICE: 710: Link down.
>>>>>>> Jun  7 16:33:16 emlxs: [ID 349649 kern.info] [ 5.055D]emlxs1:
>>>>>>> NOTICE: 720: Link up. (4Gb, fabric, target)
>>>>>>> Jun  7 16:33:16 fct: [ID 132490 kern.notice] NOTICE: emlxs1 LINK UP,
>>>>>>> portid 22000, topology Fabric Pt-to-Pt,speed 4G
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jun 7, 2013 at 3:06 PM, Jim Klimov <jimklimov at cos.ru> wrote:
>>>>>>>
>>>>>>>> Comment below
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2013-06-07 20:42, Heinrich van Riel wrote:
>>>>>>>>
>>>>>>>>> One second apart, cloning a 150GB VM from a datastore on EMC to OI.
>>>>>>>>>
>>>>>>>>>  alloc   free   read  write   read  write
>>>>>>>>>  -----  -----  -----  -----  -----  -----
>>>>>>>>>   309G  54.2T     81     48   452K  1.34M
>>>>>>>>>   309G  54.2T      0  8.17K      0   258M
>>>>>>>>>   310G  54.2T      0  16.3K      0   510M
>>>>>>>>>   310G  54.2T      0      0      0      0
>>>>>>>>>   310G  54.2T      0      0      0      0
>>>>>>>>>   310G  54.2T      0      0      0      0
>>>>>>>>>   310G  54.2T      0  10.1K      0   320M
>>>>>>>>>   311G  54.2T      0  26.1K      0   820M
>>>>>>>>>   311G  54.2T      0      0      0      0
>>>>>>>>>   311G  54.2T      0      0      0      0
>>>>>>>>>   311G  54.2T      0      0      0      0
>>>>>>>>>   311G  54.2T      0  10.6K      0   333M
>>>>>>>>>   313G  54.2T      0  27.4K      0   860M
>>>>>>>>>   313G  54.2T      0      0      0      0
>>>>>>>>>   313G  54.2T      0      0      0      0
>>>>>>>>>   313G  54.2T      0      0      0      0
>>>>>>>>>   313G  54.2T      0  9.69K      0   305M
>>>>>>>>>   314G  54.2T      0  10.8K      0   337M
>>>>>>>>>
>>>>>>>> ...
>>>>>>>> Were it not for your complaints about link resets and "unusable"
>>>>>>>> connections, I'd say this looks like normal behavior for async
>>>>>>>> writes: they get cached up, and every 5 sec you have a transaction
>>>>>>>> group (TXG) sync which flushes the writes from cache to disks.
>>>>>>>>
>>>>>>>> In fact, the picture still looks like that, and that is possibly the
>>>>>>>> reason for the hiccups.
>>>>>>>>
>>>>>>>> The TXG sync can be an IO-intensive process, which may block or
>>>>>>>> delay many other system tasks; previously, when the interval
>>>>>>>> defaulted to 30 sec, we got unusable SSH connections and temporarily
>>>>>>>> "hung" disk requests on the storage server every half a minute when
>>>>>>>> it was really busy (i.e. during the initial fill-up with data from older
>>>>>>>> boxes). It cached up about 10 seconds' worth of writes, then spewed
>>>>>>>> them out and could do nothing else. I don't think I ever saw network
>>>>>>>> connections timing out or NICs reporting resets due to this, but I
>>>>>>>> wouldn't be surprised if this were the cause in your case, though
>>>>>>>> (i.e. disk IO threads preempting HBA/NIC threads for too long
>>>>>>>> somehow, leaving the driver very puzzled about the state of its card).
>>>>>>>>
>>>>>>>> At the very least, TXG syncs can be tuned by two knobs: the time
>>>>>>>> limit (5 sec default) and the size limit (when the cache is "this"
>>>>>>>> full, begin the sync to disk). The latter is a realistic figure that
>>>>>>>> can allow you to sync in shorter bursts, with fewer interruptions
>>>>>>>> to smooth IO and process work.
>>>>>>>>
>>>>>>>> A somewhat related tunable is the number of requests that ZFS would
>>>>>>>> queue up for a disk. Depending on its NCQ/TCQ abilities and random
>>>>>>>> IO abilities (HDD vs. SSD), long or short queues may be preferable.
>>>>>>>> See also: http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Device_I.2FO_Queue_Size_.28I.2FO_Concurrency.29
>>>>>>>>
>>>>>>>> These tunables can be set at runtime with "mdb -kw", as well as in
>>>>>>>> the /etc/system file to survive reboots. One of our storage boxes
>>>>>>>> has these example values in /etc/system:
>>>>>>>>
>>>>>>>> *# default: flush txg every 5sec (may be max 30sec, optimize
>>>>>>>> *# for 5 sec writing)
>>>>>>>> set zfs:zfs_txg_synctime = 5
>>>>>>>>
>>>>>>>> *# Spool to disk when the ZFS cache is 0x18000000 (384 MB) full
>>>>>>>> set zfs:zfs_write_limit_override = 0x18000000
>>>>>>>> *# ...for realtime changes use mdb.
>>>>>>>> *# Example sets 0x18000000 (384 MB, 402653184 bytes):
>>>>>>>> *# echo zfs_write_limit_override/W0t402653184 | mdb -kw
>>>>>>>>
>>>>>>>> *# ZFS queue depth per disk
>>>>>>>> set zfs:zfs_vdev_max_pending = 3
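>>>>>>>>
>>>>>>>> (And to sanity-check what a box is currently running with, the same
>>>>>>>> symbols can be read back with mdb, for example:)
>>>>>>>>
>>>>>>>> *# read the current per-disk queue depth without changing it:
>>>>>>>> *# echo zfs_vdev_max_pending/D | mdb -k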
>>>>>>>>
>>>>>>>> HTH,
>>>>>>>> //Jim Klimov
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> OpenIndiana-discuss mailing list
>>>>>>>> OpenIndiana-discuss at openindiana.org
>>>>>>>> http://openindiana.org/mailman/listinfo/openindiana-discuss
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>


More information about the OpenIndiana-discuss mailing list