[OpenIndiana-discuss] ZFS read speed(iSCSI)

Heinrich van Riel heinrich.vanriel at gmail.com
Fri Jun 7 20:54:28 UTC 2013


I did find this in my inbox from 2009. I have been using FC with ZFS for
quite some time and only recently retired an OI a5 install that had been
upgraded from OpenSolaris. It did not do really heavy-duty work, but I had a
similar problem back then, when we were stuck on build 99 for quite some time.

To Jean-Yves Chevallier at Emulex:
Any comments on the future of Emulex with regard to the COMSTAR project?
It seems I am not the only one who has problems using Emulex in later
builds. For now I am stuck with build 99.
As always, any feedback would be greatly appreciated, since we have to
decide between sticking with OpenSolaris & COMSTAR or starting to migrate
to another solution; we cannot stay on build 99 forever.
What I am really trying to find out is whether there is a roadmap/decision
to ultimately support only QLogic HBAs in target mode.

Response:


Sorry for the delay in answering you. I do have news for you.
First off, the interface used by COMSTAR has changed in recent Nevada
releases (NV120 and up, I believe). Since it is not a public interface,
we had no prior indication of this.
We know of a number of issues, some in our driver and some in the COMSTAR
stack. Based on the information we have from you and other community
members, we have addressed all these issues in our next driver version; we
will know for sure after we run our DVT (driver verification testing) next
week. Depending on progress, this driver will be part of NV 128 or else NV
130.
I believe it is worth taking another look once these upcoming builds are
out, as I imagine they might also include fixes to the rest of the COMSTAR
stack.

Best regards.


I can confirm that this was fixed in 128: all I did was update from 99
to 128 and there were no problems.
It seems the same problem has now returned, and Emulex does not appear to
be a good fit, since Sun mostly used QLogic.

Guess it is back to iSCSI only for now.



On Fri, Jun 7, 2013 at 4:40 PM, Heinrich van Riel <
heinrich.vanriel at gmail.com> wrote:

> I changed the settings. I do see it writing all the time now, but the link
> still dies after a few minutes.
>
> Jun  7 16:30:57  emlxs: [ID 349649 kern.info] [ 5.0608]emlxs1: NOTICE:
> 730: Link reset. (Disabling link...)
> Jun  7 16:30:57 emlxs: [ID 349649 kern.info] [ 5.0333]emlxs1: NOTICE:
> 710: Link down.
> Jun  7 16:33:16 emlxs: [ID 349649 kern.info] [ 5.055D]emlxs1: NOTICE:
> 720: Link up. (4Gb, fabric, target)
> Jun  7 16:33:16 fct: [ID 132490 kern.notice] NOTICE: emlxs1 LINK UP,
> portid 22000, topology Fabric Pt-to-Pt,speed 4G
>
>
>
>
> On Fri, Jun 7, 2013 at 3:06 PM, Jim Klimov <jimklimov at cos.ru> wrote:
>
>> Comment below
>>
>>
>> On 2013-06-07 20:42, Heinrich van Riel wrote:
>>
>>> One second apart, cloning a 150GB VM from a datastore on EMC to OI.
>>>
>>> alloc   free   read  write   read  write
>>> -----  -----  -----  -----  -----  -----
>>>  309G  54.2T     81     48   452K  1.34M
>>>  309G  54.2T      0  8.17K      0   258M
>>>  310G  54.2T      0  16.3K      0   510M
>>>  310G  54.2T      0      0      0      0
>>>  310G  54.2T      0      0      0      0
>>>  310G  54.2T      0      0      0      0
>>>  310G  54.2T      0  10.1K      0   320M
>>>  311G  54.2T      0  26.1K      0   820M
>>>  311G  54.2T      0      0      0      0
>>>  311G  54.2T      0      0      0      0
>>>  311G  54.2T      0      0      0      0
>>>  311G  54.2T      0  10.6K      0   333M
>>>  313G  54.2T      0  27.4K      0   860M
>>>  313G  54.2T      0      0      0      0
>>>  313G  54.2T      0      0      0      0
>>>  313G  54.2T      0      0      0      0
>>>  313G  54.2T      0  9.69K      0   305M
>>>  314G  54.2T      0  10.8K      0   337M
>>>
>> ...
>> Were it not for your complaints about link resets and "unusable"
>> connections, I'd say this looks like normal behavior for async
>> writes: they get cached up, and every 5 sec you have a transaction
>> group (TXG) sync which flushes the writes from the cache to the disks.
>>
>> In fact, the picture still looks like that, and this is possibly the
>> reason for the hiccups.
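>>
>> For what it's worth, a minimal way to watch that burst pattern live is
>> a one-second iostat loop on the pool (the pool name below is just a
>> placeholder for yours):
>>
>>   zpool iostat mypool 1
>>
>> A large write burst roughly every 5 seconds, with near-zero writes in
>> between, is the TXG sync signature.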
>>
>> The TXG sync can be an IO-intensive process, which may block or
>> delay many other system tasks. Previously, when the interval
>> defaulted to 30 sec, we got unusable SSH connections and temporarily
>> "hung" disk requests on the storage server every half a minute when
>> it was really busy (i.e. during the initial fill-up with data from
>> older boxes): it cached up about 10 seconds' worth of writes, then
>> spewed them out and could do nothing else. I don't think I ever saw
>> network connections timing out or NICs reporting resets due to this,
>> but I wouldn't be surprised if it were the cause in your case (i.e.
>> disk IO threads preempting HBA/NIC threads for too long somehow,
>> leaving the driver very confused about the staleness of its card).
>>
>> At the very least, TXG syncs can be tuned with two knobs: the time
>> limit (5 sec by default) and the size limit (when the cache is "this"
>> full, begin the sync to disk). The latter is a realistic figure that
>> can let you sync in shorter bursts, with fewer interruptions to
>> smooth IO and other processes' work.
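>>
>> As a rough sketch of how to peek at the current values first (assuming
>> the symbol names below exist on your build; they were renamed in later
>> ZFS versions), you can read them with mdb against the live kernel:
>>
>>   # time limit in seconds (a 32-bit int)
>>   echo zfs_txg_synctime/D | mdb -k
>>   # size limit in bytes (a 64-bit value, printed in hex here)
>>   echo zfs_write_limit_override/J | mdb -k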
>>
>> A somewhat related tunable is the number of requests that ZFS will
>> queue up per disk. Depending on the disk's NCQ/TCQ support and its
>> random-IO performance (HDD vs. SSD), longer or shorter queues may be
>> preferable. See also:
>> http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Device_I.2FO_Queue_Size_.28I.2FO_Concurrency.29
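>>
>> For example (again assuming the symbol name matches your build; it is
>> a plain 32-bit int there), the per-disk queue depth can be read and
>> changed on the fly in the same way:
>>
>>   # current queue depth per vdev
>>   echo zfs_vdev_max_pending/D | mdb -k
>>   # drop it to 3 outstanding requests per disk
>>   echo zfs_vdev_max_pending/W0t3 | mdb -kw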
>>
>> These tunables can be set at runtime with "mdb -kw", as well as in
>> the /etc/system file so that they survive reboots. One of our storage
>> boxes has these example values in /etc/system:
>>
>> *# Default: flush a TXG every 5 sec (30 sec max); here we optimize
>> *# for about 5 sec of writing
>> set zfs:zfs_txg_synctime = 5
>>
>> *# Spool to disk when the ZFS write cache is 0x18000000 (384MB) full
>> set zfs:zfs_write_limit_override = 0x18000000
>> *# ...for realtime changes use mdb.
>> *# Example sets 0x18000000 (384MB, i.e. 402653184 bytes):
>> *# echo zfs_write_limit_override/W0t402653184 | mdb -kw
>>
>> *# ZFS queue depth per disk
>> set zfs:zfs_vdev_max_pending = 3
>>
>> HTH,
>> //Jim Klimov
>>
>>
>> _______________________________________________
>> OpenIndiana-discuss mailing list
>> OpenIndiana-discuss at openindiana.org
>> http://openindiana.org/mailman/listinfo/openindiana-discuss
>>
>
>

