[OpenIndiana-discuss] ZFS read speed(iSCSI)

Heinrich van Riel heinrich.vanriel at gmail.com
Fri Jun 7 20:40:57 UTC 2013


I changed the settings. I do see it writing all the time now, but the link
still dies after a few minutes.

Jun  7 16:30:57  emlxs: [ID 349649 kern.info] [ 5.0608]emlxs1: NOTICE: 730:
Link reset. (Disabling link...)
Jun  7 16:30:57 emlxs: [ID 349649 kern.info] [ 5.0333]emlxs1: NOTICE: 710:
Link down.
Jun  7 16:33:16 emlxs: [ID 349649 kern.info] [ 5.055D]emlxs1: NOTICE: 720:
Link up. (4Gb, fabric, target)
Jun  7 16:33:16 fct: [ID 132490 kern.notice] NOTICE: emlxs1 LINK UP, portid
22000, topology Fabric Pt-to-Pt,speed 4G




On Fri, Jun 7, 2013 at 3:06 PM, Jim Klimov <jimklimov at cos.ru> wrote:

> Comment below
>
>
> On 2013-06-07 20:42, Heinrich van Riel wrote:
>
>> One sec apart cloning 150GB vm from a datastore on EMC to OI.
>>
>> alloc free read write read write
>> ----- ----- ----- ----- ----- -----
>> 309G 54.2T 81 48 452K 1.34M
>> 309G 54.2T 0 8.17K 0 258M
>> 310G 54.2T 0 16.3K 0 510M
>> 310G 54.2T 0 0 0 0
>> 310G 54.2T 0 0 0 0
>> 310G 54.2T 0 0 0 0
>> 310G 54.2T 0 10.1K 0 320M
>> 311G 54.2T 0 26.1K 0 820M
>> 311G 54.2T 0 0 0 0
>> 311G 54.2T 0 0 0 0
>> 311G 54.2T 0 0 0 0
>> 311G 54.2T 0 10.6K 0 333M
>> 313G 54.2T 0 27.4K 0 860M
>> 313G 54.2T 0 0 0 0
>> 313G 54.2T 0 0 0 0
>> 313G 54.2T 0 0 0 0
>> 313G 54.2T 0 9.69K 0 305M
>> 314G 54.2T 0 10.8K 0 337M
>>
> ...
> Were it not for your complaints about link resets and "unusable"
> connections, I'd say this looks like normal behavior for async
> writes: they are cached up, and every 5 sec a transaction group
> (TXG) sync flushes the cached writes to disk.
>
> In fact, the picture still looks like that, and this is possibly
> the reason for the hiccups.
>
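> To confirm that (a sketch, assuming DTrace fbt probes on spa_sync
> are available on your build), one way to watch the sync cadence and
> the duration of each flush is:
>
>   # time each TXG sync (spa_sync) and report its duration in ms
>   dtrace -n '
>     fbt::spa_sync:entry  { self->ts = timestamp; }
>     fbt::spa_sync:return /self->ts/ {
>       printf("txg sync: %d ms", (int)((timestamp - self->ts) / 1000000));
>       self->ts = 0;
>     }'
>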
> The TXG sync can be an IO-intensive process which blocks or delays
> many other system tasks. Previously, when the interval defaulted to
> 30 sec, we got unusable SSH connections and temporarily "hung" disk
> requests on the storage server every half a minute while it was
> really busy (e.g. during the initial fill with data from older
> boxes): it cached up about 10 seconds' worth of writes, then spewed
> them out and could do nothing else. I don't think I ever saw network
> connections timing out or NICs reporting resets due to this, but I
> wouldn't be surprised if it were the cause in your case (i.e. disk
> IO threads preempting HBA/NIC threads for too long, leaving the
> driver confused about the state of its card).
>
> At the very least, TXG syncs can be tuned by two knobs: the time
> limit (5 sec by default) and the size limit (when the cache is "this"
> full, begin the sync to disk). The latter is the more practical knob:
> it lets you sync in shorter bursts, with fewer interruptions to
> smooth IO and other process work.
>
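> Back-of-envelope, from the iostat output above: each 5-second TXG
> currently drains roughly 1.1 GB (e.g. 320M + 820M) in about two
> seconds, then the disks sit idle. Capping the write cache at, say,
> 384 MB against that ~230 MB/s average ingest should instead trigger
> a sync roughly every couple of seconds, each draining in well under
> a second at the burst rates shown. (Rough numbers only, read
> straight off the figures above.)
>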
> A somewhat related tunable is the number of requests that ZFS will
> queue up per disk. Depending on the disk's NCQ/TCQ support and its
> random-IO performance (HDD vs. SSD), longer or shorter queues may be
> preferable.
> See also:
> http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Device_I.2FO_Queue_Size_.28I.2FO_Concurrency.29
>
> These tunables can be set at runtime with "mdb -kw", as well as in
> the /etc/system file so they survive reboots. One of our storage
> boxes has these example values in /etc/system:
>
> *# default: flush txg every 5sec (may be max 30sec, optimize
> *# for 5 sec writing)
> set zfs:zfs_txg_synctime = 5
>
> *# Spool to disk when the ZFS cache is 0x18000000 (384 MB) full
> set zfs:zfs_write_limit_override = 0x18000000
> *# ...for realtime changes use mdb.
> *# Example sets 0x18000000 (384 MB, 402653184 bytes):
> *# echo zfs_write_limit_override/W0t402653184 | mdb -kw
>
> *# ZFS queue depth per disk
> set zfs:zfs_vdev_max_pending = 3
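> *# The queue depth can likewise be changed on the fly, by analogy
> *# with the write-limit example above (a sketch of the same pattern):
> *# echo zfs_vdev_max_pending/W0t3 | mdb -kw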
>
> HTH,
> //Jim Klimov
>
>
> _______________________________________________
> OpenIndiana-discuss mailing list
> OpenIndiana-discuss at openindiana.org
> http://openindiana.org/mailman/listinfo/openindiana-discuss
>

