[OpenIndiana-discuss] ZFS read speed(iSCSI)

Jim Klimov jimklimov at cos.ru
Thu Jun 6 00:34:35 UTC 2013


On 2013-06-06 00:52, Heinrich van Riel wrote:
> Any pointers around iSCSI performance focused on read speed? Did not find
> much.
>
> I have 2 x rz2 of 10x 3TB NL-SAS each in the pool. The OI server has 4
> interfaces configured to the switch in LACP, mtu=9000. The switch (jumbo
> enabled) shows all interfaces are active in the port channel. How can I can
> verify it on the OI side? dladm shows that it is active mode

Possibly, "netstat -s" for per-interface statistics. I think there were
iostat-like tools for network, and the closest one is netstat again:
"netstat -i 1" or "netstat -i -I e1000g0 1"

> The Hyper-v systems has 2 interfaces in LACP and all show as active and
> windows indicate 2Gbps, never go over 54% util.

To me this indicates that only one (gigabit) link is used by one of the
sides, maybe both. That is not unexpected: by default, link aggregation
only gives "statistically" higher bandwidth when many hosts talk to many
hosts, with each individual connection pinned to a single path. The path
is usually chosen from the pair of endpoint MACs (e.g. add them up as
numbers, take the remainder modulo the number of active local NICs in
the aggregation, and use that as the NIC index for the connection - or
something along those lines), so one client talking to one server keeps
landing on the same link.

Similar algorithmic limitations may be imposed both by the OSes and by
your switches. I believe there were recent projects in illumos to extend
the set of factors that influence path selection, such as including IP
addresses and port numbers, to give a more even distribution even with
just a couple of aggregated hosts. I am not sure whether MS Windows has
similar tunables to make it use several NICs when talking to a single
server.

Unfortunately, this is where my erudition stops today - I can't say
exactly how to check or tune this even on illumos. Hopefully others
will chime in :)
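
That said, one knob that may be worth a look (a pointer rather than
tested advice) is the aggregation hashing policy - dladm can hash on
L2, L3 and/or L4 headers, and including ports (L4) tends to spread
several TCP connections between the same two hosts across the links
(again assuming the aggregate is called aggr0):

  # show the current policy:
  dladm show-aggr
  # hash on IP addresses and TCP/UDP ports rather than just MACs:
  dladm modify-aggr -P L3,L4 aggr0

The switch has to do the equivalent on its side, and the Windows/
Hyper-V teaming mode presumably has its own hashing settings that
would need to match.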

> When I copy to an iSCSI disk from a local disk, it copies at around 200MB/s
> and thats fine.

Likely this goes into a cache somewhere. A single 1Gbit link would
flatten out at about 125 MByte/s, so 200 MB/s hints that either the
hosts do manage to use several links, or caching/compression kicks in
early - possibly the writes land in the iSCSI client's cache first, so
the copy is fast and limited only by the local disk's read speed. Such
tests are best done with incompressible data samples (movies, etc.)
much larger than local RAM; look at the speeds after the RAM caches
are expected to be exhausted and only disk/net speeds remain - that
shows your real limits. Following the usual ZFS sizing recommendations,
there is a lot of RAM on the storage server - ideally so much that your
working set is neither limited by nor even served from the HDDs ;)
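
A crude way to produce such a sample on the OI box or any unix-ish
client (hypothetical path, arbitrary size - make the file comfortably
larger than RAM on both ends before trusting the numbers):

  # a few GB of incompressible data; /dev/urandom is slow, so either
  # be patient or concatenate several copies of a smaller chunk:
  dd if=/dev/urandom of=/localdisk/random.bin bs=1024k count=8192
  # then copy it to and from the iSCSI LU and watch the throughput
  # once the caches stop absorbing it.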

> When I copy from the iSCSI disk to the local disk I get no
> more that 80-90MB/s and that is after messing around with the TCP/IP
> setting on Windows. Before the changes it was 47MB/s max. (copy from the
> local disk to the local disk I get 107MB/s so that is not the issue)
> VMware 5.0 will not get more than that either.

This does look like you're hitting the 1Gbit ceiling (minus protocol
overheads, etc.) while reading from the storage server, especially if
you repeat the same requests and they are served from the cache. In
that case it is the single-link scenario, IMHO.
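
Rough arithmetic for that ceiling: 1 Gbit/s is 125 MByte/s on the
wire; take away Ethernet/IP/TCP/iSCSI framing (a few percent with
jumbo frames, more with MTU 1500) and you land somewhere around
110-118 MByte/s of payload in the best case. 80-90 MB/s over a single
link with default TCP windows is a fairly typical iSCSI result.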

>
> Even when I do a zfs send/recv is seems that reads are slower. I assume
> this is the expected behavior.
>
> It will run only VMs for lab and I have the follwoing questions:
>
>             * With this type of workload load would there be a noticeable
> improvement in by adding a cache disk?
>                Looking at the OCZ Talos 2 SAS. Any feedback would be
> appreciated before spending the $900 on the disk.

There are scripts to help monitor your read/write cache usage and ZIL
activity (sync writes, mostly), so that you can estimate on an SSD-less
system whether there would be much work for an L2ARC or a SLOG device.
Again, either search the archives or wait for "those in the know" to
answer more specifically :)

Depending on your actual working-set size (the amount of data read and
written regularly enough for caching in RAM or SSD to make a difference)
and on workload intensity, additional caching might do you good; whether
it is worth its price - that's for you to decide ;)

>             * System has 2x 6core E5 2.4, 64GB mem; would compression help?

Likely yes, at least lightweight compression - many deployments have
it always on. There are but a few cases where it might hurt ;)
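
For example (assuming the pool is called "tank" and the iSCSI backing
store lives under tank/iscsi - adjust the names to your layout):

  # lightweight compression; lz4 if your build supports it, lzjb
  # otherwise - note it only applies to newly written blocks:
  zfs set compression=lz4 tank/iscsi
  # see what it actually saves over time:
  zfs get compressratio tank/iscsi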

Also, for block-structured data such as disk images, you might want to
play with block sizes; it has been said on these lists that 32-64KB ZFS
blocks are generally more appropriate for VM disk images and databases
than the application's own block size (4-16KB), because the overhead of
ZFS metadata grows considerably when there are many small blocks.
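
Note that for zvol-backed iSCSI LUs the block size is fixed at
creation time (volblocksize cannot be changed later); a sketch with
hypothetical names:

  # 200GB zvol with 32KB blocks for a VM disk LU:
  zfs create -V 200g -o volblocksize=32k tank/vm-lun0
  # for file-backed LUs or plain datasets the analogue is recordsize,
  # which can be changed later but only affects newly written files:
  zfs set recordsize=32k tank/vmimages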


>             * Would it make more sense to create the pool with mirror sets?
> ( Wanted to use the extra space for backups)

For speed - likely yes. For reliability - theories vary: 2-way mirrors
of modern huge disks may be risky (rebuilds take a long time, during
which you have no redundancy), and 3-way/4-way mirrors are somewhat
expensive and obviously serve up less space from the same number of
disks.
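
For illustration, the same disks laid out as striped 2-way mirrors
(placeholder device names); each mirror vdev adds independent read
IOPS, at the cost of half the raw capacity:

  zpool create tank \
      mirror c1t0d0 c1t1d0 \
      mirror c1t2d0 c1t3d0 \
      mirror c1t4d0 c1t5d0
  # ...and so on for the remaining pairs.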

If you have just a few VMs and a moderate IO workload, it is possible
that the RAM or SSD based caches will suffice; HDDs are assumed to be
the slow tier whichever way you organize them.



HTH,
//Jim Klimov



