[OpenIndiana-discuss] ZFS read speed(iSCSI)

Edward Ned Harvey (openindiana) openindiana at nedharvey.com
Thu Jun 6 00:57:42 UTC 2013


> From: Christopher Chan [mailto:christopher.chan at bradbury.edu.hk]
> 
> I may be wrong but I don't think a single connection will be split over
> two interfaces in LACP to achieve higher throughput. If you have
> concurrent connections then perhaps you may see more throughput.

That is correct.  There is a good chance the iscsi initiator is intelligent enough to launch multiple threads, but the OP says the Hyper-V system never shows more than 54% utilization, which supports the opposite conclusion:  the MS system might be single-threaded, and therefore limited to 1Gbit (roughly half) of the 2x bonded LACP.
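
One quick way to check this on the OI side is to watch the individual legs of the aggregation (a sketch; "aggr0" and "e1000g0" are assumed names, substitute your own):

    # Show the aggregation and its member ports
    dladm show-aggr -x aggr0
    # Watch traffic counters on one physical leg, every 5 seconds
    dladm show-link -s -i 5 e1000g0

If one physical port sits near line rate while the other stays idle, you're seeing the single-connection limit of LACP.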


> zfs has to seek all over the place to find the snapshots which slows
> things down.

That is also correct, but only if the pool is heavily fragmented.  That happens on systems that have been in production a long time, creating, modifying, and deleting snapshots.  I may be wrong, but I assume the OP is deploying a new setup, which probably hasn't had time to accumulate that kind of snapshot-driven fragmentation.


> From: Heinrich van Riel [mailto:heinrich.vanriel at gmail.com]
> 
> I have 2 x rz2 of 10x 3TB NL-SAS each in the pool.

Do you have any log devices?  I presume you have the default sync=standard.
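
You can confirm both from the server (a sketch; "tank" is a placeholder pool name):

    # A 'logs' section in the output means a dedicated log device is present
    zpool status tank
    # The default is sync=standard; 'disabled' or 'always' changes the picture
    zfs get sync tank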


> When I copy to an iSCSI disk from a local disk, it copies at around 200MB/s
> and that's fine.

I question this result.  Although 200MB/s (1.6Gbit/sec) is a reasonable (if slightly slow) rate across a 2Gbit link, you're writing to iscsi without any log device, so you should be getting hammered by sync operations.  I have a feeling you're seeing data written into memory cache, and not actually flushed to disk at this rate.

To find out the actual throughput to disk, run "zpool iostat" on the server while the copy is running.  I would say "zpool iostat 30" to average over 30-second intervals.
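
For example (a sketch; "tank" is a placeholder pool name):

    # Print pool-wide bandwidth every 30 seconds; ignore the first line,
    # which is an average since boot
    zpool iostat tank 30

If the write bandwidth column shows far less than 200MB/s sustained, the copy is landing in RAM first rather than on disk.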


> When I copy from the iSCSI disk to the local disk I get no
> more than 80-90MB/s

You have two raidz2 vdevs on the server.  If you're doing sequential operations (read or write) then they should perform very well.  But if you're doing random operations, each raidz2 vdev should perform no better than a single disk (probably half as good as a single disk), so the whole pool gets roughly two disks' worth of random IOPS.

If your system has been in production a while, getting fragmented, then you probably see a lot of random IO, as Christopher suggested.

If you're successfully spreading load across your LACP bond with multiple threads, guess what:  multiple concurrent streams mean random IO at the disks.  It's lose-lose.

How are you measuring your throughput?  If you want to see good throughput, make sure you're serially reading/writing sequential data.
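
A simple sequential test, if you want one (a sketch; paths and sizes are assumptions, and /dev/zero only gives meaningful numbers with compression off):

    # Sequential write of 10GB
    dd if=/dev/zero of=/tank/testfile bs=1024k count=10240
    # Sequential read back; note the ARC will cache the file, so reboot
    # (or export/import the pool) first if you want to measure the disks
    dd if=/tank/testfile of=/dev/null bs=1024k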


> Even when I do a zfs send/recv it seems that reads are slower. I assume
> this is the expected behavior.

Now this I have to know more about, because you're obviously not doing zfs send/recv on Hyper-V.  Do you have another OI system connected over iscsi?

Also, the instantaneous performance of zfs send/recv varies quite a lot.  You have to watch it for several minutes, and see how high it peaks, and how low it bottoms out.


>            * With this type of workload would there be a noticeable
> improvement by adding a cache disk?

Adding cache (L2ARC) helps improve read performance.  Generally, the more cache, the better.  (But every cached block needs a header in RAM, so cache has a RAM footprint of its own; you want lots of RAM too.)

Adding a log device helps sync writes.  If you're using iscsi, then basically all writes are sync writes.  It really doesn't matter how much log you add; the smallest size available would be plenty.  (Even 4G.)  The only things that matter are very fast IO and, obviously, nonvolatility.
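
Adding one is a one-liner (a sketch; "tank" and the device name are placeholders, and the device should be a fast SSD with power-loss protection):

    # Attach a dedicated intent-log device to the pool
    zpool add tank log c4t1d0

Mirroring the log ("zpool add tank log mirror c4t1d0 c4t2d0") protects you against losing the log device itself.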

If you want to temporarily measure the benefits of adding log, you can "zfs set sync=disabled" and see how fast it goes.  This will give you an upper bound.  But for a VM environment over iscsi, it's not safe to leave it that way.
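
The test looks like this (a sketch; "tank/iscsi" is a placeholder for the zvol or dataset backing your LUN):

    # Temporarily acknowledge writes before they reach stable storage
    zfs set sync=disabled tank/iscsi
    # ... run your copy test, note the speed ...
    # Put it back before returning to production
    zfs set sync=standard tank/iscsi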


>            * System has 2x 6core E5 2.4, 64GB mem; would compression help?

For most workloads, compression improves performance, but it depends on your workload.  Data that is already compressed (media, archives, encrypted data) won't compress further, of course, and just costs CPU.
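
Turning it on is cheap to try (a sketch; "tank" is a placeholder, and lz4 assumes a reasonably current OI build; fall back to lzjb otherwise):

    # lz4 is fast and bails out quickly on incompressible blocks
    zfs set compression=lz4 tank

Note this only affects newly written blocks; existing data stays as it is.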


>            * Would it make more sense to create the pool with mirror sets?

A bunch of mirrors can do much faster random IO than a raidz.  But a raidz can do the same sustained serial speed for lower cost.  All of this is only relevant if your bottleneck is the actual storage, and not the network.
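
For comparison, a mirrored layout of the same 20 disks would look like this (a sketch; device names are made up, and only the first three pairs are shown):

    # 10 two-way mirrors: ~10 disks of random IOPS, but only 50% usable space
    zpool create tank \
        mirror c1t0d0 c1t1d0 \
        mirror c1t2d0 c1t3d0 \
        mirror c1t4d0 c1t5d0

versus your two 10-disk raidz2 vdevs, which give ~80% usable space but only about two disks' worth of random IOPS.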


