[OpenIndiana-discuss] Recommendations for fast storage (OpenIndiana-discuss Digest, Vol 33, Issue 20)

Edward Ned Harvey (openindiana) openindiana at nedharvey.com
Tue Apr 16 14:49:26 UTC 2013


> From: Bob Friesenhahn [mailto:bfriesen at simple.dallas.tx.us]
> 
> It would be difficult to believe that 10Gbit Ethernet offers better
> bandwidth than 56Gbit Infiniband (the current offering).  The switching
> model is quite similar.  The main reason why IB offers better latency
> is a better HBA hardware interface and a specialized stack.  5X is 5X.

Put another way, the reason InfiniBand achieves so much higher throughput and lower latency than Ethernet is that the switching (at the physical layer) is completely different, and messages are passed directly from user space into remote system RAM via RDMA, bypassing the OSI layer model and other kernel overhead.  I read a paper from VMware in which they implemented RDMA over Ethernet and roughly doubled the speed of vMotion (though it was still something like 4x slower than InfiniBand).
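To make the "user-level to user-level" part concrete, here is a minimal sketch (mine, not from any paper mentioned above) of posting a one-sided RDMA write with the libibverbs API.  It assumes the queue pair is already connected, the memory region already registered, and the peer's remote address and rkey already exchanged out of band; the function name and parameters are just illustrative:

    #include <infiniband/verbs.h>
    #include <stdint.h>
    #include <string.h>

    /* Post a one-sided RDMA write: the local buffer is placed into the
     * peer's registered memory with no involvement from the remote CPU
     * or kernel.  Connection/QP setup is omitted here. */
    int rdma_write(struct ibv_qp *qp, struct ibv_mr *mr, void *buf,
                   size_t len, uint64_t remote_addr, uint32_t rkey)
    {
        struct ibv_sge sge = {
            .addr   = (uintptr_t)buf,   /* local source buffer */
            .length = (uint32_t)len,
            .lkey   = mr->lkey,         /* key from ibv_reg_mr() */
        };
        struct ibv_send_wr wr, *bad_wr = NULL;
        memset(&wr, 0, sizeof(wr));
        wr.opcode              = IBV_WR_RDMA_WRITE;  /* one-sided write */
        wr.sg_list             = &sge;
        wr.num_sge             = 1;
        wr.send_flags          = IBV_SEND_SIGNALED;  /* completion on local CQ */
        wr.wr.rdma.remote_addr = remote_addr;        /* peer's registered address */
        wr.wr.rdma.rkey        = rkey;               /* peer's rkey */
        return ibv_post_send(qp, &wr, &bad_wr);      /* 0 on success */
    }

The remote CPU and kernel never see the transfer; the HCA places the data directly into the peer's registered buffer, which is what lets RDMA skip the buffering and copies a normal Ethernet/TCP path would incur.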

Besides bypassing the OSI layers and kernel overhead, IB latency is lower because Ethernet switches use store-and-forward buffering: a sender puts a packet into a buffer on the switch, the switch pushes it through its backplane, and finally it lands in another buffer on the way to the destination.  IB uses crossbar, or cut-through, switching, in which the sending host channel adapter signals the destination address to the switch and waits for the channel to be opened.  Once the channel is opened it stays open, and the switch in between does little more than signal amplification (plus virtual lanes for congestion management and other functions).  The sender then writes directly into RAM on the destination via RDMA, with no buffering in between and no OSI layer model in the path.  Hence much lower latency.
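To put a rough number on the store-and-forward penalty (my arithmetic, not a vendor figure): a full 1500-byte frame at 10 Gbit/s takes 1500 * 8 / 10^10 = 1.2 microseconds just to be clocked into the switch's buffer before it can be forwarded, and that serialization cost is paid again on the egress link, whereas a cut-through path can start forwarding as soon as the destination is known from the header.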

IB also has native link aggregation into data-striped lanes, hence the 1x, 4x, 8x, and 12x designations and the 40Gbit specifications.  Something similar is quasi-possible in Ethernet via LACP, but it is not as good and not the same thing.  IB guarantees in-order packet delivery with native congestion control, whereas Ethernet may drop packets and leave it to TCP to detect the loss and retransmit.
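For reference (my numbers, not from the original thread): each IB lane signals at 2.5 Gbit/s for SDR, 5 Gbit/s for DDR, and 10 Gbit/s for QDR, so a 4x QDR link is 4 * 10 = 40 Gbit/s on the wire, which works out to roughly 32 Gbit/s of usable data after 8b/10b encoding.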

Ethernet includes a lot of support for IP addressing and for mixed link speeds (10Gbit, 10/100, 1G, etc.), all of it asynchronous.  For these reasons, IB is not a suitable replacement for the kind of IP traffic Ethernet handles, with lots of variable peer-to-peer and broadcast communication.  IB is designed for networks where systems establish connections to other systems and those connections remain mostly static: primarily clustering and storage networks, not primarily TCP/IP.



