[OpenIndiana-discuss] Aggregation performance measurement

James Carlson carlsonj at workingcode.com
Mon Oct 17 15:39:12 UTC 2011


Gabriele Bulfon wrote:
> Hi,
> I was recently able to connect an old Sun Sparc machine to an OpenIndiana storage machine by adding
> a couple of Ethernet ports to the Sparc machine, which runs Solaris 10/08.
> The Sparc machine is now connected to the storage switch, which is LACP capable.
> I understand that dladm on Solaris 10 still differs from the OpenIndiana version, so maybe
> the problem I see comes from there.
> I created two trunks of two ports each on the switch, connected the Sparc to one trunk and the
> storage to the other, aggregated the cards on both machines through dladm, configured aggr1 on
> both machines, and verified that the machines can see each other.
> All four links were up and quiet on the switch; nothing was moving.
> So I used "netio" to check network throughput, which came out at around 90MB/sec: strange, this is
> less than 1Gb/sec! Is aggregation working?

There's no way to know based on that.

Since performance sounds exceptionally low, I would look for low-level
errors first.  A fairly common (and unfortunate) cause of low
performance like this occurs when a system administrator attempts to
"force" Ethernet duplex without fully understanding what "forced" mode
means.  (When duplex is "forced" on one side of a link, this disables
normal autonegotiation, and that causes the other side to drop into the
minimum supported configuration, typically half-duplex.  If you feel
compelled to force Ethernet parameters, you must do it on both sides of
the link.)

Another common cause is plain old bad cables.  Swapping out cables is a
good thing to try.

Look for errors on the link.  Look at kstats.  Try running without
trunking first to see if you can get decent single-link performance
before dealing with the complexity of trunking.  There's no reason to
reach for a complicated solution if the simple things aren't working right.
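
As a rough aid for the kstat check above, here is a minimal Python sketch
that scans the parseable output of "kstat -p" (one "module:instance:name:statistic"
key and one value per line) and reports any nonzero error or collision
counters.  The sample text is made up for illustration, and the statistic
names it matches on are an assumption; the counters a given driver actually
exports may be named differently.

```python
def suspicious_counters(kstat_text):
    """Return {full_stat_name: value} for nonzero error/collision counters."""
    found = {}
    for line in kstat_text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank lines and non-scalar values
        name, value = parts
        stat = name.rsplit(":", 1)[-1]  # last field is the statistic name
        looks_bad = "err" in stat or "collision" in stat
        if looks_bad and value.isdigit() and int(value) > 0:
            found[name] = int(value)
    return found


# Hypothetical sample, not captured from a real machine.
sample = """\
e1000g:0:mac:ierrors\t0
e1000g:0:mac:oerrors\t0
e1000g:0:mac:collisions\t1532
e1000g:0:mac:ipackets\t991234
"""

print(suspicious_counters(sample))
```

Collision counts that keep growing on a switched link are one hint of the
duplex mismatch described earlier, since a full-duplex link should not see
collisions at all.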

> So, I started transferring data from the Sparc to the storage via NFS, and looking at the switch,
> I could see one machine working on both ports of its trunk, while the other one was working
> almost entirely on just one port......that is something more than 1Gb/sec!....why??

That's not too surprising, depending on how the trunks are configured
and what sort of traffic is passing over the links.

Standards-compliant Ethernet trunking requires the peers to distribute
packets among the links in a way that precludes accidental packet
reordering.  802 just doesn't allow reordering.  Generally, this means
that the peers must hash based on some kind of flow-identifying
information.  At a minimum, this is usually the MAC addresses.  But it
can include any other data the peers like, as long as reordering within
a flow is precluded.

The more information used, and the more individual flows present, the
better statistical balance you get.  But there are no guarantees in
life.  Sometimes everything hashes to one link, even if you try hard.
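
To make the hashing idea concrete, here is a toy Python sketch.  It is not
the algorithm used by dladm or by any particular switch; the CRC-based hash,
the function name, the MAC addresses and the port numbers are all assumptions
chosen for illustration.  It shows only the general property: a given flow
key always maps to the same link, so the spread depends on how many distinct
keys there are.

```python
import zlib
from collections import Counter


def pick_link(src_mac, dst_mac, n_links, src_port=None, dst_port=None):
    """Toy flow hash: the same key always yields the same link index."""
    key = f"{src_mac}|{dst_mac}|{src_port}|{dst_port}".encode()
    return zlib.crc32(key) % n_links


# Hypothetical addresses for the two hosts.
A = "02:00:00:00:00:01"
B = "02:00:00:00:00:02"

# Hashing on MAC addresses only: one host pair gives one key, hence one link.
mac_only = Counter(pick_link(A, B, 2) for _ in range(1000))

# Hashing on MACs plus transport ports: 64 connections give 64 keys,
# which will usually (not always) spread over both links.
with_ports = Counter(pick_link(A, B, 2, 40000 + i, 2049) for i in range(64))

print("MAC only:  ", dict(mac_only))
print("with ports:", dict(with_ports))
```

With MAC-only hashing between two hosts, every packet lands on the same link
no matter how much traffic there is, which matches the "one busy port, one
sleeping port" picture.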

Look at the configuration again.  How are the trunks set up?  What sort
of flow identification is used?  It is the sender (not recipient) who is
responsible for picking the output port used for any given packet, so
look at the sending side of the "unbalanced" traffic.

Look at the traffic itself.  Is there more than one flow in that
direction?  If not, then seeing it all on one link is perfectly normal
and expected.  Trunking doesn't just add the two bandwidths together!
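
The arithmetic behind that can be sketched in a few lines of Python.  This
is a simplified model under stated assumptions: each flow is pinned to
exactly one link, any flow could fill a link by itself, and the flow names
and link indexes below are invented for the example.

```python
def max_aggregate_mbps(flow_to_link, link_rate_mbps):
    """Upper bound on total throughput when every flow is pinned to one link."""
    links_in_use = set(flow_to_link.values())
    return len(links_in_use) * link_rate_mbps


# Hypothetical flows on a trunk of two 1000 Mb/s links (indexes 0 and 1).
one_flow = {"nfs": 0}
three_flows_unlucky = {"nfs": 0, "ssh": 0, "netio": 0}
three_flows_spread = {"nfs": 0, "ssh": 1, "netio": 0}

print(max_aggregate_mbps(one_flow, 1000))             # 1000
print(max_aggregate_mbps(three_flows_unlucky, 1000))  # 1000
print(max_aggregate_mbps(three_flows_spread, 1000))   # 2000
```

A single flow never gets more than one link's worth of bandwidth; the trunk
only helps once several flows hash to different links.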

> So I tried removing the "sleeping" cable, and measured the same throughput....
> Strangely, when I removed the "working" cable and inserted the "sleeping" one, everything
> still worked on one port: the "sleeping" one became operational.
> When I added the second one again, that one became the "sleeping" one....
> Looks like aggregation is not using all the available throughput......what am I missing? Is there
> any incompatibility between Solaris10/08 and OpenIndiana?

I don't expect that there should be any.  More likely, I expect a
misconfiguration somewhere in this set-up.

-- 
James Carlson         42.703N 71.076W         <carlsonj at workingcode.com>


