Date:      Wed, 14 Aug 2013 12:21:09 +0200
From:      Luigi Rizzo <rizzo@iet.unipi.it>
To:        Lawrence Stewart <lstewart@freebsd.org>
Cc:        FreeBSD Net <net@freebsd.org>
Subject:   it's the output, not ack coalescing (Re: TSO and FreeBSD vs Linux)
Message-ID:  <20130814102109.GA63246@onelab2.iet.unipi.it>
In-Reply-To: <520B3056.1000804@freebsd.org>
References:  <520A6D07.5080106@freebsd.org> <520AFBE8.1090109@freebsd.org> <520B24A0.4000706@freebsd.org> <520B3056.1000804@freebsd.org>

On Wed, Aug 14, 2013 at 05:23:02PM +1000, Lawrence Stewart wrote:
> On 08/14/13 16:33, Julian Elischer wrote:
> > On 8/14/13 11:39 AM, Lawrence Stewart wrote:
> >> On 08/14/13 03:29, Julian Elischer wrote:
> >>> I have been tracking down a performance embarrassment on AMAZON EC2 and
> >>> have found it I think.
> >> Let us please avoid conflating performance with throughput. The
> >> behaviour you go on to describe as a performance embarrassment is
> >> actually a throughput difference, and the FreeBSD behaviour you're
> >> describing is essentially sacrificing throughput and CPU cycles for
> >> lower latency. That may not be a trade-off you like, but it is an
> >> important factor in this discussion.
...
> Sure, there's nothing wrong with holding throughput up as a key
> performance metric for your use case.
> 
> I'm just trying to pre-empt a discussion that focuses on one metric and
> fails to consider the bigger picture.
...
> > I could see no latency reversion.
> 
> You wouldn't because it would be practically invisible in the sorts of
> tests/measurements you're doing. Our good friends over at HRT on the
> other hand would be far more likely to care about latency on the order
> of microseconds. Again, the use case matters a lot.
...
> > so, does "Software LRO" mean that LRO on the NIC should be ON or OFF to
> > see this?
> 
> I think (check the driver code in question as I'm not sure) that if you
> "ifconfig <if> lro" and the driver has hardware support or has been made
> aware of our software implementation, it should DTRT.

The "lower throughput than Linux" that Julian was seeing is caused
either by a slow (CPU-bound) sender or by a slow receiver. Given that
the FreeBSD tx path is quite expensive (redoing route and ARP lookups
on every packet, etc.), I strongly suspect the sender side is at fault.

Ack coalescing, LRO and GRO can only merge packets that are received
in the same batch, and the batch size is in turn bounded by the
interrupt moderation delay. Apart from simple benchmarks with only
a few flows, it is very unlikely that ack coalescing/LRO/GRO can merge
more than a few segments of the same flow.
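To put a rough number on the batch-size argument, the sketch below estimates how many packets of one flow a single interrupt-moderation batch can hold, assuming traffic is spread evenly over all flows. The helper name and the example figures (10 Gbit/s, 1500-byte frames, a 20 microsecond moderation delay, 100 flows) are illustrative assumptions, not measurements.

```c
#include <assert.h>

/*
 * Hypothetical back-of-the-envelope helper (not FreeBSD code).
 * Coalescing only sees packets that arrive within one interrupt
 * moderation interval, so the per-flow upper bound is the batch
 * size divided among the active flows (assumed evenly loaded).
 */
static double
segments_per_flow_per_batch(double link_bps, double pkt_bytes,
    double moderation_us, unsigned nflows)
{
	assert(pkt_bytes > 0 && nflows > 0);

	double pkts_per_sec = link_bps / (pkt_bytes * 8.0);
	double pkts_per_batch = pkts_per_sec * moderation_us * 1e-6;

	return pkts_per_batch / nflows;
}
```

With those assumed figures a batch holds roughly 17 packets, so a single bulk flow can see some merging, while with 100 concurrent flows each flow contributes well under one packet per batch and there is essentially nothing left to coalesce.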

	But the real fix is in tcp_output.

In fact, nothing has ever required that an ack (single or coalesced)
trigger an immediate transmission in the output path. Sending tiny
segments on every ack caused trouble in the past (Silly Window
Syndrome), and there is code that avoids sending less than one MTU
under appropriate conditions (there is more data to push out anyway,
TCP_NODELAY is not set, there are outstanding acks, the window can
open further). In all these cases the extra delay is not noticeable
in terms of latency.
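The send-or-wait decision described above can be sketched roughly as follows. This is a heavily simplified model in the spirit of the BSD tcp_output() logic, not the actual FreeBSD code: the struct and function names are hypothetical, and it leaves out retransmissions, FIN handling, window updates and the persist timer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of the state the output path consults. */
struct tx_state {
	uint32_t len;        /* bytes sendable right now (data and window permitting) */
	uint32_t unsent;     /* total unsent bytes in the socket buffer */
	uint32_t maxseg;     /* the 1-MTU limit, cf. t_maxseg */
	uint32_t max_sndwnd; /* largest window the peer has ever offered */
	bool     idle;       /* no unacknowledged data outstanding */
	bool     nodelay;    /* TCP_NODELAY set on the socket */
};

/* true: transmit now; false: wait for more data or a larger window. */
static bool
sws_should_send(const struct tx_state *s)
{
	assert(s->len <= s->unsent);

	if (s->len == 0)
		return false;
	/* At least one full segment: always worth sending. */
	if (s->len >= s->maxseg)
		return true;
	/* A short tail, but it empties the buffer and nothing holds it back. */
	if ((s->idle || s->nodelay) && s->len == s->unsent)
		return true;
	/* At least half of the largest window the peer has offered. */
	if (s->max_sndwnd > 0 && s->len >= s->max_sndwnd / 2)
		return true;
	/* Sub-MSS while more data or window can come: wait (SWS avoidance). */
	return false;
}
```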

If one really cares about latency, as in the high-frequency trading
example, this is a non-issue, because any reasonable person would run
with TCP_NODELAY (and possibly disable interrupt moderation) and
optimize for latency, even on a per-flow basis.
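For reference, enabling TCP_NODELAY is a per-socket setsockopt() call, which is what makes the per-flow tuning possible. The helper names below are illustrative; setsockopt(), getsockopt() and TCP_NODELAY are the standard sockets API.

```c
#include <assert.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Disable small-segment coalescing (Nagle) on one TCP socket.
 * Returns 0 on success, -1 on error with errno set. */
static int
set_nodelay(int fd)
{
	int one = 1;

	assert(fd >= 0);
	return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
}

/* Returns 1 if TCP_NODELAY is enabled, 0 if not, -1 on error. */
static int
get_nodelay(int fd)
{
	int val = 0;
	socklen_t len = sizeof(val);

	if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &val, &len) == -1)
		return -1;
	return val != 0;
}
```

Interrupt moderation, by contrast, is a driver/NIC setting and is not controlled through the socket.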

In terms of coding effort, I suspect that replacing the 1-MTU limit
(t_maxseg, I believe, is the variable we use in the SWS avoidance
code) with one maximum-size TSO segment would achieve good results
with little work.
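A toy model can show why raising that threshold matters. It assumes a window-limited sender where every ACK opens the window by two segments, and it counts each decision to transmit as one TSO burst of everything pending. The function is hypothetical, and the 1448-byte MSS and 63712-byte (44 x 1448) burst size are assumed values chosen so the numbers divide evenly; a real burst would also be capped at the TSO limit.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical toy model, not FreeBSD code. Each of nacks ACKs opens
 * the window by ack_bytes; the sender transmits all pending bytes as
 * one burst once at least `threshold` bytes are pending, and flushes
 * any remainder at the end. Returns the number of output calls.
 */
static unsigned
count_output_calls(unsigned nacks, uint32_t ack_bytes, uint32_t threshold)
{
	assert(threshold > 0);

	uint64_t pending = 0;
	unsigned calls = 0;

	for (unsigned i = 0; i < nacks; i++) {
		pending += ack_bytes;
		if (pending >= threshold) {
			calls++;        /* one transmission of all pending bytes */
			pending = 0;
		}
	}
	if (pending > 0)
		calls++;                /* flush the tail */
	return calls;
}
```

With a 1-MSS threshold, 100 ACKs produce 100 small bursts; with a one-TSO-segment threshold the same data leaves in 5 calls. Presumably the other SWS rules (TCP_NODELAY, idle connection, last piece of data) would still apply, so the larger threshold would only delay output when more data is known to be coming.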

The remaining problem is that we should keep a copy of the route and
ARP information in the socket instead of redoing the lookups on every
single transmission: they consume some 25% of the time of a sendto(),
and probably even more for large TCP segments, sendfile() and the
like.
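Caching the lookup results in the socket could look roughly like a per-socket cache revalidated against a global generation counter, so the expensive path runs only when the destination or the routing/ARP state changes. Everything here is hypothetical: the structure names, the generation counter, and slow_lookup(), which merely stands in for the real route and ARP resolution. A real implementation would also need locking and reference counting on the cached route entry.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical next-hop data: gateway address and its link-layer address. */
struct nexthop {
	uint32_t gw_addr;
	uint8_t  lladdr[6];
};

/* Assumption: bumped whenever the routing or ARP table changes. */
static uint64_t route_generation = 1;
/* Counts how often the expensive path runs, for illustration only. */
static unsigned full_lookups;

/* Stand-in for the per-packet route + ARP resolution. */
static struct nexthop
slow_lookup(uint32_t dst)
{
	full_lookups++;
	struct nexthop nh = {
		.gw_addr = (dst & 0xffffff00u) | 1u,     /* fake gateway x.y.z.1 */
		.lladdr = { 0x02, 0, 0, 0, 0, 0x01 },
	};
	return nh;
}

/* Per-socket cache; generation 0 means "empty". */
struct sock_route_cache {
	uint32_t       dst;
	uint64_t       generation;
	struct nexthop nh;
};

/* Redo the lookup only if the destination or the tables changed. */
static const struct nexthop *
cached_nexthop(struct sock_route_cache *c, uint32_t dst)
{
	assert(c);
	if (c->generation != route_generation || c->dst != dst) {
		c->nh = slow_lookup(dst);
		c->dst = dst;
		c->generation = route_generation;
	}
	return &c->nh;
}
```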

	cheers
	luigi


