Date: Fri, 16 Aug 2013 16:54:29 +0800
From: Julian Elischer <julian@freebsd.org>
To: Luigi Rizzo <rizzo@iet.unipi.it>
Cc: Lawrence Stewart <lstewart@freebsd.org>, FreeBSD Net <net@freebsd.org>
Subject: Re: it's the output, not ack coalescing (Re: TSO and FreeBSD vs Linux)
Message-ID: <520DE8C5.8070508@freebsd.org>
In-Reply-To: <20130814102109.GA63246@onelab2.iet.unipi.it>
References: <520A6D07.5080106@freebsd.org> <520AFBE8.1090109@freebsd.org> <520B24A0.4000706@freebsd.org> <520B3056.1000804@freebsd.org> <20130814102109.GA63246@onelab2.iet.unipi.it>

On 8/14/13 6:21 PM, Luigi Rizzo wrote:
> On Wed, Aug 14, 2013 at 05:23:02PM +1000, Lawrence Stewart wrote:
>> On 08/14/13 16:33, Julian Elischer wrote:
>>> On 8/14/13 11:39 AM, Lawrence Stewart wrote:
>>>> On 08/14/13 03:29, Julian Elischer wrote:
>>>>> I have been tracking down a performance embarrassment on AMAZON EC2 and
>>>>> have found it I think.
>>>> Let us please avoid conflating performance with throughput. The
>>>> behaviour you go on to describe as a performance embarrassment is
>>>> actually a throughput difference, and the FreeBSD behaviour you're
>>>> describing is essentially sacrificing throughput and CPU cycles for
>>>> lower latency. That may not be a trade-off you like, but it is an
>>>> important factor in this discussion.
> ...
>> Sure, there's nothing wrong with holding throughput up as a key
>> performance metric for your use case.
>>
>> I'm just trying to pre-empt a discussion that focuses on one metric and
>> fails to consider the bigger picture.
> ...
>>> I could see no latency reversion.
>> You wouldn't because it would be practically invisible in the sorts of
>> tests/measurements you're doing. Our good friends over at HRT on the
>> other hand would be far more likely to care about latency on the order
>> of microseconds. Again, the use case matters a lot.
> ...
>>> so, does "Software LRO" mean that LRO on the NIC should be ON or OFF to
>>> see this?
>> I think (check the driver code in question as I'm not sure) that if you
>> "ifconfig <if> lro" and the driver has hardware support or has been made
>> aware of our software implementation, it should DTRT.
> The "lower throughput than linux" that julian was seeing is either
> because of a slow (CPU-bound) sender or slow receiver. Given that
> the FreeBSD tx path is quite expensive (redoing route and arp lookups
> on every packet, etc.) I highly suspect the sender side is at fault.

If we send bigger packets, then we do fewer lookups, do we not?

>
> Ack coalescing, LRO, GRO are limited to the set of packets that you
> receive in the same batch, which in turn is upper bounded by the
> interrupt moderation delay. Apart from simple benchmarks with only
> a few flows, it is very hard that ack/lro/gro can coalesce more
> than a few segments for the same flow.
>
> But the real fix is in tcp_output.
>
> In fact, it has never been the case that an ack (single or coalesced)
> triggers an immediate transmission in the output path. We had this
> in the past (Silly Window Syndrome) and there is code that avoids
> sending less than 1-mtu under appropriate conditions (there is more
> data to push out anyways, no NODELAY, there are outstanding acks,
> the window can open further). In all these cases there is no
> reasonable way to experience the difference in terms of latency.
>
> If one really cares, e.g. the High Speed Trading example, this is
> a non issue because any reasonable person would run with TCP_NODELAY
> (and possibly disable interrupt moderation), and optimize for latency
> even on a per flow basis.
>
> In terms of coding effort, i suspect that by replacing the 1-mtu
> limit (t_maxseg i believe is the variable that we use in the SWS
> avoidance code) with 1-max-tso-segment we can probably achieve good
> results with little programming effort.
>
> Then the problem remains that we should keep a copy of route and
> arp information in the socket instead of redoing the lookups on
> every single transmission, as they consume some 25% of the time of
> a sendto(), and probably even more when it comes to large tcp
> segments, sendfile() and the like.
>
> cheers
> luigi