From owner-freebsd-net@FreeBSD.ORG Tue Sep 3 00:05:12 2013
From: "Joe Holden"
To: "'Barney Cordoba'", "'Adrian Chadd'"
Cc: 'Andre Oppermann', 'Alan Somers', net@freebsd.org, 'Jack F Vogel',
 "'Justin T. Gibbs'", 'Luigi Rizzo', "'T.C. Gubatayao'"
Subject: RE: Flow ID, LACP, and igb
Date: Tue, 3 Sep 2013 01:04:53 +0100
Message-ID: <25fd01cea839$39cbc1a0$ad6344e0$@rewt.org.uk>
In-Reply-To: <1378126037.56348.YahooMailNeo@web121603.mail.ne1.yahoo.com>
References: <521BBD21.4070304@freebsd.org> <521EE8DA.3060107@freebsd.org>
 <1377952913.44129.YahooMailNeo@web121605.mail.ne1.yahoo.com>
 <1378001733.36695.YahooMailNeo@web121606.mail.ne1.yahoo.com>
 <1378050319.62710.YahooMailNeo@web121601.mail.ne1.yahoo.com>
 <1378126037.56348.YahooMailNeo@web121603.mail.ne1.yahoo.com>
List-Id: Networking and TCP/IP with FreeBSD

Your argument is horseshit on the basis that many x86 and non-x86
(especially MIPS) usable NICs will happily do line rate (I see you don't
understand how network interfaces actually work... that is, pps and frame
sizes are relevant, not throughput) on stock FreeBSD without any tuning
whatsoever.

Also: a modern Realtek will do higher pps before becoming useless than a 2
or 3 generation old 1000 G/CT. This is *with* PCI-X at 133 MHz and 64-bit as
well as PCIe gen2.

You should also consider the people buying interfaces from people like
Chelsio (who support FreeBSD rather well considering their customer base
includes basically 0 FreeBSD users) who sell 20/80G PCIe interface cards.

In reality CPU load is entirely irrelevant since 10G won't bother a decent
CPU even with the glaring inefficiencies of the FreeBSD stack - as long as
it isn't livelocked, who cares?

Ultimately there are very few driver problems and some quite serious stack
design problems which driver behaviour exacerbates.

> -----Original Message-----
> From: owner-freebsd-net@freebsd.org
> [mailto:owner-freebsd-net@freebsd.org] On Behalf Of Barney Cordoba
> Sent: 02 September 2013 13:47
> To: Adrian Chadd
> Cc: Andre Oppermann; Alan Somers; net@freebsd.org; Jack F Vogel;
> Justin T. Gibbs; Luigi Rizzo; T.C. Gubatayao
> Subject: Re: Flow ID, LACP, and igb
>
> Are you using a pcie3 bus? Of course this is only an issue for 10g; what
> pct of FreeBSD users have a load over 9.5Gb/s? It's completely
> unnecessary for igb or em driver, so why is it used? Because it's there.
>
> Here's my argument against it.
> The handful of brains capable of doing driver development become
> consumed with BS like LRO and the things that need to be fixed, like
> buffer management and basic driver design flaws, never get fixed. The
> offload code makes the driver code a virtual mess that can only be
> maintained by Jack and 1 other guy in the entire world. And it takes 10
> times longer to make a simple change or to add support for a new NIC.
>
> In a week I ripped out the offload crap and the 9000 sysctls, eliminated
> the "consumer buffer" problem, reduced locking by 40% and now the igb
> driver uses 20% less cpu with a full gig load.
>
> And the code is cleaner and more easily maintained.
>
> BC
>
>
> ________________________________
> From: Adrian Chadd
> To: Barney Cordoba
> Cc: Andre Oppermann; Alan Somers; "net@freebsd.org"; Jack F Vogel;
> Justin T. Gibbs; Luigi Rizzo; T.C. Gubatayao
> Sent: Sunday, September 1, 2013 4:51 PM
> Subject: Re: Flow ID, LACP, and igb
>
>
> Yo,
>
> LRO is an interesting hack that seems to do a good trick of hiding the
> ridiculous locking and unfriendly cache behaviour that we do per-packet.
>
> It helps with LAN test traffic where things are going out in batches
> from the TCP layer so the RX layer "sees" these frames in-order and can
> do LRO. When you disable it, I don't easily get 10GE LAN TCP
> performance. That has to be fixed. Given how fast the CPU cores, bus
> interconnect and memory interconnects are, I don't think there should be
> any reason why we can't hit 10GE traffic on a LAN with LRO disabled (in
> both software and hardware.)
>
> Now that I have the PMC sandy bridge stuff working right (but no PEBS, I
> have to talk to Intel about that in a bit more detail before I think
> about hacking that in) we can get actual live information about this
> stuff. But the last time I looked, there's just too much per-packet
> latency going on.
> The root cause looks like it's a toss-up between scheduling, locking and
> just lots of code running to completion per-frame. As I said, that all
> has to die somehow.
>
> 2c,
>
>
> -adrian
>
>
> On 1 September 2013 08:45, Barney Cordoba wrote:
>
> > Comcast sends packets OOO. With any decent number of internet hops
> > you're likely to encounter a load balancer or packet shaper that sends
> > packets OOO, so you just can't be worried about it. In fact, your
> > designs MUST work with OOO packets.
> >
> > Getting balance on your load-balanced lines is certainly a bigger
> > upside than the additional CPU used.
> > You can buy a faster processor for your "stack" for a lot less than
> > you can buy bandwidth.
> >
> > Frankly my opinion of LRO is that it's a science project suitable for
> > labs only. It's a trick to get more bandwidth than your bus capacity;
> > the answer is to not run PCIe2 if you need PCIe3.
> > You can use it internally if you have control of all of the machines.
> > When I modify a driver the first thing that I do is rip it out.
> >
> > BC
> >
> >
> > ________________________________
> > From: Luigi Rizzo
> > To: Barney Cordoba
> > Cc: Andre Oppermann; Alan Somers; "net@freebsd.org"; Jack F Vogel;
> > Justin T. Gibbs; T.C. Gubatayao <tgubatayao@barracuda.com>
> > Sent: Saturday, August 31, 2013 10:27 PM
> > Subject: Re: Flow ID, LACP, and igb
> >
> >
> > On Sun, Sep 1, 2013 at 4:15 AM, Barney Cordoba wrote:
> >
> > > ...
> >
> > [your point on testing with realistic assumptions is surely a valid
> > one]
> >
> > > Of course there's nothing really wrong with OOO packets. We had this
> > > discussion before; lots of people have round-robin dual homing
> > > without any ill effects. It's just not an issue.
> >
> > It depends on where you are.
> > It may not be an issue if the reordering is not large enough to
> > trigger retransmissions, but even then it is annoying as it causes
> > more work in the endpoint -- it prevents LRO from working, and even
> > on the host stack it takes more work to sort where an out of order
> > segment goes than appending an in-order one to the socket buffer.
> >
> > cheers
> > luigi
> > _______________________________________________
> > freebsd-net@freebsd.org mailing list
> > http://lists.freebsd.org/mailman/listinfo/freebsd-net
> > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
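
For reference, the claim at the top of this message that pps and frame
sizes are what matter, not throughput, can be illustrated with a bit of
arithmetic. This is only a sketch, not something posted in the thread: it
assumes standard Ethernet framing (7-byte preamble, 1-byte start-of-frame
delimiter, 12-byte inter-frame gap) and frame sizes that include the FCS.
The function name line_rate_pps is made up for this example.

```python
# Per-frame overhead on the wire that is not part of the frame itself:
# 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap.
WIRE_OVERHEAD_BYTES = 7 + 1 + 12


def line_rate_pps(link_bps: int, frame_bytes: int) -> float:
    """Maximum frames per second at line rate (frame_bytes includes the FCS)."""
    bits_per_frame = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps / bits_per_frame


if __name__ == "__main__":
    # Same link speed, very different packet rates depending on frame size.
    for link_name, link_bps in (("1G", 10**9), ("10G", 10**10)):
        for frame_bytes in (64, 512, 1518):
            pps = line_rate_pps(link_bps, frame_bytes)
            print(f"{link_name} {frame_bytes:>4}-byte frames: {pps:,.0f} pps")
```

At the same link speed, minimum-size frames mean roughly 18 times as many
packets per second as full-size 1518-byte frames, which is why a per-packet
cost in the driver or stack shows up long before the link is full.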