Date: Mon, 31 Dec 2001 21:46:50 -0500
From: "Louis A. Mamakos" <louie@TransSys.COM>
To: Terry Lambert <tlambert2@mindspring.com>
Cc: Matthew Dillon <dillon@apollo.backplane.com>, Julian Elischer <julian@elischer.org>, Mike Silbersack <silby@silby.com>, Josef Karthauser <joe@tao.org.uk>, Tomas Svensson <tsn@gbdev.net>, freebsd-hackers@FreeBSD.ORG
Subject: Re: FreeBSD performing worse than Linux?
Message-ID: <200201010246.g012ko721041@whizzo.transsys.com>
In-Reply-To: Your message of "Mon, 31 Dec 2001 18:11:21 PST." <3C311AC9.99B5FC9C@mindspring.com>
References: <Pine.BSF.4.21.0112311225150.94344-100000@InterJet.elischer.org> <200112312327.fBVNRt719835@whizzo.transsys.com> <200201010043.g010h0i36281@apollo.backplane.com> <3C311AC9.99B5FC9C@mindspring.com>
Disabling Nagle's algorithm for no good reason scales very poorly, and disabling it is exactly what enabling TCP_NODELAY on a socket does.

If you look at the work function for most network elements, the part that runs out of gas first is per-packet forwarding performance. Sure, you need adequate bus bandwidth to move stuff through a box, but it's the capacity to perform per-packet forwarding operations and policy that is the hardest resource to make more of. I think this is true for toy routers based on the PC platform as well as for high-end boxes like the Cisco 12000 series. Juniper managed adequate forwarding performance by using specialized ASIC implementations in the forwarding path.

Of this statement, I'm sure; in my day job at UUNET, I talk to all the major backbone router vendors, and forwarding performance (and also reasonable routing protocol implementations) is a show-stopper requirement they labor mightily over.

So here we have a mechanism with wonderful properties - it's a trivial yet clever implementation of a self-tuning mechanism that prevents a TCP from generating tinygrams, without all manner of complicated timers. It gives great performance on LANs and other high-speed interconnects where remote-echo type applications are demanding, yet over long-delay paths, where remote echo is gonna suck no matter what you do, it automatically aggregates packets.

Nagle's algorithm and Van Jacobson's slow-start algorithm allowed the Internet to survive over congested paths. And they did so with a bunch of self-tuning behavior, independent of the bandwidth*delay product of the path the connection was running over. It was and is amazing stuff.

Likewise, the original problem in this thread is likely caused by some part of the USB Ethernet implementation having inadequate per-packet resources. It's probably not about the number of bytes, but the number of transactions.
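The self-tuning behavior of Nagle's algorithm described above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions, not how any real TCP stack is written: the function name `nagle_segments` and its parameters are made up for illustration, the application writes one byte at a fixed interval, every segment is ACKed exactly one round-trip time later, and delayed ACKs, loss, and window limits are ignored.

```python
def nagle_segments(n_writes, write_interval_ms, rtt_ms, mss=1460, nagle=True):
    """Payload sizes of the segments sent for n_writes one-byte writes.

    Simplified model (an assumption of this sketch): one ACK per segment,
    arriving rtt_ms after the send; no delayed ACKs, loss, or window limits.
    """
    segments = []
    buffered = 0      # bytes written by the application but not yet sent
    ack_due = None    # time (ms) when all sent data is ACKed; None = nothing in flight

    for i in range(n_writes):
        now = i * write_interval_ms

        # Process ACKs that arrived at or before this write.
        while ack_due is not None and ack_due <= now:
            if buffered:
                # The ACK releases everything that was held back, as one segment.
                segments.append(buffered)
                buffered = 0
                ack_due += rtt_ms
            else:
                ack_due = None  # nothing in flight: the connection is idle again

        buffered += 1  # the application writes one byte

        # Nagle's rule: send now only if nothing is in flight
        # or a full-sized segment is ready; otherwise hold the data.
        if not nagle or ack_due is None or buffered >= mss:
            segments.append(buffered)
            buffered = 0
            ack_due = now + rtt_ms

    if buffered:
        segments.append(buffered)  # flushed when the last outstanding ACK arrives
    return segments


print(len(nagle_segments(20, 100, 1)))                 # LAN-like path, 1 ms RTT
print(len(nagle_segments(20, 100, 500)))               # long-delay path, 500 ms RTT
print(len(nagle_segments(20, 100, 500, nagle=False)))  # same path, Nagle disabled
```

In this model, 20 keystrokes typed 100 ms apart go out as 20 one-byte segments on the 1 ms path, so remote echo stays snappy, while on the 500 ms path they are coalesced into 5 segments with no timer involved. With `nagle=False` the long-delay path carries 20 tinygrams again.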
You see here a modern reimplementation of essentially the same problem that the 3COM 3C501 ISA ethernet card had 15 years ago - back-to-back packets were consistently dropped because of the poor per-packet buffering implementation. It was absolutely repeatable.

Sure, it's "legal" to generate streams of tinygrams and not use Nagle's algorithm to aggregate the sender's traffic, but it's just plain rude, and on low-bandwidth links it sucks because of all the extra 40-byte headers you're carrying around.

I'm sure TCP_NODELAY got added because it sounds REALLY C00L to make the interactive thing go better. But clearly people don't understand the impact of turning on the cleverly named option, or that it probably doesn't really improve things.

louie
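The cost of "all the extra 40 byte headers" can be put into numbers with a short calculation. This is a rough sketch: the 40 bytes are assumed to be a 20-byte IPv4 header plus a 20-byte TCP header with no options, link-layer framing is ignored, and the helper name `bytes_on_wire` and the two example segment lists are hypothetical.

```python
TCP_IP_HEADER_BYTES = 40  # assumed: 20-byte IPv4 header + 20-byte TCP header, no options


def bytes_on_wire(segment_payloads, header_bytes=TCP_IP_HEADER_BYTES):
    """Total bytes sent for a list of segment payload sizes, headers included."""
    return sum(payload + header_bytes for payload in segment_payloads)


# Hypothetical example: the same 20 bytes of keystrokes, sent two ways.
tinygrams = [1] * 20           # one byte per packet, nothing aggregated
coalesced = [1, 4, 5, 5, 5]    # the same bytes aggregated into five segments

for name, segments in (("tinygrams", tinygrams), ("coalesced", coalesced)):
    total = bytes_on_wire(segments)
    payload = sum(segments)
    print(f"{name}: {len(segments)} packets, {total} bytes, "
          f"{100 * payload / total:.1f}% payload")
```

Under these assumptions the tinygram stream puts 820 bytes and 20 packets on the wire to carry 20 bytes of data, while the aggregated stream needs 220 bytes and 5 packets. The per-packet count is the number that matters for forwarding elements; the byte count is what hurts on low-bandwidth links.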