From owner-freebsd-net@FreeBSD.ORG Sat Sep 6 09:07:11 2003
Date: Sat, 6 Sep 2003 09:07:01 -0700
From: Luigi Rizzo
To: Andrew Gallatin
cc: freebsd-net@freebsd.org
Subject: Re: TCP Segmentation Offload
Message-ID: <20030906090701.A3163@xorpc.icir.org>
References: <16216.63066.954104.582195@grasshopper.cs.duke.edu>
In-Reply-To: <16216.63066.954104.582195@grasshopper.cs.duke.edu>; from gallatin@cs.duke.edu on Fri, Sep 05, 2003 at 04:47:22PM -0400

On Fri, Sep 05, 2003 at 04:47:22PM -0400, Andrew Gallatin wrote:
>
> I've been reading a little about TCP Segmentation Offload (aka TSO).
> We don't appear to support it, but at least 2 of our supported nics
> (e1000 and bge) apparently could support it.

i believe there is more commercial hype than actual savings
in doing TCP Segmentation Offload.
With delayed acks (or better, "ack every second packet"), the sender's
TCP typically sends out two packets at a time. Without delayed acks,
it is just one at a time.

So yes, you avoid looping in tcp_output() twice, but on the other
hand in the second execution everything is already in cache,
jump prediction units get it right in many places, and in the
end i believe the actual savings will be a lot lower than 50%
(in a real-life case, with multiple TCP flows active, and not
dumping an entire 64k window onto the net at the first transmission).

(btw the e1000 is really cheap!)

	cheers
	luigi

> The gist is that TCP pretends the nic has a large mtu, and passes a
> large (> the mtu on the link layer) packet down to driver and then the
> nic. The nic then fragments the large packet into smaller (<=mtu)
> packets. It uses the initial TCP header as a template to construct
> the headers for the "fragments.". The people who implemented it on
> linux claim a 50% CPU savings for an Intel 1Gb/s adaptor with a 1500
> byte mtu.
>
> It seems like it could be implemented rather easily by adding an
> if_hwassist flag (CSUM_TSO). If this flag is set on the interface
> found by tcp_mss(), then the mss is set to 56k. This causes TCP to
> generate huge packets. We then add a check in ip_output() after the
> (near the existing CSUM_FRAGMENT check) which checks to see if its
> both a TCP packet, and if CSUM_TSO is set in the if_hwassist flags.
> If so, the huge packet is passed on down to the driver. Does this
> sound reasonable? The only other thing I can think of is that some
> nics might not be able to handle such a large mss, and we might want
> to stuff the maximum mss value into the ifnet struct.
>
> I don't have a bge or an e1000, so I'm not ready to actually implement
> this. I'm more considering firmware optimizations for our product,
> and would implement it in a few months, after making the firmware
> changes.
>
> Drew
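
For reference, the segmentation step described in the quoted proposal
(the nic cutting one oversized TCP payload into pieces no larger than
the mss, reusing the initial header as a template) can be sketched
roughly as below. This is only an illustration and not FreeBSD or
driver code: struct seg_desc and tso_split() are hypothetical names,
and only the per-segment fields that would differ from the template
header (sequence number, payload offset, length) are modelled. IP
header fixups and checksums are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-segment descriptor: the fields a nic would have to
 * rewrite in the template header for each segment it emits. */
struct seg_desc {
	uint32_t seq;	/* TCP sequence number of the first payload byte */
	uint32_t off;	/* offset of this segment in the large payload */
	uint32_t len;	/* payload bytes in this segment, <= mss */
	int last;	/* nonzero for the final segment */
};

/*
 * Split a payload of total_len bytes, starting at sequence number seq0,
 * into segments of at most mss bytes each.
 * Returns the number of segments written to out[], or 0 if mss is 0
 * or out[] (max_out entries) is too small to hold them all.
 */
size_t
tso_split(uint32_t seq0, uint32_t total_len, uint32_t mss,
    struct seg_desc *out, size_t max_out)
{
	size_t n = 0;
	uint32_t off = 0;

	assert(out != NULL);
	if (mss == 0)
		return 0;

	while (off < total_len) {
		uint32_t len = total_len - off;

		if (len > mss)
			len = mss;	/* every segment but the last is full */
		if (n == max_out)
			return 0;	/* caller's array is too small */

		out[n].seq = seq0 + off;	/* wraps modulo 2^32, as TCP does */
		out[n].off = off;
		out[n].len = len;
		out[n].last = (off + len == total_len);
		n++;
		off += len;
	}
	return n;
}
```

As a usage example, assuming a 1500 byte mtu with 40 bytes of IPv4 and
TCP headers and no options (mss 1460), a 4000 byte payload comes out as
three segments of 1460, 1460 and 1080 bytes. If "56k" is read as 57344
bytes, the same call yields 40 segments per pass through the stack.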