From: Andrew Gallatin <gallatin@cs.duke.edu>
Date: Fri, 29 Sep 2006 18:45:23 -0400 (EDT)
To: Andre Oppermann
Cc: freebsd-net@freebsd.org, freebsd-current@freebsd.org
Subject: Re: Much improved sosend_*() functions
Message-ID: <17693.41475.778558.381395@grasshopper.cs.duke.edu>
In-Reply-To: <451D9E59.9050000@freebsd.org>

Andre Oppermann writes:
 > Andrew Gallatin wrote:
 > > Andre,
 > >
 > > I meant to ask: Did you try 16KB jumbos? Did they perform
 > > any better than page-sized jumbos?
 >
 > No, I didn't try 16K jumbos. The problem with anything larger than
 > page size is that it may look contiguous in kernel memory but isn't
 > in physical memory. Thus you need the same number of descriptors
 > for the network card as with page-sized (4K) clusters.

But it would allow you to do one copyin rather than four. I don't
know how much this would help, but it might be worth looking at.

 > > Also, if we're going to change how mbufs work, let's add something
 > > like Linux's skb_frag_t frags[MAX_SKB_FRAGS]. In FreeBSD parlance,
 > > this embeds something like an array of sf_buf pointers in the mbuf.
 > > The big difference from a chain of M_EXT mbufs is that you need to
 > > allocate only one mbuf wrapper, rather than one for each item in
 > > the list. Also, the reference is kept in the page (or sf_buf)
 > > itself, and the data offset is kept in the skbuff (or mbuf).
 >
 > We are not going to change how mbufs work.
 >
 > > This allows us to do cool things like allocate a single page and
 > > use both halves of it for two separate 1500-byte frames. This
 > > allows us to achieve *amazing* results in combination with LRO,
 > > because it allows us to do, on average, many fewer allocations per
 > > byte, especially in combination with Linux's "high order" page
 > > allocations. Using order-2 allocations and LRO, I've actually seen
 > > 10GbE line-rate receives on a wimpy 2.0GHz Athlon64.
 >
 > I have just started tackling the receive path. Let's see what comes
 > out of it first before we jump to conclusions.

It could be that mbufs are cheaper to get than skbs and pages on
Linux, but I doubt it.

FWIW, Linux has an skb chaining mechanism (frag_list). My first LRO
experiment was based on allocating "normal" skbs and chaining them.
That maxed out at around 5.2Gb/s (on the same hardware where I see
line rate).

Drew