From: Andrew Gallatin
Date: Sat, 23 Nov 2002 19:47:49 -0500 (EST)
To: Robert Watson
Cc: Luigi Rizzo, current@freebsd.org
Subject: Re: mbuf header bloat ?
Message-ID: <15840.8629.324788.887872@grasshopper.cs.duke.edu>

Robert Watson writes:
>
> On Thu, 21 Nov 2002, Luigi Rizzo wrote:
<...>
> > The label is 5 ints, the pkthdr a total of 11 ints (and m_hdr takes
> > another 6, for a total of 136 bytes of header info on 64-bit
> > architectures).
> >
> > Of the pkthdr, only 3 fields (rcvif, len, tags) are of really general
> > use, the rest being used only in certain cases and for very specific
> > purposes (e.g. reassembly of fragments, or hw capabilities, or MAC).
> >
> > Now that Sam has done the excellent work of integrating packet tags to
> > carry annotations around, I really believe that we should try to move
> > all non-general fields out of the pkthdr and into m_tags, so we only
> > pay the cost when needed and not in all cases. This also pays off a
> > lot in terms of ABI compatibility and extensibility. I understand that
> > for 5.0 it is a bit late to act, but I do hope that we can reconsider
> > this issue for 5.1 and pull out of the pkthdr at least the MAC label,
> > and possibly also the csum_* fields, much in the same way it has been
> > done for VLAN labels.
>
> In the original MAC label design for mbufs, and up until very recently,
> m_tag wasn't available, so that design didn't use it. We traded off
> various things, including measured packet lengths, and decided the 20
> bytes was an acceptable cost given the available extension services. I'm
> certainly willing to reconsider that notion now that we have general
> extensibility, and now that we have a good, SMP-safe slab allocator in
> 5.0. However, one thing to keep in mind is that in a MAC environment,
> every packet header mbuf does have a valid label, and as a result,
> inserting additional memory allocations for every packet handled can have
> substantial cost. I've had a number of requests to make "options MAC" a
> default-shipped option: it's not ready yet (it is an experimental
> feature), but it may well be that by 6.0 we are ready for that, assuming
> the performance numbers are right. In that situation, it could well make
> sense to maintain the label data in the mbuf pkthdr, since it really will
> be used for all pkthdrs.
>
> There's a hard tradeoff here, and it applies to all of the data in the
> packet header. On the one hand, we want to keep the mbuf packet header
> data small: any data there cuts into the space available for fast packet
> storage without a cluster.
> We also want to keep it protocol-independent, since we use mbufs for all
> protocols, as well as (in a number of situations) a general-purpose
> memory allocator for the network stack. On the other hand, we also want
> to use the highest-performance configuration for the common case, and
> the reality is that our current common case is IPv4 networking. I'm not
> a big fan of performance hacks, but if we're reaching a time when >50% of
> network cards support IP checksum handling on the card, paying a few
> bytes per mbuf header may be a definite win. As you suggest, this is
> probably a question we need to revisit once 5.0 is out the door, because
> we really can't make changes like this right now.

On the contrary, I think that if anything is going to be done, it must be
done now, so as not to break binary network driver compatibility the way
we did in 4.1.1 when the size of mbufs changed. Otherwise, we're stuck
with it until 6.0.

As you eloquently state, there are a number of tradeoffs involved. On a
64-bit platform, 99% of users are paying 40 bytes/pkt for something that
they will never use. On x86, 99.99% of users are paying 20 bytes/pkt for
a feature they will never use. At least a significant fraction of NICs
make use of checksum offloading (xl, ti, bge, em, myri).

I propose that we make the struct label portion of the pkthdr conditional
at compile time on "options MAC", on the assumption that you will move the
MAC label to an m_tag sometime after 5.0-RELEASE. This will immediately
reduce the size of mbufs for the vast majority of users, and will prevent
a 4.1.1-like flag day for third-party network driver vendors.

The only downside is that the few MAC users will not be able to use
third-party binary network drivers until the MAC label is put into an
m_tag, since a driver built without MAC would disagree with a MAC kernel
about the size of the pkthdr. This seems fair, as the only people
inconvenienced are the ones who want the labels, and they are motivated
to move them to an m_tag.
But that's easy for me to say, since I don't run MAC, and I may be missing
something big.

Drew