From owner-freebsd-current Fri Nov 22 6:43:57 2002 Delivered-To: freebsd-current@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 65BEF37B401 for ; Fri, 22 Nov 2002 06:43:54 -0800 (PST) Received: from fledge.watson.org (fledge.watson.org [204.156.12.50]) by mx1.FreeBSD.org (Postfix) with ESMTP id 2A58043EAA for ; Fri, 22 Nov 2002 06:43:51 -0800 (PST) (envelope-from robert@fledge.watson.org) Received: from fledge.watson.org (fledge.pr.watson.org [192.0.2.3]) by fledge.watson.org (8.12.6/8.12.5) with SMTP id gAMEhjBF058615; Fri, 22 Nov 2002 09:43:45 -0500 (EST) (envelope-from robert@fledge.watson.org) Date: Fri, 22 Nov 2002 09:43:45 -0500 (EST) From: Robert Watson X-Sender: robert@fledge.watson.org To: Luigi Rizzo Cc: current@freebsd.org Subject: Re: mbuf header bloat ? In-Reply-To: <20021121111709.A23435@xorpc.icir.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-current@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Thu, 21 Nov 2002, Luigi Rizzo wrote: > [Bcc to -net because it is relevant there. This email has been triggered > by a private discussion i was having with other committers (who will > easily recognise themselves :) which suggested the possibility of adding > more fields to mbuf headers] > > Just recently came up to my attention that we have the following code in > > > #define MAC_MAX_POLICIES 4 > struct label { > int l_flags; > union { > void *l_ptr; > long l_long; > } l_perpolicy[MAC_MAX_POLICIES]; > }; > > (what are l_perpolicy[], ints ? Could this be written a bit better ?) > and then in MAC labels provide general purpose security label storage across a variety of kernel objects. Each MAC label is made up of a number of "slots" which are allocated to statically linked or dynamically loaded policies. The union is used to allow policies to use their slot for either the purposes of an integer store, or as a pointer with the semantics they define. Some policies simply use the long to represent their policy information, perhaps by itself (a partition number), or to key into an existing array. Other policies use the pointer to point to shared reference-counted, static, or per-label data. Probably the "long" should be changed to some other integer type that better matches the notion of pointer length. > struct pkthdr { > struct ifnet *rcvif; /* rcv interface */ > int len; /* total packet length */ > /* variables for ip and tcp reassembly */ > void *header; /* pointer to packet header */ > /* variables for hardware checksum */ > int csum_flags; /* flags regarding checksum */ > int csum_data; /* data field used by csum routines */ > SLIST_HEAD(packet_tags, m_tag) tags; /* list of packet tags */ > struct label label; /* MAC label of data in packet */ > }; > > The label is 5 ints, the pkthdr a total of 11 ints (and m_hdr takes > another 6, for a total of 136 bytes of header info on 64-bit > architectures). > > Of the pkthdr, only 3 fields (rcvif, len, tags) are of really general > use, the rest being used only in certain cases and for very specific > purposes (e.g. reassembly of fragments, or hw capabilities, or MAC). > > Now that Sam has done the excellent work of integrating packet tags to > carry annotations around, i really believe that we should try to move > out of the pkthdr all non-general fields, and move them to m_tags so we > only pay the cost when needed and not in all cases. Also this pays a > lot in terms of ABI compatibility and extensibility. I understand that > for 5.0 it is a bit late to act, but i do hope that we can reconsider > this issue for 5.1 and pull out of the pkthdr at least the MAC label, > and possibly also the csum_* fields, much in the same way it has been > done for VLAN labels. In the original MAC label design for mbufs, and up until very recently, m_tag wasn't available, so that design didn't use it. We traded off various things, including measured packet lengths, and decided the 20 bytes was an acceptable cost given the available extension services. I'm certainly willing to re-consider that notion now that we have general extensibility, and now that we have a good, SMP-safe slab allocator in 5.0. However, one thing to keep in mind is that in a MAC environment, every packet header mbuf does have a valid label, and as a result, inserting additional memory allocations for every packet handled can have substantial cost. I've had a number of requests to make "options MAC" a default-shipped option: it's not ready yet (as an experimental feature), but it may well be by 6.0 we are ready for that, assuming the performance numbers are right. In that situation, it could well be that it does make sense to maintain the label data in the mbuf pkthdr, since it really will be used for all pkthdr's. There's a hard tradeoff here, and it applies to all of the data in the packet header. On the one hand, we want to keep the mbuf packet header data small: any data there cuts into the space available for fast packet storage without a cluster. We also want to keep it protocol-independent, since we use mbufs for all protocols, as well (in a number of situations) as a general purpose memory allocator for the network stack. On the other hand, we also want to use the highest performance configuration for the common case, and the reality is that our current common case is IPv4 networking. I'm not a big fan of performance hacks, but if we're reaching a time when >50% of network cards provide support for IP checksum handling on the card, paying a few bytes cost per mbuf header may be a definite win. As you suggest, this is probably a question we need to revisit once 5.0 is out the door, because we really can't make changes like this right now. One thing I'd like to consider looking at at some point in the future, BTW, is whether there are benefits to moving away from the use of mbufs as general purpose storage in the network stack. I think mbufs work quite well for packets which do use the incremental parsing, appending, etc, operation. But for things like IP fragment queues, socket options, etc, mbufs are a much weaker match. Experimenting with UMA as a vehicle for those types of allocations might make sense. Previously, for locking and spl safety reasons, this often wouldn't have been possible, but perhaps it is now. Robert N M Watson FreeBSD Core Team, TrustedBSD Projects robert@fledge.watson.org Network Associates Laboratories To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-current" in the body of the message