From: Robert Watson
Date: Wed, 27 Nov 2002 10:31:39 -0500 (EST)
To: Andrew Gallatin
Cc: Luigi Rizzo, current@freebsd.org
Subject: Re: mbuf header bloat ?
In-Reply-To: <15840.8629.324788.887872@grasshopper.cs.duke.edu>

Andrew,

Thanks for your patience as I finished some research and
experimentation regarding the options there.  Some more details below.

On Sat, 23 Nov 2002, Andrew Gallatin wrote:

> On the contrary, I think that if anything is going to be done, it must
> be done now, so as to not break binary network driver compatibility
> like we did in 4.1.1 when the size of mbufs changed.  Otherwise, we're
> stuck with it until 6.0.

Per an ongoing discussion on -arch, it seems there's a reasonable
consensus that the kernel driver ABI will not be frozen until 5.1, since
we need continued flexibility to mature the fine-grained locking, KSE,
and MAC technologies.  This will allow us some wiggle room in resolving
these sorts of issues.

> As you eloquently state, there are a number of tradeoffs involved.
> On a 64-bit platform, 99% of users are paying 40 bytes/pkt for
> something that they will never use.  On x86, 99.99% of users are
> paying 20 bytes/pkt for a feature they will never use.  At least a
> significant fraction of NICs make use of csum offloading (xl, ti, bge,
> em, myri).
>
> I propose that we make the struct label portion of the pkthdr
> compile-time conditional on MAC.  The assumption is that you will move
> the MAC label to an m_tag sometime after 5.0-RELEASE.

For a variety of reasons, I'm averse to the notion of compile-time
conditional components in struct mbuf and other vital kernel structures.
One of the design requirements for the MAC Framework was that it be
possible for third party vendors to distribute security modules that
plug in without necessarily being part of the FreeBSD build
infrastructure.  While it is true we currently require options MAC to be
compiled into the kernel, we don't require that you manually integrate
module source into the kernel source so that it builds as part of a
kernel.  Because separately shipped modules are built outside the
context of any particular kernel configuration, a structure whose layout
depends on a kernel option would introduce substantial problems.
However, since we believe that the kernel ABI will not be frozen until
5.1, if we have an alternative place to put the label that doesn't
expand the pkthdr, then we can change it once we think the solution is
ready.

On the topic of m_tag: I've spent a few days working with m_tag now to
see if it can meet the needs of the MAC Framework.  My conclusion is
that, in the form it's currently in the tree, it cannot meet the
requirements.  However, I believe that with a relatively straightforward
set of modifications, it can.  As such, the proposed 5.1 time frame for
moving the MAC Framework to using m_tag is realistic.  I'm currently
exchanging patches with Sam Leffler looking at how to tweak the various
protocol stacks to properly maintain m_tag chains on mbufs when mbufs
are copied, etc.
These problems largely stem from a failure to maintain the tag chains on
mbufs over some of the copy/... operations that occur.  The result is
that the MAC labels stored in mbufs are often discarded or lost, and
many packets float around the system without proper protection.  For
policies that rely on ubiquitous labeling, this results in rapid
assertion failures (yes, we fail very closed :-).  I hope to post
patches for these changes in the next few days once I've had a chance to
perform more extensive testing.  Sam and I are having an ongoing
conversation about whether it would be safe to introduce some of these
changes before 5.0.

There are some downsides to moving to m_tag for MAC labels.  One is
that, when running with MAC, it effectively doubles the number of memory
allocations for every packet delivered through the system, if we
maintain the current semantic that all packets are labeled.  This means
users will pay a higher cost for MAC even if they don't label packets,
which is unfortunate.  I'm currently exploring the impact -- my hope is
that changes to the memory allocators since 4.x, such as the new mbuf
allocator and the introduction of UMA, will largely mitigate that effect.
A fair amount of interest has been expressed in supporting MAC in the
GENERIC kernel eventually: if and when that becomes the case, we may
find that the rationale for moving the label out of the mbuf is
reversed.

> This will immediately reduce the size of mbufs for the vast majority
> of users, and will prevent a 4.1.1 like flag-day for 3rd party network
> driver vendors.  The only downside is that the few MAC users will not
> be able to use 3rd party binary network drivers until the MAC label is
> put into an m_tag.  This seems fair, as the only people inconvenienced
> are the people who want the labels and they are motivated to move them
> to an m_tag.  But that's easy for me to say, since I don't run MAC,
> and I may be missing something big.
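For illustration only, here is a minimal user-space sketch of what a
compile-time conditional label means for structure layout.  The toy_*
structures are hypothetical stand-ins, not the real struct pkthdr or
struct label; the toy label is simply sized so that it comes to 40 bytes
on a typical LP64 platform and 20 bytes on a typical ILP32 one, matching
the per-packet figures quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical label: one int of flags plus four per-policy slots. */
struct toy_label {
    int   l_flags;
    void *l_perpolicy[4];
};

/* Packet header as code built WITH the option would see it. */
struct toy_pkthdr_mac {
    void            *rcvif;
    int              len;
    struct toy_label label;
    int              csum_flags;
};

/* Packet header as code built WITHOUT the option would see it. */
struct toy_pkthdr_nomac {
    void *rcvif;
    int   len;
    int   csum_flags;
};

/* Bytes added to every packet header by carrying the label inline. */
size_t
toy_label_cost(void)
{
    assert(sizeof(struct toy_pkthdr_mac) > sizeof(struct toy_pkthdr_nomac));
    return (sizeof(struct toy_pkthdr_mac) - sizeof(struct toy_pkthdr_nomac));
}

/* Nonzero if the two builds disagree about where csum_flags lives. */
int
toy_layouts_disagree(void)
{
    return (offsetof(struct toy_pkthdr_mac, csum_flags) !=
        offsetof(struct toy_pkthdr_nomac, csum_flags));
}
```

With these toy definitions, a driver compiled against one layout and a
kernel compiled against the other would read and write csum_flags at
different offsets, which is the kind of mismatch a separately built
module could run into.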
I think you underestimate the complexity of variably sized key kernel
data structures.  mbuf.h is included all over the kernel, as well as in
many user applications (although often for bogus reasons).  My proposed
strategy is the following:

(1) For 5.0, we either maintain the current storage of the struct label
    in struct mbuf, or move to m_tags if there is a consensus that the
    set of supporting changes is correct (move to m_dup_pkthdr() in a
    number of places, introduce proper handling of wait dispositions
    for tag allocation, and so on).

(2) For 5.1, assuming we're not already in m_tags, we move to m_tags
    for the label.  This is acceptable because we are opting not to
    freeze the kernel driver ABI between 5.0 and 5.1, which will permit
    infrastructural changes necessary to improve the performance and
    stability of the 5.x branch without locking it entirely to the
    current set of structure layout assumptions.

I'd like to continue to explore options for reducing the number of
memory allocations needed to extend storage on mbufs.  One idea I've
been tossing around is adopting Jeff Roberson's extension model used in
struct proc and related structures.

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Network Associates Laboratories
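As a rough user-space sketch of the supporting changes mentioned in item
(1), the code below duplicates a packet header together with its tag
chain and honors a wait disposition on allocation.  Every name here
(toy_tag, toy_pkthdr, toy_dup_pkthdr(), TOY_NOWAIT, toy_nowait_budget) is
hypothetical; this is not the kernel m_tag API, and the "no wait"
failure is simulated with a counter rather than a real allocator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum toy_how { TOY_NOWAIT, TOY_WAIT };  /* hypothetical wait disposition */

struct toy_tag {
    struct toy_tag *next;
    int             id;      /* which subsystem owns the tag */
    size_t          len;     /* payload length in bytes */
    unsigned char   data[];  /* payload, e.g. a label */
};

struct toy_pkthdr {
    int             len;     /* packet length */
    struct toy_tag *tags;    /* singly linked tag chain */
};

/* Test hook: NOWAIT allocations left before failure; -1 means never fail. */
int toy_nowait_budget = -1;

struct toy_tag *
toy_tag_alloc(int id, size_t len, enum toy_how how)
{
    struct toy_tag *t;

    if (how == TOY_NOWAIT && toy_nowait_budget >= 0) {
        if (toy_nowait_budget == 0)
            return (NULL);   /* simulated "cannot sleep, no memory" */
        toy_nowait_budget--;
    }
    t = malloc(sizeof(*t) + len);
    if (t == NULL)
        return (NULL);
    t->next = NULL;
    t->id = id;
    t->len = len;
    memset(t->data, 0, len);
    return (t);
}

void
toy_tag_prepend(struct toy_pkthdr *ph, struct toy_tag *t)
{
    assert(t != NULL);
    t->next = ph->tags;
    ph->tags = t;
}

struct toy_tag *
toy_tag_find(const struct toy_pkthdr *ph, int id)
{
    struct toy_tag *t;

    for (t = ph->tags; t != NULL; t = t->next)
        if (t->id == id)
            return (t);
    return (NULL);
}

void
toy_tag_free_chain(struct toy_pkthdr *ph)
{
    struct toy_tag *t = ph->tags, *n;

    while (t != NULL) {
        n = t->next;
        free(t);
        t = n;
    }
    ph->tags = NULL;
}

/*
 * Copy the header fields and deep-copy the tag chain in order.
 * Returns 1 on success; on failure returns 0 and leaves `to` untagged,
 * so the caller can fail closed instead of passing on an unlabeled copy.
 */
int
toy_dup_pkthdr(struct toy_pkthdr *to, const struct toy_pkthdr *from,
    enum toy_how how)
{
    struct toy_tag **tail = &to->tags;
    const struct toy_tag *t;

    to->len = from->len;
    to->tags = NULL;
    for (t = from->tags; t != NULL; t = t->next) {
        struct toy_tag *c = toy_tag_alloc(t->id, t->len, how);

        if (c == NULL) {
            toy_tag_free_chain(to);
            return (0);
        }
        memcpy(c->data, t->data, t->len);
        *tail = c;
        tail = &c->next;
    }
    return (1);
}
```

The design point the sketch tries to capture is that the copy either
carries every tag across or reports failure; a copy path that silently
drops the chain is what would leave packets unlabeled.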