Date: Wed, 27 Nov 2002 10:58:02 -0500 (EST)
From: Andrew Gallatin <gallatin@cs.duke.edu>
To: Robert Watson <rwatson@freebsd.org>
Cc: Luigi Rizzo <rizzo@icir.org>, current@freebsd.org
Subject: Re: mbuf header bloat ?
Message-ID: <15844.60298.44810.750373@grasshopper.cs.duke.edu>
In-Reply-To: <Pine.NEB.3.96L.1021127095837.43889C-100000@fledge.watson.org>
References: <15840.8629.324788.887872@grasshopper.cs.duke.edu> <Pine.NEB.3.96L.1021127095837.43889C-100000@fledge.watson.org>

Robert Watson writes:
>
> Andrew,
>
> Thanks for your patience as I finished some research and experimentation
> regarding the options there. Some more details below.
>
> On Sat, 23 Nov 2002, Andrew Gallatin wrote:
>
> > On the contrary, I think that if anything is going to be done, it must
> > be done now, so as to not break binary network driver compatibility like
> > we did in 4.1.1 when the size of mbufs changed. Otherwise, we're stuck
> > with it until 6.0.
>
> Per an on-going discussion on -arch, it seems there's a reasonable
> consensus that the kernel driver ABI will not be frozen until 5.1, since
> we need continued flexibility to mature the fine-grained locking, KSE, and
> MAC technologies. This will allow us some wiggle room in resolving these
> sorts of issues.

I agree.

> > As you eloquently state, there are a number of tradeoffs involved. On a
> > 64-bit platform, 99% of users are paying 40 bytes/pkt for something that
> > they will never use. On x86, 99.99% of users are paying 20 bytes/pkt
> > for a feature they will never use. At least a significant fraction of
> > nics make use of csum offloading (xl, ti, bge, em, myri).
> >
> > I propose that we make the struct label portion of the pkthdr
> > compile-time conditional on MAC. The assumption is that you will move
> > the MAC label to an m_tag sometime after 5.0-RELEASE.
>
> For a variety of reasons, I'm averse to the notion of compile-time
> components in the struct mbuf (and other) vital kernel structures. One of
> the design requirements for the MAC Framework was that it be possible for
> third party vendors to distribute security modules that plug in without
> necessarily being part of the FreeBSD build infrastructure. While it is
> true we currently require options MAC to be compiled into the kernel, we
> don't require that you manually integrate module source into the kernel
> source so that it builds as part of a kernel. Due to the way that

I'm not at all certain that I understand this objection. I agree it's a
bit ugly, and it was proposed as a last resort when I thought we'd be
freezing the ABI at 5.0. So I'm not strongly advocating it, but I'm very
curious as to why you think things need to be manually integrated for it
to work.

What I (as a 3rd party driver author working in a GNUish
autoconf/gnumake environment) do is to require a user building from
source to specify the location of a configured kernel tree where make
depend has been run (defaulting to GENERIC). I then pick up the various
option and bus files out of that directory. When I build binary modules,
I build from source as a normal user (using a 4.1.1 system in a chroot).
Using an approach like this, a vendor could ship a MAC-aware driver by
picking up the options files from a MAC kernel build directory.

How is one supposed to build a 3rd party module these days?

> separately shipped modules build out of the context of a kernel
> configuration, this would introduce substantial problems. However, since
> we believe that the kernel ABI will not be frozen until 5.1, if we have an
> alternative place to put the label that doesn't expand the pkthdr, then we
> can change it once we think the solution is ready.

Agreed.

> On the topic of m_tag: I've spent a few days working with m_tag now to see
> if it can meet the needs of the MAC Framework. My conclusion is that, in
> the form it's currently in the tree, it cannot meet the requirements.
> However, I believe with a relatively straightforward set of
> modifications, it can. As such, the proposed 5.1 time frame for moving
> the MAC Framework to using m_tag is realistic. I'm currently exchanging
> patches with Sam Leffler looking at how to tweak the various protocol
> stacks to properly maintain m_tag chains on mbufs when mbufs are copied,
> etc. These problems largely stem from a failure to maintain the tag
> chains on mbufs over some of the copy/... operations that occur. The
> result is that the MAC labels stored in mbufs are often discarded or lost,
> and many packets float around the system without proper protection. For
> policies that rely on ubiquitous labeling, this results in rapid assertion
> failures (yes, we fail very closed :-). I hope to post patches for these
> changes in the next few days once I've had a chance to perform more
> extensive testing. Sam and I are having an on-going conversation about
> whether it would be safe to introduce some of these changes before 5.0.
>
> There are some downsides to moving to m_tag for MAC labels. One is that
> it effectively doubles the number of memory allocations in the system for
> every packet delivered through the system when running with MAC if we
> maintain the current semantic that all packets are labeled. This means
> users will pay a higher cost for MAC even if they don't label packets,
> which is unfortunate. I'm currently exploring the impact -- my hope is
> that changes to the memory allocators since 4.x, such as the new mbuf
> allocator and introduction of UMA, will largely mitigate that effect. A
> fair amount of interest has been expressed in supporting MAC in the
> GENERIC kernel eventually: if and when that becomes the case, we may find
> that the rationale for moving the label out of the mbuf is reversed.

Jeff's UMA is really amazing. I think it has sped things up
considerably, but I've never measured things of course ;) At least on
alpha and ia64, small allocations are amazingly fast because UMA grabs
pages and feeds the kernel their K0SEG address, meaning almost no
interaction with the VM system (no vm map manipulations, no IPIs, etc).

> > This will immediately reduce the size of mbufs for the vast majority of
> > users, and will prevent a 4.1.1 like flag-day for 3rd party network
> > driver vendors. The only downside is that the few MAC users will not be
> > able to use 3rd party binary network drivers until the MAC label is put
> > into an m_tag. This seems fair, as the only people inconvenienced are the
> > people who want the labels and they are motivated to move them to an
> > m_tag. But that's easy for me to say, since I don't run MAC, and I may
> > be missing something big.
>
> I think you under-estimate the complexity of variably sized key kernel
> data structures. mbuf.h is included all over the kernel, as well as in
> many user applications (although often for bogus reasons). My proposed
> strategy is the following:

Bizarre. I had no idea userland apps used mbuf.h. That does indeed
sound bogus.

> (1) For 5.0, we either maintain the current storage of the struct label in
>     struct mbuf, or move to m_tag's if there is a consensus the set of
>     supporting changes is correct (move to m_dup_pkthdr() in a number of
>     places, introduce proper handling of wait dispositions for tag
>     allocation, and so on).

I think it's too late in the freeze for that, myself.

> (2) For 5.1, assuming we're not already in m_tags, we move to m_tags for
>     the label. This is acceptable because we are opting not to freeze the
>     kernel driver ABI between 5.0 and 5.1, which will permit
>     infrastructural changes necessary to improve the performance and
>     stability of the 5.x branch without locking it entirely to the current
>     set of structure layout assumptions.

That sounds correct to me.

> I'd like to continue to explore options for reducing the number of memory
> allocations to extend storage on mbufs. One idea I've been tossing around
> is adopting Jeff Roberson's extension model used in struct proc and
> related structures.
>

Thanks for working so hard on this and being so flexible.

Drew

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message
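
The binary-compatibility worry in this thread can be illustrated with a small userland sketch. It is not the real FreeBSD struct mbuf or struct label: the demo_* names, the field set, and the label size are all assumptions made up for illustration. It only shows why a field that is present in some kernel builds and absent in others changes both the per-packet size and the offsets that a separately built driver has compiled in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a per-packet security label. This is NOT the
 * real FreeBSD struct label; its size and contents are assumed. */
struct demo_label {
    void *slots[4];
};

/* Toy packet header as a kernel built WITHOUT the label would lay it out. */
struct demo_pkthdr_plain {
    void *rcvif;
    int   len;
    int   csum_flags;
};

/* The same toy header as a kernel built WITH the label would lay it out. */
struct demo_pkthdr_labeled {
    void *rcvif;
    int   len;
    struct demo_label label;
    int   csum_flags;
};

/* Bytes the label adds to every packet header, alignment padding included. */
size_t demo_label_cost(void)
{
    assert(sizeof(struct demo_pkthdr_labeled) >
           sizeof(struct demo_pkthdr_plain));
    return sizeof(struct demo_pkthdr_labeled) -
           sizeof(struct demo_pkthdr_plain);
}

/* Nonzero when a module compiled against one layout would look for
 * csum_flags at the wrong offset in the other layout. */
int demo_layouts_disagree(void)
{
    return offsetof(struct demo_pkthdr_plain, csum_flags) !=
           offsetof(struct demo_pkthdr_labeled, csum_flags);
}

/* Print the two layouts side by side; the numbers depend on the platform. */
void demo_report(void)
{
    printf("label cost per header: %zu bytes\n", demo_label_cost());
    printf("csum_flags offset: plain=%zu labeled=%zu\n",
           offsetof(struct demo_pkthdr_plain, csum_flags),
           offsetof(struct demo_pkthdr_labeled, csum_flags));
}
```

Calling demo_report() from a main() prints the platform-specific numbers. A driver binary built against the plain layout and loaded into a kernel using the labeled one would read csum_flags from the wrong place, which is the kind of flag day the thread wants to avoid.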