Date: Mon, 25 Nov 2002 11:31:39 -0500 (EST) From: Robert Watson <rwatson@freebsd.org> To: Andrew Gallatin <gallatin@cs.duke.edu> Cc: Luigi Rizzo <rizzo@icir.org>, current@freebsd.org Subject: Re: mbuf header bloat ? Message-ID: <Pine.NEB.3.96L.1021125111737.36232C-100000@fledge.watson.org> In-Reply-To: <15840.8629.324788.887872@grasshopper.cs.duke.edu>
next in thread | previous in thread | raw e-mail | index | archive | help
On Sat, 23 Nov 2002, Andrew Gallatin wrote:
> I propose that we make struct label portion of the pkthdr compile-time
> conditional on MAC. The assumption is that you will move the MAC label
> to an m_tag sometime after 5.0-RELEASE.
This weekend I spent about six hours looking at what it would take to move
MAC label data into m_tags. While in theory it is a workable idea, it
turns out our m_tag implementation is fairly far from being ready to
handle something like this. I ran into the following immediate problems:
(1) When packet headers are copied using m_copy_pkthdr(), different
consumers have different expectations for what the resulting semantics
are for m_tag data -- some want it duplicated, others want it moved.
In practice, it is only ever moved, so consumers that expect
duplication are in for a surprise. We need to re-implement the packet
header copying code so that it can generate a failure (because it
involves allocation), and separate the duplicate and move abstractions
to get clean semantics. I exchanged some e-mail with Sam Leffler on
the topic, and apparently OpenBSD has already made these changes, or
similar ones, and we should do the same for 5.1.
(2) m_tag's don't have a notion that the data carried in a tag is
multi-dimmensional and may require special destructor behavior. While
it does centralize copying and free'ing of data, it handles this
purely with bcopy, malloc, and free, which is not appropriate for use
with MAC labels, since they may contain a variety of per-policy data
that may require special handling (reference count management, etc).
I tried putting tag-specific release/free/... code in the m_tag
central routines, and it looks like that would work, although it
eventually would lead to a lot of junk in the m_tag code. We might
want to consider m_tag entry free/copy pointers of some sort, but I'm
not sure if we want to go there. Adding the MAC stuff to the
m_tag_{free,copy,...} calls won't break the ABI, whereas adding free
and copy pointers to the tags themselves would.
(3) Not all code generating packets properly initializes m_tag field. The
one example I've come across so far is the use of m_getcl() to grab
mbufs with an attached cluster -- it never initializes the SLIST
properly, as far as I can tell. Right now that's used in device
drivers, and also in the BPF packet generation code. If the header is
zero'd, this may be OK due to an implicit proper initialization, but
this is concerning. We need to do more work to normalize packet
handling.
(4) Code still exists that improperly hand-copies mbuf packet header data.
if_loop.c contains some particular bogus code that also triggers a
panic in the MAC code for the same reason. If m_tag data ever passes
through if_loop and hits the re-alignment case introduced by KAME, the
system will panic when the tag data is free'd. This code all needs to
be normalized, and proper semantics must be enforced.
> This will immediately reduce the size of mbufs for the vast majority of
> users, and will prevent a 4.1.1 like flag-day for 3rd party network
> driver vendors. The only downside is that the few MAC users will not be
> able to use 3rd party binary network drivers until the MAC label is put
> into an m_tag. This seems fair, as the only people inconvienced are the
> people who want the labels and they are motivated to move them to an
> m_tag. But that's easy for me to say, since I don't run MAC, and I may
> be missing something big.
In the past I have looked at adding conditionally-defined components to
struct mbuf and other key kernel data structures. While the condition of
the tree is improving from this perspective due to better isolation of
user and kernel data structures, the result is still incredibly messy,
especially if you key the conditionally defined sections on a kernel
option. mbuf.h is included in a number of userland applications -- some
expected, such as the ipfilter test framework, but others less expected --
such as BIND. I'm very wary of the notion of adding conditionally defined
portions of struct mbuf on this (and other) bases. I'll take a look at
whether many of the obvious foot-shooting scenarios still exist since I
last tried it. Moving to m_tag looks like a reasonable long-term
strategy, but until the m_tag code is substantially more mature, it isn't
realistic. Otherwise, I might have attempted to push through a change to
it now before RC1.
BTW, do you have any recent large-scale measurements of packet size
distribution? In local tests and measurements, the additional 20 bytes on
i386 didn't bump the remaining mbuf data space sufficiently low to
substantially change the behavior of the stack. However, I haven't done
measurements against the 64-bit variation. In practice, a number of
network interfaces now seem to use clustered mbufs and not attempt to use
the in-mbuf storage space... All my packet distribution measurements come
from a typical ISP environment, but may not match what is seen in
large-scale backbone environments.
Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org Network Associates Laboratories
To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?Pine.NEB.3.96L.1021125111737.36232C-100000>
