Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 22 Nov 2002 09:43:45 -0500 (EST)
From:      Robert Watson <rwatson@freebsd.org>
To:        Luigi Rizzo <rizzo@icir.org>
Cc:        current@freebsd.org
Subject:   Re: mbuf header bloat ?
Message-ID:  <Pine.NEB.3.96L.1021122093018.57728C-100000@fledge.watson.org>
In-Reply-To: <20021121111709.A23435@xorpc.icir.org>

next in thread | previous in thread | raw e-mail | index | archive | help

On Thu, 21 Nov 2002, Luigi Rizzo wrote:

> [Bcc to -net because it is relevant there. This email has been triggered
> by a private discussion i was having with other committers (who will
> easily recognise themselves :) which suggested the possibility of adding
> more fields to mbuf headers]
> 
> Just recently came up to my attention that we have the following code in
> <sys/_label.h>
> 
>     #define MAC_MAX_POLICIES        4
>     struct label {
>         int     l_flags;
>         union {
>                 void    *l_ptr;
>                 long     l_long;
>         }       l_perpolicy[MAC_MAX_POLICIES];
>     };
> 
> (what are l_perpolicy[], ints ? Could this be written a bit better ?)
> and then in <sys/mbuf.h>

MAC labels provide general purpose security label storage across a variety
of kernel objects.  Each MAC label is made up of a number of "slots" which
are allocated to statically linked or dynamically loaded policies.  The
union is used to allow policies to use their slot for either the purposes
of an integer store, or as a pointer with the semantics they define.  Some
policies simply use the long to represent their policy information,
perhaps by itself (a partition number), or to key into an existing array. 
Other policies use the pointer to point to shared reference-counted,
static, or per-label data.  Probably the "long" should be changed to some
other integer type that better matches the notion of pointer length.

>     struct pkthdr {
>         struct  ifnet *rcvif;           /* rcv interface */
>         int     len;                    /* total packet length */
>         /* variables for ip and tcp reassembly */
>         void    *header;                /* pointer to packet header */
>         /* variables for hardware checksum */
>         int     csum_flags;             /* flags regarding checksum */
>         int     csum_data;              /* data field used by csum routines */
>         SLIST_HEAD(packet_tags, m_tag) tags; /* list of packet tags */
>         struct  label label;            /* MAC label of data in packet */  
>     };
>  
> The label is 5 ints, the pkthdr a total of 11 ints (and m_hdr takes
> another 6, for a total of 136 bytes of header info on 64-bit
> architectures). 
> 
> Of the pkthdr, only 3 fields (rcvif, len, tags) are of really general
> use, the rest being used only in certain cases and for very specific
> purposes (e.g. reassembly of fragments, or hw capabilities, or MAC). 
> 
> Now that Sam has done the excellent work of integrating packet tags to
> carry annotations around, i really believe that we should try to move
> out of the pkthdr all non-general fields, and move them to m_tags so we
> only pay the cost when needed and not in all cases.  Also this pays a
> lot in terms of ABI compatibility and extensibility.  I understand that
> for 5.0 it is a bit late to act, but i do hope that we can reconsider
> this issue for 5.1 and pull out of the pkthdr at least the MAC label,
> and possibly also the csum_* fields, much in the same way it has been
> done for VLAN labels. 

In the original MAC label design for mbufs, and up until very recently,
m_tag wasn't available, so that design didn't use it.  We traded off
various things, including measured packet lengths, and decided the 20
bytes was an acceptable cost given the available extension services.  I'm
certainly willing to re-consider that notion now that we have general
extensibility, and now that we have a good, SMP-safe slab allocator in
5.0.  However, one thing to keep in mind is that in a MAC environment,
every packet header mbuf does have a valid label, and as a result,
inserting additional memory allocations for every packet handled can have
substantial cost.  I've had a number of requests to make "options MAC" a
default-shipped option: it's not ready yet (as an experimental feature),
but it may well be by 6.0 we are ready for that, assuming the performance
numbers are right.  In that situation, it could well be that it does make
sense to maintain the label data in the mbuf pkthdr, since it really will
be used for all pkthdr's.

There's a hard tradeoff here, and it applies to all of the data in the
packet header.  On the one hand, we want to keep the mbuf packet header
data small: any data there cuts into the space available for fast packet
storage without a cluster.  We also want to keep it protocol-independent,
since we use mbufs for all protocols, as well (in a number of situations) 
as a general purpose memory allocator for the network stack.  On the other
hand, we also want to use the highest performance configuration for the
common case, and the reality is that our current common case is IPv4
networking.  I'm not a big fan of performance hacks, but if we're reaching
a time when >50% of network cards provide support for IP checksum handling
on the card, paying a few bytes cost per mbuf header may be a definite
win.  As you suggest, this is probably a question we need to revisit once
5.0 is out the door, because we really can't make changes like this right
now.

One thing I'd like to consider looking at at some point in the future,
BTW, is whether there are benefits to moving away from the use of mbufs as
general purpose storage in the network stack.  I think mbufs work quite
well for packets which do use the incremental parsing, appending, etc,
operation.  But for things like IP fragment queues, socket options, etc,
mbufs are a much weaker match.  Experimenting with UMA as a vehicle for
those types of allocations might make sense.  Previously, for locking and
spl safety reasons, this often wouldn't have been possible, but perhaps it
is now. 

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Network Associates Laboratories




To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?Pine.NEB.3.96L.1021122093018.57728C-100000>