Date: Mon, 27 Sep 2010 19:22:29 +0200
From: Andre Oppermann <andre@freebsd.org>
To: Julian Elischer <julian@freebsd.org>
Cc: Jeff Roberson <jeff@freebsd.org>, Luigi Rizzo <rizzo@iet.unipi.it>, FreeBSD Net <net@freebsd.org>
Subject: Re: mbuf changes
Message-ID: <4CA0D2D5.6070801@freebsd.org>
In-Reply-To: <4CA0C1B5.2090309@freebsd.org>
References: <4C9DA26D.7000309@freebsd.org> <4C9DB0C3.5010601@freebsd.org> <20100925163010.GA76213@onelab2.iet.unipi.it> <4CA09451.7010401@freebsd.org> <20100927131836.GA99909@onelab2.iet.unipi.it> <4CA098BA.2010106@freebsd.org> <4CA0C1B5.2090309@freebsd.org>

On 27.09.2010 18:09, Julian Elischer wrote:
> On 9/27/10 6:14 AM, Andre Oppermann wrote:
>> On 27.09.2010 15:18, Luigi Rizzo wrote:
>>> On Mon, Sep 27, 2010 at 02:55:45PM +0200, Andre Oppermann wrote:
>>> ...
>>>>> my idea was to have an extra field in the mbuf to tell how much room
>>>>> should be reserved/used for metadata (such as mtags) after
>>>>> the payload area so you don't need to change the allocator, and
>>>>> possibly can even modify this on an existing mbuf.
>>>>> Almost always mbufs have spare room (e.g. incoming pkts have all
>>>>> data in the cluster and mostly empty mdata; outgoing, except
>>>>> for rare cases, tend to be in a similar situation).
>>>>> So this approach would allow to take an already allocated
>>>>> mbuf and put the mtag in the spare area after the data.
>>>>
>>>> For incoming data this approach could work as usually 2K mbuf clusters
>>>> are used and they have trailing space available, or rather the normal
>>>> mbuf referencing the cluster doesn't have its own data section unused.
>>>>
>>>> When trailing space should be used the M_TRAILINGSPACE() needs modifications
>>>> and a full tree audit is required to make sure that all mbuf consumers are
>>>> correctly using it and not some own version that directly assumes certain
>>>> mbuf sizes, etc. A lot of work.
>>>>
>>>> For locally generated mbufs and socket buffers we try to use the mbufs to
>>>> their maximal extent. When the socket buffer data is packetized it normally
>>>> is referenced then we get the normal mbuf with its data portion unused. So
>>>> that could work.
>>>>
>>>> A complication is the m_tag_free() field and function which puts the memory
>>>> deallocation into the hands of the mtag user. That means all mtag consumers
>>>> have to be made aware of provided storage w/o having to return the
>>>> memory directly to the memory allocator (malloc/UMA).
>>>>
>>>> So the only way I realistically see is to make use of the mbuf's unused
>>>> data portion when it has external storage to it. This should probably
>>>> cover about 98% of all cases. The rest has to malloc() the mtag storage
>>>> as usual.
>>>
>>> so it wouldn't be bad -- i cannot judge the numbers, but definitely
>>> it would work for all incoming traffic, plus all tcp data packets
>>> (as the payload is in the cluster), plus all pure acks (which are small),
>>> plus all UDP above some 200 bytes...
>>
>> Yes, about that.
>>
>>>> I could whip up a prototype for review in the next weeks.
>>>
>>> I seem to remember that jeffr had already something done in Perforce.
>>
>> That's a more general overhaul of the way mbuf's are structured and
>> allocated with UMA. I'm not sure it provides for the mtag issue. Will
>> check though.
>
> I'd like to see if we can go over his stuff and any other suggested changes before 9.0
> and see if we can agree on a change for 9.0
>
> Jeff, we discussed this a year ago.. do you still have your suggested changes?

In other recent communication Jeff indicated that he would revisit the
mbuf/UMA situation around the end of this year.

--
Andre
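
For illustration only, the idea discussed in this thread (carve the mtag
out of the mbuf's unused internal data area when the mbuf has external
storage, malloc() it otherwise, and let a per-tag free function hide the
difference) might be sketched roughly as below. This is a simplified
userland model and not FreeBSD kernel code: struct toy_mbuf, struct
toy_mtag, TOY_MLEN, toy_tag_alloc() and the two free helpers are all
hypothetical names made up for this sketch, and the 224-byte data area is
an assumed size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_MLEN 224    /* assumed size of the mbuf-internal data area */

/* Hypothetical tag header; the free hook plays the role of m_tag_free. */
struct toy_mtag {
    void     (*free_fn)(struct toy_mtag *);
    uint16_t id;
    uint16_t len;       /* payload bytes following this header */
};

/* Hypothetical, heavily reduced mbuf. */
struct toy_mbuf {
    int    has_ext;     /* 1: packet data lives in an external cluster */
    size_t tag_used;    /* bytes of the internal area already given to tags */
    union {             /* union forces pointer alignment of the byte area */
        void          *align;
        unsigned char  bytes[TOY_MLEN];
    } m_dat;
};

/* Tag lives inside the mbuf: nothing to return to the allocator. */
static void toy_tag_free_inline(struct toy_mtag *t) { (void)t; }

/* Tag was malloc()ed: give it back. */
static void toy_tag_free_heap(struct toy_mtag *t) { free(t); }

/* Round n up to a multiple of the pointer size. */
static size_t toy_roundup(size_t n)
{
    const size_t a = sizeof(void *);
    return (n + a - 1) / a * a;
}

struct toy_mtag *toy_tag_alloc(struct toy_mbuf *m, uint16_t id, uint16_t len)
{
    size_t need = toy_roundup(sizeof(struct toy_mtag) + len);
    struct toy_mtag *t;

    assert(m->tag_used <= TOY_MLEN);
    if (m->has_ext && need <= TOY_MLEN - m->tag_used) {
        /* Common case: the internal data area is otherwise unused. */
        t = (struct toy_mtag *)(void *)(m->m_dat.bytes + m->tag_used);
        m->tag_used += need;
        t->free_fn = toy_tag_free_inline;
    } else {
        /* Remaining cases: fall back to the general allocator. */
        t = malloc(need);
        if (t == NULL)
            return NULL;
        t->free_fn = toy_tag_free_heap;
    }
    t->id = id;
    t->len = len;
    return t;
}
```

A consumer would release a tag with t->free_fn(t) and never call free()
directly, which is the kind of awareness on the consumer side that the
m_tag_free() complication above refers to.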