Date:      Tue, 05 Oct 2010 12:01:42 -0400
From:      Karim Fodil-Lemelin <kfl@xiplink.com>
To:        Luigi Rizzo <rizzo@iet.unipi.it>
Cc:        Juli Mallett <jmallett@freebsd.org>, Ryan Stone <rysto32@gmail.com>, Robert Watson <rwatson@freebsd.org>, Rui Paulo <rpaulo@freebsd.org>, FreeBSD Net <net@freebsd.org>
Subject:   Re: mbuf changes
Message-ID:  <4CAB4BE6.3070307@xiplink.com>
In-Reply-To: <20101003131330.GA85551@onelab2.iet.unipi.it>
References:  <4C9DA26D.7000309@freebsd.org>	<AANLkTim7oRyVYY3frn7=cn4Et8Acbcq9cXja_bR6YWvP@mail.gmail.com>	<4CA51024.8020307@freebsd.org>	<alpine.BSF.2.00.1010021627230.49031@fledge.watson.org>	<9AD4923A-72AE-4FE3-A869-3AF8ECBF17E2@FreeBSD.org>	<AANLkTi=uxARo5O9MASrx9mg-39E2=x05RXxcKUB62JrB@mail.gmail.com>	<0DB8120D-C02A-49A1-8013-1ED818EDE7E6@freebsd.org> <20101003131330.GA85551@onelab2.iet.unipi.it>

  On 03/10/2010 9:13 AM, Luigi Rizzo wrote:
> On Sun, Oct 03, 2010 at 12:29:21AM +0100, Rui Paulo wrote:
>> On 2 Oct 2010, at 21:35, Juli Mallett wrote:
>>
>>> On Sat, Oct 2, 2010 at 12:07, Rui Paulo<rpaulo@freebsd.org>  wrote:
>>>> On 2 Oct 2010, at 16:29, Robert Watson wrote:
>>>>> On Thu, 30 Sep 2010, Julian Elischer wrote:
>>>>>> On 9/30/10 10:49 AM, Ryan Stone wrote:
>>>>>>> It's not a big thing but it would be nice to replace the m_next and m_nextpkt fields with queue.h macros.
>>>>>> funny, I've never even thought of that..
>>>>> I have, and it's a massive change touching code all over the kernel in vast quantities.  While in principle it's a good idea (consistently avoid hand-crafted linked lists), it's something I'd discourage on the basis that it probably won't significantly reduce the kernel bug count, but will make it even harder for vendors with large local changes to the network stack to keep up.
>>>> I think it could also increase the kernel bug count. Unfortunately, we can't do this incrementally.
>>> Can't we?  What about a union, so that we can gradually convert things
>>> but keep ABI and API compatibility?  I mean, as long as we use the
>>> right queue.h type, anyway, it should be consistent?  STAILQ,
>>> presumably.
>> Well, I don't have the layout of the mbuf struct offhand, but it's an idea worth investigating.
> what is the point of refactoring part of a struct that no new code is
> touching ?
>
> I'd like to keep this discussion on the original topics,
> i.e. performance-related issues (make room to embed mtags and other
> metadata such as FIB; have flexible per-socket initial padding so
> we don't always waste 100+ bytes just because ipv6+ipsec is compiled
> in; and so on).
> Please open another thread if you want to propose cosmetics or
> code refactoring or other unrelated changes
>
Hi,

I will share some of the experience I had embedding mtags. Hopefully
it's relevant :)

The idea of carrying a certain amount of mbuf tags within the mbuf
structure is somewhat similar to, but much cleaner than, imo, Linux's skbuff
char cb[40 - 48] (it was 40 bytes in 2.4.x ...). Now this idea is not new,
although as you know the devil is in the details...

What we did for BSD is create a container in the mbuf and extend the API
with functions we (pompously) called m_tag_fast_alloc() and
m_tag_fast_free(). This means the standard m_tag_alloc() is still
supported across the system and the old behavior is unchanged (a list of
allocated structs attached to the packet header). What's different is the
availability of a 'fast' call that directly uses the container within
the mbuf, effectively avoiding the malloc calls and the cache misses that
come with them. I'll explain below how we effectively support calling
m_tag_delete on a 'fast' tag.
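
To make the container idea concrete, here is a minimal userland sketch. It
is not our patch and not the kernel API: struct pkt, struct tag,
FAST_TAG_BYTES, tag_alloc() and tag_fast_alloc() are hypothetical stand-ins
for the mbuf packet header, struct m_tag, the container size, m_tag_alloc()
and m_tag_fast_alloc(). It assumes the fast call simply falls back to the
malloc path when the container is full.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define FAST_TAG_BYTES 64	/* hypothetical container size */

struct tag {
	struct tag *next;	/* next tag on the packet's list */
	uint16_t id;
	uint16_t len;		/* payload bytes following the header */
	int fast;		/* 1 if stored in the embedded container */
};

struct pkt {
	struct tag *tags;	/* head of the tag list */
	size_t fast_used;	/* bytes of fast_buf already handed out */
	_Alignas(max_align_t) unsigned char fast_buf[FAST_TAG_BYTES];
};

/* Standard path: one heap allocation per tag. */
static struct tag *
tag_alloc(uint16_t id, uint16_t len)
{
	struct tag *t = malloc(sizeof(*t) + len);

	if (t == NULL)
		return (NULL);
	t->next = NULL;
	t->id = id;
	t->len = len;
	t->fast = 0;
	return (t);
}

/* Fast path: carve the tag out of the container, else fall back. */
static struct tag *
tag_fast_alloc(struct pkt *p, uint16_t id, uint16_t len)
{
	size_t a = _Alignof(max_align_t);
	size_t need = (sizeof(struct tag) + len + a - 1) & ~(a - 1);
	struct tag *t;

	assert(p != NULL);
	if (need > FAST_TAG_BYTES - p->fast_used)
		return (tag_alloc(id, len));	/* container is full */
	t = (struct tag *)(void *)(p->fast_buf + p->fast_used);
	p->fast_used += need;
	t->next = NULL;
	t->id = id;
	t->len = len;
	t->fast = 1;
	return (t);
}

/* Both kinds of tag live on the same list, so lookups are unchanged. */
static void
tag_prepend(struct pkt *p, struct tag *t)
{
	t->next = p->tags;
	p->tags = t;
}
```

Because fast and malloc'ed tags share one list in this sketch, code that
only walks or searches the list does not need to know which kind it sees.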

The trick to save CPU cycles was also to quickly revert back to the
standard tag mechanism if some component in the system is manipulating
the tag list by deleting elements. Effectively, m_tag_fast_free is a
NOP and fast tags are not deleted once allocated (unless m_free is
called on the mbuf, of course). When m_tag_delete is called, the container
simply becomes invalid for further 'fast tag' additions. This is not
flexible, but it has the merit of reducing the overall number of operations,
given that almost no components delete tags without also deleting the
mbuf (loopback does, but it's a special case).
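
A rough userland sketch of that fallback follows. Again, every name here
(struct fpkt, struct ftag, ftag_add(), ftag_fast_free(), ftag_delete(),
fpkt_free()) is hypothetical, and the unlink-then-invalidate order in
ftag_delete() is my assumption about one reasonable way to do it, not a copy
of our code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define FPKT_FAST_BYTES 64	/* hypothetical container size */

struct ftag {
	struct ftag *next;
	uint16_t id;
	int fast;		/* 1 if stored in the embedded container */
};

struct fpkt {
	struct ftag *tags;
	size_t fast_used;
	int fast_valid;		/* cleared by the first delete */
	_Alignas(max_align_t) unsigned char fast_buf[FPKT_FAST_BYTES];
};

/* Add a tag: container while it is valid and has room, malloc otherwise. */
static struct ftag *
ftag_add(struct fpkt *p, uint16_t id)
{
	size_t a = _Alignof(max_align_t);
	size_t need = (sizeof(struct ftag) + a - 1) & ~(a - 1);
	struct ftag *t;

	assert(p != NULL);
	if (p->fast_valid && need <= FPKT_FAST_BYTES - p->fast_used) {
		t = (struct ftag *)(void *)(p->fast_buf + p->fast_used);
		p->fast_used += need;
		t->fast = 1;
	} else {
		t = malloc(sizeof(*t));
		if (t == NULL)
			return (NULL);
		t->fast = 0;
	}
	t->id = id;
	t->next = p->tags;
	p->tags = t;
	return (t);
}

/* Fast tags are never released individually: deliberately a no-op. */
static void
ftag_fast_free(struct ftag *t)
{
	(void)t;
}

/* Unlink the tag and stop using the container for further additions. */
static void
ftag_delete(struct fpkt *p, struct ftag *t)
{
	struct ftag **pp;

	for (pp = &p->tags; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == t) {
			*pp = t->next;
			break;
		}
	}
	p->fast_valid = 0;	/* back to the standard mechanism */
	if (t->fast)
		ftag_fast_free(t);	/* space is not reclaimed */
	else
		free(t);
}

/* Freeing the packet releases heap tags and resets the container. */
static void
fpkt_free(struct fpkt *p)
{
	struct ftag *t, *n;

	for (t = p->tags; t != NULL; t = n) {
		n = t->next;
		if (!t->fast)
			free(t);
	}
	p->tags = NULL;
	p->fast_used = 0;
	p->fast_valid = 1;
}
```

The point of the design is that there is no bookkeeping for holes in the
container: one flag is checked on allocation, and the space only comes back
when the whole packet is freed.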

One last thing we did is perform various operational tests to come up
with the most statistically optimized container size. Now this is much
easier to do on a proprietary system than for a general-purpose OS, but
it's certainly possible.

Finally, we did see a speed increase for our application, and if someone is
interested I could provide a patch, although I would have to rewrite it
without the proprietary bits in it.

Best regards,

Karim.


