Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 29 Nov 2011 12:24:03 +1100
From:      Lawrence Stewart <lstewart@freebsd.org>
To:        John Baldwin <jhb@FreeBSD.org>
Cc:        svn-src-head@FreeBSD.org, svn-src-all@FreeBSD.org, src-committers@FreeBSD.org, Julien Ridoux <jrid@cubinlab.ee.unimelb.edu.au>, Ben Kaduk <minimarmot@gmail.com>
Subject:   Re: svn commit: r227778 - head/sys/net
Message-ID:  <4ED43433.7090400@freebsd.org>
In-Reply-To: <201111281420.43310.jhb@freebsd.org>
References:  <201111210417.pAL4HOdi023556@svn.freebsd.org> <8805563E-60F9-4BF4-A292-1A43C4FA368A@cubinlab.ee.unimelb.edu.au> <4ECCA2A1.4020408@freebsd.org> <201111281420.43310.jhb@freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On 11/29/11 06:20, John Baldwin wrote:
> On Wednesday, November 23, 2011 2:37:05 am Lawrence Stewart wrote:
>> On 11/23/11 17:42, Julien Ridoux wrote:
>>>
>>> On 23/11/2011, at 1:00 AM, Lawrence Stewart wrote:
>>>
>>>> On 11/23/11 00:30, John Baldwin wrote:
>>>>> Think of standalone modules that are not built as part of a
>>>>> kernel (e.g. 3rd party device drivers).  In general we should
>>>>> avoid having structures change size for kernel options,
>>>>> especially common structures.  It just adds lots of pain and
>>>>> suffering and complexity.  We are stuck with it for PAE on i386
>>>>> (which causes pain), and for LOCK_PROFILING (but that is
>>>>> sufficiently rare and expensive it seems to be ok).  I think 8
>>>>> bytes for bpf packet is not sufficiently expensive to justify the
>>>>> extra headache.  Just always leave the new field in.
>>>>
>>>> hmm... Julien almost has a patch finished which accomplishes what
>>>> my most recent email in this thread describes. Julien, I suggest we
>>>> get it finished and follow up to this thread with a pointer to the
>>>> patch for people to look at. If there's still a strong feeling that
>>>> what it does is going to bring pain we can do away with the new
>>>> BPF_FFCOUNTER config option and have the bpf header struct just
>>>> grow by 8 bytes.
>>>>
>>>> Stay tuned...
>>>
>>> Thanks all for the feedback. With some delay, I have a patch against
>>> r227871 that implements what Lawrence proposed. You can find it
>>> here:
>>> http://www.cubinlab.ee.unimelb.edu.au/~jrid/patches/ffclock-bpf-header-
> r227871.patch
>>
>> There are a few nits, but the patch implements what I envisaged, thanks
>> Julien.
>>
>>>   I have tested this under a few typical scenario, it works as
>>> expected but already brings some headaches (hence the long delay
>>> mentioned above :-)).
>>>
>>> I thought a bit more of user cases. I believe many of them call for
>>> having both feed-forward counter and its conversion in second be
>>> present in the BPF header. For example, this allows to have absolute
>>> packet departure/arrival times (as per usual), but also provides the
>>> opportunity to compute inter-arrival times accurately using the
>>> difference clock. There are other examples I can think of, and if one
>>> believe the feed-forward clock approach becomes more popular, such
>>> usages will be more and more common.
>>>
>>> Assuming the BPF header grows by 8 bytes independent of any kernel
>>> option, I admit that the current implementation is a bit ugly. The
>>> BPF structure is not nicely packed and looks clunky. Ideally, the
>>> feed-forward counter should be placed just below the bh_tstamp
>>> member, but this would require libpcap and all ports depending on it
>>> to be recompiled after this change.
>>
>> Even though it looks a bit gross, we would still add it at the end to
>> avoid gratuitously breaking binaries. We would then also add some
>> explicit padding in the struct to soak up the redundant space left in
>> between it and the second last struct member.
>
> If this is not something you expect to MFC, then I would just add it
> where it makes the most sense.  One question I have is if this affects
> the file format of what tcpdump -w writes out and tcpdump -r reads in?

My understanding from discussion with Julien and a brief code inspection 
is that libpcap asks for the fields from the BPF header by name, and 
therefore any new field is opaque until libpcap is patched to care about it.

> If that is affected then changing this will need much more thought.  If
> not, then I think it should be fixed "right" in 10.  If you need to MFC
> it then you may need to do some gymnatics to preserve the ABI in stable
> branches, but we prefer to keep HEAD clean so we don't build up layer
> upon layer of compat hacks.

hmm, ok.

>>> What is your favourite option?
>>
>> FreeBSD parlance is to ask what colour you would like to paint the
>> bikeshed ;)
>>
>> As I've never experienced the pain John refers to, I'll defer to the
>> wisdom of others on whether the proposed patch will create pain down the
>> road. I think it's ok, but if consensus is 8bytes per packet isn't going
>> to break the bank, I guess we just go for it - but I guess I am cautious
>> about this route as we can push a lot of packets per second through the
>> stack.
>
> In my limited experience the limit on pushing pps through bpf(4) isn't due
> to the size of the bpf packet header itself but has more to do with disk
> I/O transactions, etc.

I was more concerned about the fact that when pushing lots of packets 
per second, an extra 8 bytes could add up to a noticeable increase in 
memory usage, though it wouldn't really be an issue until we're talking 
about Mpps - I guess at 10GigE rates we can assume memory is likely to 
be abundant and a worst case extra ~100Mbytes of kernel mem usage won't 
hurt?

Cheers,
Lawrence



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?4ED43433.7090400>