Date:      Sat, 8 Dec 2012 14:02:45 +0000
From:      David Chisnall <theraven@theravensnest.org>
To:        Aleksandr Rybalko <ray@FreeBSD.org>
Cc:        svn-src-projects@FreeBSD.org, Roman Divacky <rdivacky@FreeBSD.org>, src-committers@FreeBSD.org, Jung-uk Kim <jkim@FreeBSD.org>
Subject:   Re: svn commit: r243914 - projects/bpfjit
Message-ID:  <2434306D-5AC7-4624-B9E8-7C682350B78F@theravensnest.org>
In-Reply-To: <20121208152447.5b2958d2.ray@freebsd.org>
References:  <201212052312.qB5NC2Hn056351@svn.freebsd.org> <20121206084936.GA58940@freebsd.org> <50C0DFB0.6030007@FreeBSD.org> <20121208152447.5b2958d2.ray@freebsd.org>

On 8 Dec 2012, at 13:24, Aleksandr Rybalko wrote:

> On Thu, 06 Dec 2012 13:10:56 -0500
> Jung-uk Kim <jkim@FreeBSD.org> wrote:
>
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> On 2012-12-06 03:49:36 -0500, Roman Divacky wrote:
>>> Hi,
>>>
>>> David Chisnall started bpf jitter based on llvm. You can check it
>>> out here:
>>>
>>> http://people.freebsd.org/~theraven/bpfjit/
>>>
>>> It's based on the idea of jitting the code in userspace and
>>> passing the resulting code to the kernel via some interface (this
>>> part is not done yet).
>>
>> Long time ago (about 10 years ago), I implemented something like that
>> (i.e., compile BPF program to native machine code in userspace, then
>> upload to kernel space) for my $job but I quickly replace it with
>> BPF_JITTER for several reasons.  First of all, there is a big security
>> risk.  A BPF filter program can be easily validated by kernel with
>> bpf_validate(9).  We cannot do that for native machine code and we
>> must not allow uploading arbitrary code to kernel space.  You may say
>> it is well protected by /dev/bpf permissions but it is not good
>> enough, i.e., all you need is read permission to inject code to kernel
>> space.
>> Second, LLVM is too heavy for BPF filter machine.  For example,
>
> +1
> Embedded FreeBSD will lost BPF if LLVM will be used for compilation :)

Really?  I've run LLVM JITs for more complex languages than BPF on
machines with only 128MB of RAM.  LLVM itself takes about 5MB of storage
space and 20MB of RAM (used only during compilation, unloaded
immediately afterwards).  On REALLY embedded systems, the filter rules
can be compiled on another host and provided in the form of a kernel
module using exactly the same code.

>> libtrace did that long ago:
>>
>> http://www.wand.net.nz/trac/libtrace/changeset/1586
>>
>> Someone actually benchmarked it with other JIT implementations:
>>
>> http://carnivore.it/2011/12/28/bpf_performance

Reading the description there, I found it hard to believe that someone
had actually written that LLVM implementation.  It is a case study in
how not to implement an LLVM JIT.

>> LLVM compilation took too much time to be useful:
>>
>> engine          filter cycles   compile cycles
>> ---------------+---------------+----------------
>> jit-linux       106468          33126+72796
>> jit-freebsd     113958          48292+72796
>> llvm            157394          380843640+72796
>> pcap            276910          72796
>> linux           351391          9245+72796
>>
>> I haven't tried theraven's implementation but I am afraid the result
>> may be similar.  On top of that, it cannot be easily embedded in
>> kernel.

Note that mine is a proof-of-concept prototype.  However, in my ad-hoc
testing its output was about a third the size of the output of the
current JIT.  A simpler JIT loses a lot through not being able to do
even simple optimisations such as common subexpression elimination, and
through a very primitive register allocator.
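
To make "common subexpression elimination" concrete, here is a minimal
sketch of the idea.  It assumes a made-up three-address form
(dest, op, arg1, arg2) in which every destination is assigned once and
packet data is read-only; this is not LLVM's IR, not the BPF instruction
set, and not how either JIT is implemented.

```python
# Hypothetical IR for illustration only: (dest, op, arg1, arg2).
# Assumptions: straight-line code, each dest assigned exactly once,
# and "load" reads memory that nothing in the program writes to.

def eliminate_common_subexpressions(instrs):
    """Drop instructions that recompute a value already computed."""
    seen = {}    # (op, arg1, arg2) -> dest that already holds the value
    alias = {}   # removed dest -> surviving dest to use instead
    out = []
    for dest, op, a, b in instrs:
        # Rewrite operands that refer to an eliminated temporary.
        a = alias.get(a, a)
        b = alias.get(b, b)
        key = (op, a, b)
        if key in seen:
            alias[dest] = seen[key]        # reuse the earlier result
        else:
            seen[key] = dest
            out.append((dest, op, a, b))
    return out, alias


# A filter-like fragment that loads and masks the same packet byte twice.
prog = [
    ("t1", "load", "pkt", 12),
    ("t2", "and", "t1", 0xFF),
    ("t3", "load", "pkt", 12),
    ("t4", "and", "t3", 0xFF),
    ("t5", "add", "t2", "t4"),
]

optimised, alias = eliminate_common_subexpressions(prog)
```

In this fragment the second load and the second mask disappear, leaving
three instructions instead of five, with t5 computed from t2 twice.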

The extra cost comes in the form of more CPU cycles spent actually
running the optimisation.  JIT compilation is always a trade: is the
result being run enough times to offset the time spent optimising?  I'd
have thought the answer would be obvious for something that is run on
every packet.  Even a very slow optimiser will be a net win after a
while.  More importantly, the optimisation happens at the time the rules
are loaded and so can run at a much lower priority, whereas the packet
filter evaluation happens on the critical path for network traffic and
impacts the latency of every single packet.
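
The break-even argument can be put as a small calculation.  This is only
a sketch: the function name is invented here, and it assumes each
"filter cycles" figure in the quoted table is the cost of one measured
run of the filter, which the table itself does not state.

```python
def break_even_runs(extra_compile_cycles, baseline_run_cycles,
                    optimised_run_cycles):
    """Runs needed before the extra compile cost is repaid.

    Returns None when the optimised code is not faster per run, since
    the extra compile time is then never recovered.
    """
    saved_per_run = baseline_run_cycles - optimised_run_cycles
    if saved_per_run <= 0:
        return None
    # Integer ceiling division, avoiding floating point.
    return -(-extra_compile_cycles // saved_per_run)


# Figures from the quoted table: the "llvm" row against the "pcap" row.
# The 72796 cycles common to every row cancel out of the comparison.
runs = break_even_runs(
    extra_compile_cycles=380843640,
    baseline_run_cycles=276910,
    optimised_run_cycles=157394,
)
print(runs)  # 3187
```

Under that assumption, even the slow LLVM implementation from the quoted
benchmark pays for itself against the interpreter after a few thousand
runs.  The function returns None whenever the optimised code is not
actually faster per run than the baseline.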

David