Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 11 Oct 2024 01:23:53 +0300
From:      Vadim Goncharov <vadimnuclight@gmail.com>
To:        "Gavin D. Howard" <gavin@gavinhoward.com>
Cc:        "freebsd-hackers@FreeBSD.org" <freebsd-hackers@FreeBSD.org>
Subject:   Re: BPF64: proposal of platform-independent hardware-friendly backwards-compatible eBPF alternative
Message-ID:  <20241011012353.59784a96@nuclight.lan>
In-Reply-To: <f9KxenmdwQZ8oQrcH6ePy30goR2nOPYCULfkQnFaugR0mFpoL_THbqWwt4Jkt4FKpe03xXULUwRCSc8rUdIsnCSeG2X03iSpfNczBy-vks8=@gavinhoward.com>
References:  <20240910040544.125245ad@nuclight.lan> <wLzD36W8VSXSlBByVmK745ezszNVGM-hfWOobdrCb1vmx9snihwW_gBgeAlvtKWL55fnAZgr9G5ztIO3UjD3Wou3-YPxmLkMp9AFuGHwXsA=@gavinhoward.com> <20240910181711.5d324ac5@nuclight.lan> <f9KxenmdwQZ8oQrcH6ePy30goR2nOPYCULfkQnFaugR0mFpoL_THbqWwt4Jkt4FKpe03xXULUwRCSc8rUdIsnCSeG2X03iSpfNczBy-vks8=@gavinhoward.com>

next in thread | previous in thread | raw e-mail | index | archive | help
On Thu, 12 Sep 2024 06:29:38 +0000
"Gavin D. Howard" <gavin@gavinhoward.com> wrote:

> > > If I understand Turing-completeness correctly, the "no backward
> > > jumps but allow recursion and quit on stack overflow" is exactly
> > > equivalent to having non-infinite loops.  
> >
> > Sure, but then look at practical usefulness: suppose you have stack
> > of 32 frames (current limit of eBPF and BPF64). Then you can have
> > only 31 iterations on a loop, loosing ability to call more
> > functions from loop body.  
> 
> That is true.
> 
> However, I wonder if everyone is going at this the wrong way. More
> specifically, I wonder if we are targeting the entirely wrong level.
> Maybe verifying bytecode or some other low-level code form is just too
> hard. What if it were easier just to provide a higher level language
> that had enough restrictions to be just barely not Turing-complete.

The problem is, then you need compiler of this language in target
environment, e.g. kernel, because you cannot trust that compiler's
output was not modified in a circumventing way.
 
> To test that idea, I did a quick experiment.
> 
> I have been working on a language called Yao for the past three years.
> One feature I planned for the future was the ability to restrict what
> packages and keywords are available at compile time. I had already
> put in the plumbing, but I just hadn't implemented it.
> 
> But I decided to quickly implement it as an experiment.
> 
> First, I needed a stack limit:
> 
> https://git.yzena.com/Yzena/Yc/commit/99c822bf7b
> 
> Now, you can run this:
> 
> ```
> $ ./release/yc yao --max-stack-depth=32 <script>
> ```
> 
> and you get the same max stack depth as eBPF.
> 
> Next, restricting keywords. One unique thing about Yao is that *every*
> keyword is imported like you would import functions, even standard
> ones like `if` and `while`. This means that if a keyword is not
> allowed, I can just skip importing it, and it's like it doesn't even
> exist.
> 
> So I built a way to skip importing the `while` keyword and any other
> keyword that would make the language Turing-complete. Crucially,
> though, it still includes a `foreach` loop, so backwards jumps are
> still possible.

This reminds me of Tcl which has very little syntax, and everything in
"common syntax", like "if" and "while", is in standard library (and
thus defining new syntax constructs is possible to some extent).

However, Tcl, while attractive for task, suffers from problems:
1) "Everything is a string" is incompatible with high performance
2) like Lua, it is too general-pupose - not designed for such security
3) again like Lua and Rust, is maintained outside of *BSD src tree.

> A script with a `while` loop is `tools/rig_slam.yao` in the same repo.
> If you run:
> 
> ```
> $ make # This bootstraps the repo.
> $ ./release/yc yao tools/rig_slam.yao
> ```
> 
> you will get a panic when it tries to parse the `while` loop because I
> haven't implemented the parsing:
> 
> ```
> Panic: Unimplemented
>     Source:
> /home/gavin/Yzena/Code/yc-git/yc/src/yao/keywords.c:2310 Function:
> yao_parse_while() ```
> 
> But if you run this:
> 
> ```
> $ ./release/yc yao --max-stack-depth=32 --lang-mode=iterative \
>   tools/rig_slam.yao
> ```
> 
> you get a compile error, and it doesn't even try to parse the `while`:
> 
> ```
> yc: tools/rig_slam.yao[145:2]
>     Parse error: Incomplete variable declaration for name: while
> 
> yc: tools/rig_slam.yao[145:8]
>     Invalid token: Expected semicolon (';')
> ```
> 
> (You also get a different panic after that because Yao has bugs.)
> 
> Anyway, essentially, the "iterative" language mode with a max stack
> depth makes Yao non-Turing-complete, but with a way to loop.
> 
> You can try that with `tools/format.yao`, which has a `foreach` loop:
> 
> ```
> $ ./release/yc yao --max-stack-depth=32 --lang-mode=iterative \
>   tools/format.yao
> ```
> 
> and it still works.
> 
> I'm not saying Yao as an eBPF replacement is a good idea; it's not.
> But I think this demonstrates that a simple high-level language with a
> simple compiler and a simple VM might be a viable method of
> implementing an eBPF-like thing without a complex verifier.

Probably interesting ideas to borrow, but I've tried to lurk in your
repository for 28 minutes and did not find any tutorial, introduction
or commented example of program longer than helloworld or fizzbuzz. For
BPF64, I've put relatively long example in both C and commented
translation to new language - as with learning new human languages,
first something in language already known to reader.

> Yes, the compiler and VM would need to be in the kernel, but that
> might already be the case with the JIT in Linux.

No, they keep compiler outside of kernel. And JIT is relatively dumb to
translate prepared bytecode.

In fact, having only VM in kernel is consistent with using for
non-kernel environments, like network interface card hardware or
passing to remote server for remote procedure call - yes, these are
also as important targets as BSD kernel. And no hardware engineer in
sane mind will not design a CPU for a human-readable high-level
language, only bytecode VM - maximum that ever was in industry was for
still relatively low level languages, like Forth CPUs (even Lisp
machines did not run Lisp directly).

> Anyway, my point is that before FreeBSD accepts BPF64, it might be a
> good idea to sit down and consider all of the options, brainstorm
> ideas together, and make an excellent design.

So what are other options to consider&

> And it could very well be that BPF64 *is* the best design! I don't
> know; I'm not a kernel guy, nor am I an expert compiler guy.
> 
> > > * But the analysis pass(es) must still live in the kernel.
> > > * There would need to be tutorials or some docs on how to write
> > > whatever language so that backwards jumps are avoided.  
> >
> > So BPF64 took simplicity pass: while you have tutorials etc. it's
> > still very hard to write (non-toy) code that passes verifier. I
> > think a language where you do not need backward jumps but have
> > usable constructs (so you just can't write bad code), even BAW, is
> > a better way to go than try to train people to fight with
> > unnecessary Cerber.  
> 
> Yes, I think you are right on this, which IMO just points to a
> high-level language. Such a language can have whatever constructs are
> safe, but by its very nature, it will be restricted such that
> dangerous code is impossible.
> 
> It would also be easier to write code that passes the "verifier" if
> the verifier is actually the compiler.

This restricts to only one high-level language. Of course, it may look
like

INSERT INTO conntrack FROM current_packet(), time_now() + 60;
DELETE FROM conntrack WHERE expire < time_now();

but I'd better think about .NET-like to allow for different languages
better suited for different areas, not only network.

> judge BPF64. Plus, I can tell you put in a lot of work.

Thanks, you are first one who noticed this.

-- 
WBR, @nuclight



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20241011012353.59784a96>