Date: Thu, 13 Apr 2023 21:26:25 +0200
From: Steffen Nurpmeso <steffen@sdaoden.eu>
To: David Chisnall <theraven@freebsd.org>
Cc: Ed Maste <emaste@freebsd.org>, freebsd-hackers@freebsd.org
Subject: Re: capsicum(4): .. and SIGTRAP causing syscall really is in siginfo_t.si_errno?
Message-ID: <20230413192625.mUQ_T%steffen@sdaoden.eu>
In-Reply-To: <E8774F9D-239E-45EF-AFCE-EDE48489B323@freebsd.org>
References: <20230412203438.IcwD7%steffen@sdaoden.eu> <E8774F9D-239E-45EF-AFCE-EDE48489B323@freebsd.org>

Hello.

David Chisnall wrote in
 <E8774F9D-239E-45EF-AFCE-EDE48489B323@freebsd.org>:
|I added the siginfo member that passes the system call number
|(si_syscall). The problem that it solves is the syscall system call.
|For normal system calls, you can extract the system call number from
|the register frame, since it will be in rax. Unfortunately, for the
|syscall system call, this value is clobbered and you have no way of
|usefully recovering it.

I am too amd64 bound for sure. Well, to be honest, i had written
a test-strace test target which runs the entire machinery (hmm..)
and then generates the list of necessary system calls for the client
and the server. It, however, includes all system calls, including
all the ugly pre-sandbox things. I am in the lucky position of not
running debuggers, as well as being too stupid to handle them anyway
(step, stepi and break, that is all i know). But good that someone
did that. (Having said that, i wrote that test target after i had
already started the seccomp(2) implementation, and the SIGSYS thing
is a regular build target that i use.)

|You might want to take a look at the Verona Sandbox code for
|inspiration (it works correctly without si_syscall for all system
|calls except syscall):
|
|https://github.com/microsoft/verona-sandbox
|
|This was my project that required this functionality, since it needed
|to intercept system calls and convert them to RPCs. It provides a
|simple mechanism for loading a .so in an unprivileged child process
|and handling all system calls that touch a global namespace (open,
|bind, getaddrinfo) via RPC into the parent, with some easy-to-use
|abstractions for filesystem and network access. It works on Linux
|with seccomp-bpf and on FreeBSD with Capsicum.
|The FreeBSD version was significantly easier to write for a variety
|of reasons (Linux doesn’t support strongly aligned allocation in
|mmap, Linux can’t kill the child process when the parent process
|exits, only the parent thread, seccomp-bpf policies are amazingly
|fragile and require an entire library dependency to get right).

This sounds like a very impressive project, especially compared to
my little and primitive thing. BPF for seccomp(2) seems to be very
different from what the new eBPF is capable of; i watched an
LWN-linked presentation on what BPF can do "some years" ago, with
live modification / tracing / inspection of the kernel etc.
(But *i* dreamed of "a syscall bitset in front" (like capsicum seems
to have), and then executable snippets to do the rest, including
checks against real in-use descriptors, as opposed to only
compile-time constants. Or complicated runtime program generation.
And then, running a program for any system call is tough.)

I think capsicum is likely the smartest thing, and it so nicely
reflects the UNIX "everything is a file". But really, the setup for
my simple client/server is tremendous(ly complicated).

I see from looking that the FreeBSD kernel now supports
realpathat(2), yet not for users ([main] as of 03-31). And this
would be really important to have! I mean, i can evaluate the
configuration in a/the "super-capable" base process, and then simply
fork off a new server which inherits the new configuration (after
the old one has been told to die), but that is a real mess.

Also because, you know, i opened a directory FD for / (the way i do
it: do this, use realpath(3) on all paths, and then simply openat(2)
"rootfd,&[1]" to not openat(2) an absolute path..), but this is only
for the sandboxed process.
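
A minimal sketch of the root-descriptor approach described above, in
portable POSIX C. The helper name open_under_root is hypothetical and
not taken from the original program, and reading "rootfd,&[1]" as
"pass the path without its leading slash" is an assumption. The path
is assumed to have been canonicalized already (for example with
realpath(3)) before the sandbox was entered; the capsicum(4) step
itself is left out so that the sketch stays portable.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: open the absolute, already canonical path
 * "abspath" relative to "rootfd", a descriptor that was opened on "/"
 * before entering the sandbox.  Returns a new descriptor, or -1 with
 * errno set.
 */
static int
open_under_root(int rootfd, const char *abspath, int flags)
{
    const char *rel;

    if (abspath == NULL || abspath[0] != '/') {
        errno = EINVAL;
        return -1;
    }

    /* Skip the leading slash(es) so that openat(2) sees a relative path. */
    for (rel = abspath; *rel == '/'; ++rel)
        ;

    /* "/" itself becomes "." relative to the root descriptor. */
    if (*rel == '\0')
        rel = ".";

    return openat(rootfd, rel, flags);
}
```

The root descriptor would be obtained once, with something like
open("/", O_RDONLY | O_DIRECTORY), while absolute paths are still
permitted; afterwards every lookup goes through the helper.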

So if someone were to mount some filesystem over / (i presumed that
is the reason why AT_FDCWD and plain open(2) and openat(2) with
absolute paths are forbidden), then this will affect the
"super-capable" process which reloads the configuration and from
which the new sandboxed server instance is spawned. That does not
make sense. I could open the / descriptor already in that process,
on the other hand; hmm. But still ugly.

So my thinking would be that there *must* be a realpathat(2) so that
the capsicum(4)ized server can simply reload the configuration
itself, while allowing the user full flexibility.
(My current approach is rather identical to what OpenBSD's unveil(2)
thing ends up with, ... yet relative to the opened / file
descriptor, of course. Because .. what else could i do? So users
have to use the _very same_ file names, or the thing fails.
realpath(3) cannot be used. I need to implement some purely
string-only path canonicalization to make this a bit better. The
user may use fewer files, but no new ones at all.)

|I have a patch under review that adds a SIGCAP as an alternative to
|SIGTRAP, which avoids painful interaction with the debugger. I’d
|love to get that merged before 14 but haven’t had time to address
|the last round of review comments. I’ve been running with it
|locally for a year or so.

So good luck getting this going!

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)
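
The "purely string-only path canonicalization" mentioned in the
message could look roughly like the sketch below. The name canon_path
and its interface are hypothetical. Unlike realpath(3) it never
touches the filesystem, so it also works inside the sandbox, but
symbolic links are not resolved and ".." is removed lexically only,
which can differ from what the kernel would resolve.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: lexically canonicalize the absolute path "in"
 * into "out" (of size "outsz").  Collapses repeated slashes, drops "."
 * components and a trailing slash, and lets ".." remove the previous
 * component (never going above "/").
 * Returns 0 on success, -1 with errno set on failure.
 */
static int
canon_path(const char *in, char *out, size_t outsz)
{
    size_t len = 0; /* bytes used in out, not counting the NUL */

    if (in == NULL || out == NULL || in[0] != '/' || outsz < 2) {
        errno = EINVAL;
        return -1;
    }

    while (*in != '\0') {
        const char *end;
        size_t n;

        /* Skip a run of slashes; stop at the end of the string. */
        while (*in == '/')
            ++in;
        if (*in == '\0')
            break;

        /* Find the end of the current component. */
        for (end = in; *end != '\0' && *end != '/'; ++end)
            ;
        n = (size_t)(end - in);

        if (n == 1 && in[0] == '.') {
            /* "." changes nothing. */
        } else if (n == 2 && in[0] == '.' && in[1] == '.') {
            /* ".." drops the last component, if there is one. */
            while (len > 0 && out[--len] != '/')
                ;
        } else {
            /* Need room for '/', the component, and the final NUL. */
            if (len + 1 + n + 1 > outsz) {
                errno = ENAMETOOLONG;
                return -1;
            }
            out[len++] = '/';
            memcpy(out + len, in, n);
            len += n;
        }
        in = end;
    }

    if (len == 0)
        out[len++] = '/';
    out[len] = '\0';
    return 0;
}
```

A result of this function could then be compared against the list of
file names that were allowed when the sandbox was set up, so that
spellings like "/etc//foo/./x.conf" and "/etc/foo/x.conf" are treated
as the same file.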