Date: Thu, 13 Apr 2023 08:47:14 +0100
From: David Chisnall <theraven@freebsd.org>
To: Steffen Nurpmeso <steffen@sdaoden.eu>
Cc: Ed Maste <emaste@freebsd.org>, freebsd-hackers@freebsd.org
Subject: Re: capsicum(4): .. and SIGTRAP causing syscall really is in siginfo_t.si_errno?
Message-ID: <E8774F9D-239E-45EF-AFCE-EDE48489B323@freebsd.org>
In-Reply-To: <20230412203438.IcwD7%steffen@sdaoden.eu>
References: <20230412203438.IcwD7%steffen@sdaoden.eu>
Hi,

I added the siginfo member that passes the system call number (si_syscall). The problem it solves is the syscall(2) system call, the indirect form that takes the real system call number as its first argument. For normal system calls, you can extract the system call number from the register frame, since it will be in rax (on amd64). Unfortunately, for the syscall system call, this value is clobbered and you have no way of usefully recovering it.

You might want to take a look at the Verona Sandbox code for inspiration (it works correctly without si_syscall for all system calls except syscall):

https://github.com/microsoft/verona-sandbox

This was my project that required this functionality, since it needed to intercept system calls and convert them to RPCs. It provides a simple mechanism for loading a .so in an unprivileged child process and handling all system calls that touch a global namespace (open, bind, getaddrinfo) via RPC into the parent, with some easy-to-use abstractions for filesystem and network access. It works on Linux with seccomp-bpf and on FreeBSD with Capsicum. The FreeBSD version was significantly easier to write for a variety of reasons: Linux doesn't support strongly aligned allocation in mmap; Linux can't kill a child process when the parent process exits, only when the parent thread exits; and seccomp-bpf policies are amazingly fragile and require an entire library dependency to get right.

I have a patch under review that adds a SIGCAP as an alternative to SIGTRAP, which avoids painful interaction with the debugger. I'd love to get that merged before 14 but haven't had time to address the last round of review comments. I've been running with it locally for a year or so.

David

> On 12 Apr 2023, at 21:35, Steffen Nurpmeso <steffen@sdaoden.eu> wrote:
>
> Hello!
>
> Ah, oh!!
>
> Ed Maste wrote in
>  <CAPyFy2Do80xZmNFdtG=xbRuscKaQQM7rQ5ir5TVZENX3UfyKtg@mail.gmail.com>:
>  |On Wed, 12 Apr 2023 at 10:49, Steffen Nurpmeso <steffen@sdaoden.eu> wrote:
>  |> I am trying to capsicumize a simple daemon (for learning purposes
>  |> as that runs only in the second line behind postfix), and i have
>  ...
>  |Excellent, always happy to see folks exploring Capsicum.
>  |
>  |Keep in mind that Capsicum and pledge/unveil are not equivalent, so
>  |comparing the ease of applying one or the other isn't particularly
>  |meaningful. Achieving similar security properties with pledge/unveil
>  |as with Capsicum requires similar effort in decomposing and
>  |refactoring existing applications.
>
> Luckily not this simple thing.  (With unveil together pledge seems
> pretty good, despite the many system calls i get, and of course
> the path fixation that does not allow users to add new paths when
> they reload configurations .. the way the program is designed;
> i like that new syslog system call which avoids all the things GNU
> C lib for example does and potentially needs, later maybe more.
> I think capsicum is very, very smart and capable, like CloudABI
> was.  Yet very hard to work with as it needs so many new *at(),
> needs to have hooks to modify descriptors after accept(), and
> openat(), etc.  And needs user-path <> realpath(3) mappings .. the
> way i do it.)
>
> As i am very new with this -- am i correct assuming that once
> a capability was set on a directory or listening socket, opened
> / accepted FDs inherit the capability of "their parent"?
>
>  |> Anyhow.  Regardless of 13.1-i386 or 12.2-amd64 (despite
>  |> no_new_privs) i only see
>  |>
>  |>   capsicum(4) violation (syscall 93, 4, 5, 0); please report this bug
>  |
>  |I'm not sure what you mean in the subject with respect to the syscall
>  |in siginfo_t.si_errno. It looks like this is ENOTCAPABLE, which means
>
> This is a misunderstanding!!  I *thought* PROC_TRAPCAP_CTL_ENABLE
> saying "the si_errno member of the siginfo signal handler
> parameter is set to the syscall error value" meant the actual
> "syscall number"!  And since git head now has that
> _capsicum._syscall member i thought *that* would now be an
> explicit thing "to detangle that".
> It really is an error number!
> I did not even think about that!
> So .. the actual syscall number is not available in that siginfo_t
> before FreeBSD 14?  I guess you guys simply write one of those
> dtrace snippets to get over that.
> (You know i did not even think, because the Linux seccomp(2) thing
> i did like that, though there it is SIGSYS and the syscall is in
> si_syscall.  The capsicum(4) and rights(4) etc manuals are
> complete, but for someone without any real foreknowledge they miss
> some small hints, here and there.  Not that Linux does that
> better.  Or OpenBSD, where you need at least one unveil with "some
> meat" in order to apply it, even if you simply want no paths at
> all.  .. I think.)
>
>  |an attempt to perform an operation on an fd that you are not allowed
>  |to do - for example, calling write() on an fd which has had
>  |cap_rights_limit() applied without CAP_WRITE. errno 94 is ECAPMODE.
>
> Ah yes!  Not a thought on error values.
>
>  |This could be for example trying to use open() in capability mode,
>  |which is just not permitted (openat() is).
>
> Yes.  I have had real problems with that, or rather that FDCWD is
> not possible.  (And realpath did cause violations, in at least
> 12.2, .. though yesterday evening the program was in terrible
> state on FreeBSD.)
>
>  |>     This takes the usual shortcut of only sandboxing the last input file.
>  |>     It's a first cut and this program will be easy to adapt to sandbox
>  |>     all files in the future
>  |>
>  |> from a December 2016 commit message, and i like the word "easy".
>  |
>  |cap_fileargs() didn't exist in December 2016 and there was not yet a
>  |straightforward, performant and desirable way to apply Capsicum to
>  |existing applications that operate on a list of files provided on the
>  |commandline.
>  |
>  |For a more recent change that makes use of cap_fileargs a good example
>  |commit is:
>  |
>  |commit 802c2095b5a6dcf0f63c473cbba1e40445e9052a
>  |Author: Mark Johnston <markj@FreeBSD.org>
>  |Date:   Thu Aug 1 18:57:08 2019 +0000
>  |
>  |    Capsicumize readelf(1).
>  |
>  |    Reviewed by:    oshogbo
>  |    Sponsored by:   The FreeBSD Foundation
>  |    Differential Revision:  https://reviews.freebsd.org/D21108
>
> I had the impression that casper uses a supervising process.  You
> know, then i thought i better do it myself: this allows the Linux
> seccomp(2) program for the clients and the server to be
> streamlined; not only for this small one, where that bystanding
> process only logs; ie, i simply sliced that into the server, and
> the server then forks again so that logger actually can
> synchronize on the server via SIGCLD (etc etc etc), and thus can
> inherit file locks, naturally, etc etc.
>
>  --End of <CAPyFy2Do80xZmNFdtG=xbRuscKaQQM7rQ5ir5TVZENX3UfyKtg@mail.gmail.com>
>
> Thank you.
>
> --steffen
> |
> |Der Kragenbaer,                The moon bear,
> |der holt sich munter           he cheerfully and one by one
> |einen nach dem anderen runter  wa.ks himself off
> |(By Robert Gernhardt)
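
The point this thread converges on is that, with PROC_TRAPCAP_CTL_ENABLE, si_errno carries an error number (93 or 94 in the messages above), not a system call number. A minimal sketch of decoding that value follows. It is only an illustration: the helper name explain_si_errno and the wording of the descriptions are made up for this example, and the two numbers are hard-coded as FreeBSD's values because Python's errno module reflects whatever host it runs on.

```python
# FreeBSD errno values named in the thread (ENOTCAPABLE = 93, ECAPMODE = 94).
# They are hard-coded on purpose: on other operating systems the same
# numbers mean something else, so the host's errno table cannot be used.
CAPSICUM_ERRNOS = {
    93: ("ENOTCAPABLE", "the descriptor lacks the capability rights for the operation"),
    94: ("ECAPMODE", "the system call is not permitted in capability mode"),
}


def explain_si_errno(value: int) -> str:
    """Describe the si_errno of a Capsicum SIGTRAP (hypothetical helper)."""
    if value in CAPSICUM_ERRNOS:
        name, meaning = CAPSICUM_ERRNOS[value]
        return f"{name} ({value}): {meaning}"
    # Anything else is an ordinary errno, not a Capsicum violation.
    return f"errno {value}: not a Capsicum error"


if __name__ == "__main__":
    # The "syscall 93" in the diagnostic above was really an errno.
    print(explain_si_errno(93))
    print(explain_si_errno(94))
```

Read this way, the "syscall 93" in the diagnostic quoted above is ENOTCAPABLE, which matches the explanation given in the quoted reply. The system call number itself has to come from the register frame or, on FreeBSD 14, from si_syscall.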