Date:      Thu, 13 Apr 2023 08:47:14 +0100
From:      David Chisnall <theraven@freebsd.org>
To:        Steffen Nurpmeso <steffen@sdaoden.eu>
Cc:        Ed Maste <emaste@freebsd.org>, freebsd-hackers@freebsd.org
Subject:   Re: capsicum(4): .. and SIGTRAP causing syscall really is in siginfo_t.si_errno?
Message-ID:  <E8774F9D-239E-45EF-AFCE-EDE48489B323@freebsd.org>
In-Reply-To: <20230412203438.IcwD7%steffen@sdaoden.eu>
References:  <20230412203438.IcwD7%steffen@sdaoden.eu>



Hi,

I added the siginfo member that passes the system call number
(si_syscall).  The problem that it solves is the indirect syscall(2)
system call. For normal system calls, you can extract the system call
number from the register frame, since on amd64 it will be in rax.
Unfortunately, for the syscall system call, this value is clobbered and
you have no way of usefully recovering it.

You might want to take a look at the Verona Sandbox code for inspiration
(it works correctly without si_syscall for all system calls except
syscall):

https://github.com/microsoft/verona-sandbox

This was my project that required this functionality, since it needed to
intercept system calls and convert them to RPCs. It provides a simple
mechanism for loading a .so in an unprivileged child process and handling
all system calls that touch a global namespace (open, bind, getaddrinfo)
via RPC into the parent, with some easy-to-use abstractions for
filesystem and network access. It works on Linux with seccomp-bpf and on
FreeBSD with Capsicum. The FreeBSD version was significantly easier to
write for a variety of reasons (Linux doesn’t support strongly aligned
allocation in mmap; Linux can’t kill the child process when the parent
process exits, only when the parent thread exits; seccomp-bpf policies
are amazingly fragile and require an entire library dependency to get
right).
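
The open-via-parent idea can be sketched with a generic technique: the sandboxed child sends a request over a UNIX-domain socket and the parent replies with a descriptor passed as SCM_RIGHTS ancillary data. This is not Verona's actual protocol; struct open_req and the rpc_* functions are hypothetical names, and a real broker would check each path against a policy before opening it.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical wire format for an open() request. */
struct open_req {
    int flags;
    char path[256];
};

/* Child side: send the request; the child itself never calls open(). */
static int rpc_send_open(int sock, const char *path, int flags)
{
    struct open_req req;

    memset(&req, 0, sizeof(req));
    if (strlen(path) >= sizeof(req.path))
        return -1;
    req.flags = flags;
    strcpy(req.path, path);
    return send(sock, &req, sizeof(req), 0) == (ssize_t)sizeof(req) ? 0 : -1;
}

/* Parent side: open with the parent's authority (no policy check in this
 * sketch) and reply with an errno value plus, on success, the descriptor. */
static int rpc_serve_open(int sock)
{
    struct open_req req;
    union { struct cmsghdr hdr; char buf[CMSG_SPACE(sizeof(int))]; } ctl;

    if (recv(sock, &req, sizeof(req), MSG_WAITALL) != (ssize_t)sizeof(req))
        return -1;
    req.path[sizeof(req.path) - 1] = '\0';
    int fd = open(req.path, req.flags, 0600);  /* mode only matters with O_CREAT */
    int err = fd < 0 ? errno : 0;

    struct iovec iov = { .iov_base = &err, .iov_len = sizeof(err) };
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };
    memset(&ctl, 0, sizeof(ctl));
    if (fd >= 0) {
        msg.msg_control = ctl.buf;
        msg.msg_controllen = sizeof(ctl.buf);
        struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
        c->cmsg_level = SOL_SOCKET;
        c->cmsg_type = SCM_RIGHTS;
        c->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(c), &fd, sizeof(int));
    }
    ssize_t n = sendmsg(sock, &msg, 0);
    if (fd >= 0)
        close(fd);  /* the receiver gets its own reference */
    return n == (ssize_t)sizeof(err) ? 0 : -1;
}

/* Child side: returns the received descriptor, or -1 with errno set. */
static int rpc_recv_fd(int sock)
{
    int err = 0, fd;
    union { struct cmsghdr hdr; char buf[CMSG_SPACE(sizeof(int))]; } ctl;
    struct iovec iov = { .iov_base = &err, .iov_len = sizeof(err) };
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = ctl.buf,
                          .msg_controllen = sizeof(ctl.buf) };

    if (recvmsg(sock, &msg, 0) != (ssize_t)sizeof(err))
        return -1;
    if (err != 0) {
        errno = err;
        return -1;
    }
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    if (c == NULL || c->cmsg_level != SOL_SOCKET || c->cmsg_type != SCM_RIGHTS)
        return -1;
    memcpy(&fd, CMSG_DATA(c), sizeof(int));
    return fd;
}
```

In practice the two ends of a socketpair(2) would be split across fork(), with the child calling cap_enter() before it starts sending requests; the received descriptor remains usable in capability mode.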

I have a patch under review that adds SIGCAP as an alternative to
SIGTRAP, which avoids painful interaction with the debugger. I’d love to
get that merged before FreeBSD 14 but haven’t had time to address the
last round of review comments. I’ve been running with it locally for a
year or so.

David


> On 12 Apr 2023, at 21:35, Steffen Nurpmeso <steffen@sdaoden.eu> wrote:
>
> Hello!
>=20
> Ah, oh!!
>=20
> Ed Maste wrote in
> <CAPyFy2Do80xZmNFdtG=xbRuscKaQQM7rQ5ir5TVZENX3UfyKtg@mail.gmail.com>:
> |On Wed, 12 Apr 2023 at 10:49, Steffen Nurpmeso <steffen@sdaoden.eu> wrote:
> |> I am trying to capsicumize a simple daemon (for learning purposes
> |> as that runs only in the second line behind postfix), and i have
> ...
> |Excellent, always happy to see folks exploring Capsicum.
> |
> |Keep in mind that Capsicum and pledge/unveil are not equivalent, so
> |comparing the ease of applying one or the other isn't particularly
> |meaningful. Achieving similar security properties with pledge/unveil
> |as with Capsicum requires similar effort in decomposing and
> |refactoring existing applications.
>
> Luckily not this simple thing.  (With unveil together pledge seems
> pretty good, despite the many system calls i get, and of course
> the path fixation that does not allow users to add new paths when
> they reload configurations .. the way the program is designed;
> i like that new syslog system call which avoids all the things GNU
> C lib for example does and potentially needs, later maybe more.
> I think capsicum is very, very smart and capable, like CloudABI
> was.  Yet very hard to work with as it needs so many new *at(),
> needs to have hooks to modify descriptors after accept(), and
> openat(), etc.  And needs user-path <> realpath(3) mappings .. the
> way i do it.)
>
> As i am very new with this -- am i correct assuming that once
> a capability was set on a directory or listening socket, opened
> / accepted FDs inherit the capability of "their parent"?
>
> |> Anyhow.  Regardless of 13.1-i386 or 12.2-amd64 (despite
> |> no_new_privs) i only see
> |>
> |>   capsicum(4) violation (syscall 93, 4, 5, 0); please report this bug
> |
> |I'm not sure what you mean in the subject with respect to the syscall
> |in siginfo_t.si_errno. It looks like this is ENOTCAPABLE, which means
>
> This is a misunderstanding!!  I *thought* PROC_TRAPCAP_CTL_ENABLE
> saying "the si_errno member of the siginfo signal handler
> parameter is set to the syscall error value" meant the actual
> "syscall number"!  And since git head now has that
> _capsicum._syscall member i thought *that* would now be an
> explicit thing "to detangle that".
> It really is an error number!
> I did not even think about that!
> So .. the actual syscall number is not available in that siginfo_t
> before FreeBSD 14?  I guess you guys simply write one of those
> dtrace snippets to get over that.
> (You know i did not even think, because the Linux seccomp(2) thing
> i did like that, though there it is SIGSYS and the syscall is in
> si_syscall.  The capsicum(4) and rights(4) etc manuals are
> complete, but for someone without any real foreknowledge they miss
> some small hints, here and there.  Not that Linux does that
> better.  Or OpenBSD, where you need at least one unveil with "some
> meat" in order to apply it, even if you simply want no paths at
> all.  .. I think.)
>
> |an attempt to perform an operation on an fd that you are not allowed
> |to do - for example, calling write() on an fd which has had
> |cap_rights_limit() applied without CAP_WRITE. errno 94 is ECAPMODE.
>
> Ah yes!  Not a thought on error values.
>
> |This could be for example trying to use open() in capability mode,
> |which is just not permitted (openat() is).
>
> Yes.  I have had real problems with that, or rather that FDCWD is
> not possible.  (And realpath did cause violations, in at least
> 12.2, .. though yesterday evening the program was in terrible
> state on FreeBSD.)
>
> |>     This takes the usual shortcut of only sandboxing the last input file.
> |>     It's a first cut and this program will be easy to adapt to sandbox
> |>     all files in the future
> |>
> |> from a December 2016 commit message, and i like the word "easy".
> |
> |cap_fileargs() didn't exist in December 2016 and there was not yet a
> |straightforward, performant and desirable way to apply Capsicum to
> |existing applications that operate on a list of files provided on the
> |commandline.
> |
> |For a more recent change that makes use of cap_fileargs a good example
> |commit is:
> |
> |commit 802c2095b5a6dcf0f63c473cbba1e40445e9052a
> |Author: Mark Johnston <markj@FreeBSD.org>
> |Date:   Thu Aug 1 18:57:08 2019 +0000
> |
> |    Capsicumize readelf(1).
> |
> |    Reviewed by:    oshogbo
> |    Sponsored by:   The FreeBSD Foundation
> |    Differential Revision:  https://reviews.freebsd.org/D21108
>
> I had the impression that casper uses a supervising process.  You
> know, then i thought i better do it myself: this allows the Linux
> seccomp(2) program for the clients and the server to be
> streamlined; not only for this small one, where that bystanding
> process only logs; ie, i simply sliced that into the server, and
> the server then forks again so that logger actually can
> synchronize on the server via SIGCLD (etc etc etc), and thus can
> inherit file locks, naturally, etc etc.
>
> --End of <CAPyFy2Do80xZmNFdtG=xbRuscKaQQM7rQ5ir5TVZENX3UfyKtg@mail.gmail.com>
>
> Thank you.
>
> --steffen
> |
> |Der Kragenbaer,                The moon bear,
> |der holt sich munter           he cheerfully and one by one
> |einen nach dem anderen runter  wa.ks himself off
> |(By Robert Gernhardt)
>
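
To make Ed's distinction in the quoted exchange concrete (ENOTCAPABLE for a missing right on a descriptor, ECAPMODE for a call that capability mode forbids outright), here is a small sketch that maps the value found in si_errno or errno to a hint. It assumes FreeBSD's errno values (ENOTCAPABLE is 93, ECAPMODE is 94). It also includes a hypothetical helper, open_limited_dir(), that prepares a directory descriptor for openat() before cap_enter(); the rights chosen (CAP_LOOKUP, CAP_READ, CAP_FSTAT) are an illustrative assumption.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>
#ifdef __FreeBSD__
#include <sys/capsicum.h>
#endif

/* Map an error value taken from errno or si_errno to a short hint.
 * ENOTCAPABLE and ECAPMODE exist only on FreeBSD, hence the guards. */
static const char *cap_error_hint(int err)
{
    switch (err) {
#ifdef ENOTCAPABLE
    case ENOTCAPABLE:
        return "descriptor lacks a right required for this operation";
#endif
#ifdef ECAPMODE
    case ECAPMODE:
        return "not permitted in capability mode (e.g. open(); use openat())";
#endif
    default:
        return NULL;
    }
}

#ifdef __FreeBSD__
/* Hypothetical helper: open a directory before cap_enter() and limit it
 * so that it can only be used for read-only lookups via openat(). */
static int open_limited_dir(const char *path)
{
    cap_rights_t rights;
    int dfd = open(path, O_RDONLY | O_DIRECTORY);

    if (dfd < 0)
        return -1;
    cap_rights_init(&rights, CAP_LOOKUP, CAP_READ, CAP_FSTAT);
    if (cap_rights_limit(dfd, &rights) < 0) {
        close(dfd);
        return -1;
    }
    return dfd;
}
#endif
```

After cap_enter(), openat(dfd, "name", O_RDONLY) on such a descriptor is permitted, whereas open("name", O_RDONLY) fails with ECAPMODE, and write() on a descriptor limited without CAP_WRITE fails with ENOTCAPABLE.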

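
A sketch of the cap_fileargs pattern Ed points to in the quoted exchange, assuming FreeBSD 13 or later, where fileargs_init() takes an operations argument such as FA_OPEN; check cap_fileargs(3) on your release. count_bytes() and count_files() are hypothetical. The point is that only the open goes through the Casper service, while the rest of the program works on plain descriptors in capability mode. Link with -lcasper -lcap_fileargs.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>
#ifdef __FreeBSD__
#include <sys/capsicum.h>
#include <errno.h>
#include <fcntl.h>
#include <libcasper.h>
#include <casper/cap_fileargs.h>
#endif

/* Plain descriptor work: nothing here touches a global namespace, so it
 * keeps working after cap_enter(). Returns -1 on a read error. */
static long count_bytes(int fd)
{
    char buf[4096];
    long total = 0;
    ssize_t n;

    while ((n = read(fd, buf, sizeof(buf))) > 0)
        total += (long)n;
    return n < 0 ? -1 : total;
}

#ifdef __FreeBSD__
/* Hypothetical driver: argv holds only the file operands. */
static int count_files(int argc, char *argv[])
{
    cap_rights_t rights;
    fileargs_t *fa;

    /* The Casper service may only open the names listed in argv,
     * read-only, with the returned descriptors limited to these rights. */
    fa = fileargs_init(argc, argv, O_RDONLY, 0,
        cap_rights_init(&rights, CAP_READ, CAP_FSTAT), FA_OPEN);
    if (fa == NULL)
        return -1;

    /* Tolerate kernels built without Capsicum support. */
    if (cap_enter() < 0 && errno != ENOSYS) {
        fileargs_free(fa);
        return -1;
    }

    for (int i = 0; i < argc; i++) {
        int fd = fileargs_open(fa, argv[i]);  /* plain open() would fail with ECAPMODE */
        if (fd < 0) {
            perror(argv[i]);
            continue;
        }
        printf("%s: %ld bytes\n", argv[i], count_bytes(fd));
        close(fd);
    }
    fileargs_free(fa);
    return 0;
}
#endif
```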


