From nobody Thu Apr 13 07:47:14 2023
Content-Type: multipart/alternative; boundary=Apple-Mail-F52579F3-1153-45FC-AE27-EC214D603731
Content-Transfer-Encoding: 7bit
From: David Chisnall
List-Id: Technical discussions relating to FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-hackers
Sender: owner-freebsd-hackers@freebsd.org
Mime-Version: 1.0 (1.0)
Subject: Re: capsicum(4): .. and SIGTRAP causing syscall really is in siginfo_t.si_errno?
Date: Thu, 13 Apr 2023 08:47:14 +0100
References: <20230412203438.IcwD7%steffen@sdaoden.eu>
Cc: Ed Maste, freebsd-hackers@freebsd.org
In-Reply-To: <20230412203438.IcwD7%steffen@sdaoden.eu>
To: Steffen Nurpmeso
X-Mailer: iPad Mail (20E252)

--Apple-Mail-F52579F3-1153-45FC-AE27-EC214D603731
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

Hi,

I added the siginfo member that passes the system call number
(si_syscall). The problem that it solves is the syscall system call
(syscall(2)). For normal system calls, you can extract the system call
number from the register frame, since it will be in rax on x86-64.
Unfortunately, for the syscall system call, this value is clobbered and
you have no way of usefully recovering it.

You might want to take a look at the Verona Sandbox code for inspiration
(it works correctly without si_syscall for all system calls except
syscall):

https://github.com/microsoft/verona-sandbox

This was my project that required this functionality, since it needed to
intercept system calls and convert them to RPCs. It provides a simple
mechanism for loading a .so in an unprivileged child process and
handling all system calls that touch a global namespace (open, bind,
getaddrinfo) via RPC into the parent, with some easy-to-use abstractions
for filesystem and network access. It works on Linux with seccomp-bpf
and on FreeBSD with Capsicum. The FreeBSD version was significantly
easier to write for a variety of reasons (Linux doesn’t support strongly
aligned allocation in mmap; Linux can’t kill a child process when the
parent process exits, only when the parent thread does; and seccomp-bpf
policies are amazingly fragile and require an entire library dependency
to get right).

I have a patch under review that adds a SIGCAP as an alternative to
SIGTRAP, which avoids painful interaction with the debugger.
I’d love to get that merged before FreeBSD 14 but haven’t had time to
address the last round of review comments. I’ve been running with it
locally for a year or so.

David

> On 12 Apr 2023, at 21:35, Steffen Nurpmeso <steffen@sdaoden.eu> wrote:
>
> Hello!
>
> Ah, oh!!
>
> Ed Maste wrote in
> <CAPyFy2Do80xZmNFdtG=xbRuscKaQQM7rQ5ir5TVZENX3UfyKtg@mail.gmail.com>:
> |On Wed, 12 Apr 2023 at 10:49, Steffen Nurpmeso <steffen@sdaoden.eu> wrote:
> |> I am trying to capsicumize a simple daemon (for learning purposes
> |> as that runs only in the second line behind postfix), and i have
> ...
> |Excellent, always happy to see folks exploring Capsicum.
> |
> |Keep in mind that Capsicum and pledge/unveil are not equivalent, so
> |comparing the ease of applying one or the other isn't particularly
> |meaningful. Achieving similar security properties with pledge/unveil
> |as with Capsicum requires similar effort in decomposing and
> |refactoring existing applications.
>
> Luckily not this simple thing. (With unveil together pledge seems
> pretty good, despite the many system calls i get, and of course
> the path fixation that does not allow users to add new paths when
> they reload configurations .. the way the program is designed;
> i like that new syslog system call which avoids all the things GNU
> C lib for example does and potentially needs, later maybe more.
> I think capsicum is very, very smart and capable, like CloudABI
> was. Yet very hard to work with as it needs so many new *at(),
> needs to have hooks to modify descriptors after accept(), and
> openat(), etc. And needs user-path <> realpath(3) mappings .. the
> way i do it.)
>
> As i am very new with this -- am i correct assuming that once
> a capability was set on a directory or listening socket, opened
> / accepted FDs inherit the capability of "their parent"?
>
> |> Anyhow.
> |> Regardless of 13.1-i386 or 12.2-amd64 (despite
> |> no_new_privs) i only see
> |>
> |>   capsicum(4) violation (syscall 93, 4, 5, 0); please report this bug
> |
> |I'm not sure what you mean in the subject with respect to the syscall
> |in siginfo_t.si_errno. It looks like this is ENOTCAPABLE, which means
>
> This is a misunderstanding!! I *thought* PROC_TRAPCAP_CTL_ENABLE
> saying "the si_errno member of the siginfo signal handler
> parameter is set to the syscall error value" meant the actual
> "syscall number"! And since git head now has that
> _capsicum._syscall member i thought *that* would now be an
> explicit thing "to detangle that".
> It really is an error number!
> I did not even think about that!
> So .. the actual syscall number is not available in that siginfo_t
> before FreeBSD 14? I guess you guys simply write one of those
> dtrace snippets to get over that.
> (You know i did not even think, because the Linux seccomp(2) thing
> i did like that, though there it is SIGSYS and the syscall is in
> si_syscall. The capsicum(4) and rights(4) etc manuals are
> complete, but for someone without any real foreknowledge they miss
> some small hints, here and there. Not that Linux does that
> better. Or OpenBSD, where you need at least one unveil with "some
> meat" in order to apply it, even if you simply want no paths at
> all. .. I think.)
>
> |an attempt to perform an operation on an fd that you are not allowed
> |to do - for example, calling write() on an fd which has had
> |cap_rights_limit() applied without CAP_WRITE. errno 94 is ECAPMODE.
>
> Ah yes! Not a thought on error values.
>
> |This could be for example trying to use open() in capability mode,
> |which is just not permitted (openat() is).
>
> Yes. I have had real problems with that, or rather that FDCWD is
> not possible. (And realpath did cause violations, in at least
> 12.2, .. though yesterday evening the program was in terrible
> state on FreeBSD.)
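
The open() versus openat() point quoted above can be sketched roughly as
follows. This is only an illustration, not code from this thread or from
Verona Sandbox. It is written in Python for brevity, where os.open() with
dir_fd corresponds to openat(2). The helper names (open_dir,
read_relative, demo) and the file name daemon.conf are made up for the
example, and it does not actually enter capability mode, since Python's
standard library has no cap_enter() wrapper.

```python
import os
import tempfile


def open_dir(path):
    """Open a directory once, before sandboxing (hypothetical helper).

    All later lookups are made relative to the returned descriptor.
    """
    return os.open(path, os.O_RDONLY | os.O_DIRECTORY)


def read_relative(dir_fd, name):
    """Read a file by a name relative to dir_fd (hypothetical helper).

    Passing dir_fd makes this an openat(2)-style lookup instead of a
    lookup in the global namespace.
    """
    fd = os.open(name, os.O_RDONLY, dir_fd=dir_fd)
    with os.fdopen(fd, "rb") as f:  # closes fd when the block exits
        return f.read()


def demo():
    """Create a scratch directory and read a file back through its fd."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "daemon.conf"), "wb") as f:
            f.write(b"listen=25\n")
        dir_fd = open_dir(tmp)
        try:
            # A Capsicum program would call cap_enter() at this point.
            # From then on open() on a path is refused (ECAPMODE), so
            # every lookup has to go through a descriptor like dir_fd.
            return read_relative(dir_fd, "daemon.conf")
        finally:
            os.close(dir_fd)
```

Calling demo() returns the bytes written to the scratch file. The C
equivalent keeps the directory descriptor around and passes it as the
first argument to openat().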
>
> |>     This takes the usual shortcut of only sandboxing the last input file.
> |>     It's a first cut and this program will be easy to adapt to sandbox all
> |>     files in the future
> |>
> |> from a December 2016 commit message, and i like the word "easy".
> |
> |cap_fileargs() didn't exist in December 2016 and there was not yet a
> |straightforward, performant and desirable way to apply Capsicum to
> |existing applications that operate on a list of files provided on the
> |commandline.
> |
> |For a more recent change that makes use of cap_fileargs a good example
> |commit is:
> |
> |commit 802c2095b5a6dcf0f63c473cbba1e40445e9052a
> |Author: Mark Johnston <markj@FreeBSD.org>
> |Date:   Thu Aug 1 18:57:08 2019 +0000
> |
> |    Capsicumize readelf(1).
> |
> |    Reviewed by:    oshogbo
> |    Sponsored by:   The FreeBSD Foundation
> |    Differential Revision:  https://reviews.freebsd.org/D21108
>
> I had the impression that casper uses a supervising process. You
> know, then i thought i better do it myself: this allows the Linux
> seccomp(2) program for the clients and the server to be
> streamlined; not only for this small one, where that bystanding
> process only logs; ie, i simply sliced that into the server, and
> the server then forks again so that logger actually can
> synchronize on the server via SIGCLD (etc etc etc), and thus can
> inherit file locks, naturally, etc etc.
>
> --End of <CAPyFy2Do80xZmNFdtG=xbRuscKaQQM7rQ5ir5TVZENX3UfyKtg@mail.gmail.com>
>
> Thank you.
>
> --steffen
> |
> |Der Kragenbaer,                The moon bear,
> |der holt sich munter           he cheerfully and one by one
> |einen nach dem anderen runter  wa.ks himself off
> |(By Robert Gernhardt)
>

--Apple-Mail-F52579F3-1153-45FC-AE27-EC214D603731
Content-Type: text/html; charset=utf-8
Content-Transfer-Encoding: quoted-printable
--Apple-Mail-F52579F3-1153-45FC-AE27-EC214D603731--