From: Alexander Richardson
Date: Sat, 3 Mar 2018 12:46:10 +0000
Subject: Re: [capsicum] unlinkfd
Cc: freebsd-hackers@freebsd.org
List-Id: Technical Discussions relating to FreeBSD

Linux has an unlinkat() system call (https://linux.die.net/man/2/unlinkat)
but it doesn't seem to have a flag that lets you unlink the fd itself.
Possibly pathname == NULL and AT_EMPTY_PATH could mean "unlink the fd", but
I haven't tried whether that works. It also has an AT_REMOVEDIR flag to make
it function as rmdirat().

On 3 March 2018 at 10:41, Robert N. M. Watson wrote:

> FWIW, this is part of why we introduced anonymous POSIX shared memory
> objects with Capsicum in FreeBSD -- we allow shm_open(2) to be passed a
> SHM_ANON special name, which causes the creation of a swap-backed, mappable
> file-like object that can have I/O, memory mapping, etc., performed on it ...
> but never has any persistent state across reboots even in the event of a
> crash.
>
> With Capsicum you can then refine a file descriptor to the otherwise
> writable object to be read-only for the purposes of delegation. There is
> not, however, a mechanism to "freeze" the state of the object causing other
> outstanding writable descriptors to become read-only -- certainly something
> could be added, but some care regarding VM semantics would be required --
> in particular, so that faults could not be experienced as a result of a
> memory store performed before the "freeze" but issued to VFS only later.
>
> I certainly have no objection to an unlinkat(2) system call -- it's
> unfortunate that a full suite of the at(2) APIs wasn't introduced in the
> first place.
> It would be worth checking whether anyone else (e.g., Solaris,
> Mac OS X, Linux) has already added an unlinkat(2) that we can match API
> semantics for. I think I take the view that for truly anonymous objects,
> shm_open(2) without a name (or the Linux equiv) is the right thing -- and
> hence unlinkat(2) is for more conventional use cases where the final
> pathname element is known.
>
> On directories: There, I find myself falling back on a Casper-like
> service, since GC'ing a single anonymous memory object is straightforward,
> but GC'ing a directory hierarchy is a more messy business.
>
> Robert
>
> > On 3 Mar 2018, at 09:53, Justin Cormack wrote:
> >
> > I think it would make sense to have an unlinkfd() that unlinks the file
> > from everywhere, so it does not need a name to be specified. This might
> > be hard to implement.
> >
> > For temporary files, I really like Linux memfd_create(2) that opens an
> > anonymous file without a name. These semantics are really useful. (Linux
> > memfd also has additional options for sealing the file to make it
> > immutable which are very useful for safely passing files between
> > processes.) Having a way to make unnamed temporary files solves a lot of
> > deletion issues as the file never needs to be unlinked.
> >
> > On 2 March 2018 at 18:35, Mariusz Zaborski wrote:
> >> Hello,
> >>
> >> Today I would like to propose a new syscall called unlinkfd(2) which
> >> came up during a discussion with Ed Maste.
> >>
> >> Currently in UNIX we can’t remove files safely. If we try to do so we
> >> always end up in a race condition. For example when we open a file, and
> >> check it with fstat, etc. then we want to unlink(2) it… but the file we
> >> are trying to unlink could be a different one than the one we were
> >> fstating just a moment ago.
> >>
> >> Another reason for implementing unlinkfd(2) came to us when we were
> >> trying to sandbox some applications like uudecode/b64decode or bspatch.
> >> It occurred to us that we don’t have a good way of removing single
> >> files. Of course we can try to determine which directory we are in, and
> >> then open this directory and remove a single file.
> >>
> >> It looks even more bizarre if we think about a program which operates
> >> on multiple files. If we analyze a situation with two totally different
> >> directories like `/tmp` and `/home/oshogbo` we would end up pre-opening
> >> a root directory or keeping open as many directories as we are working
> >> on. All of that effort only to remove two files. This makes it totally
> >> impractical!
> >>
> >> I think that opening directories also presents a wider attack vector
> >> because we are keeping a descriptor to a directory only to remove one
> >> file. Unfortunately this means that an attacker can remove all files in
> >> that directory.
> >>
> >> I proposed this as well on the last Capsicum call. There was a
> >> suggestion that instead of adding a syscall maybe we should have a
> >> Casper service that will allow us to remove files. Another idea was that
> >> we should perhaps redesign programs to create some subdirs, work in the
> >> subdirs, and then remove all files in the subdir. I don’t feel that
> >> creating a Casper service is a good idea because we still have exactly
> >> the same issue of a race condition. In my opinion creating subdirs is
> >> also a problem for us.
> >>
> >> First, we would need to redesign some of our tools, and I think we
> >> should simplify the Capsicumization of a process instead of making it
> >> harder.
> >>
> >> Secondly, we can create a temporary subdirectory but what will remove
> >> it? We are going back to having a fd to the directory in which we just
> >> created a subdir.
> >> Another way would be to have a Casper service which would remove a
> >> directory but with the risk of a race condition.
> >>
> >> In conclusion, I think we need a syscall like unlinkfd(2), which turns
> >> out to be easy to implement. The only downside of this implementation is
> >> that we not only need to provide a fd but also a file path. This is
> >> because neither inodes nor vnodes contain filenames. We compare the
> >> vnodes of the fd and the given path; if they are exactly the same we
> >> remove the file. In the syscall we are using a fd so there is no ambient
> >> authority because we are proving that we already have access to that
> >> file. Thanks to that the syscall can be safely used with Capsicum. I
> >> have already discussed this with some people and they said
> >> `Hey I already had that idea a while ago…` so let’s do something with
> >> that idea! If you are interested in the patch you can find it here:
> >> https://reviews.freebsd.org/D14567
> >>
> >> Thanks,
> >> --
> >> Mariusz Zaborski
> >> oshogbo//vx | http://oshogbo.vexillium.org
> >> FreeBSD committer | https://freebsd.org
> >> Software developer | http://wheelsystems.com
> >> If it's not broken, let's fix it till it is!!1