Date: Sat, 3 Mar 2018 17:16:38 +0000
From: "Robert N. M. Watson" <robert.watson@cl.cam.ac.uk>
To: Alan Somers <asomers@freebsd.org>
Cc: Alexander Richardson <Alexander.Richardson@cl.cam.ac.uk>, "<cl-capsicum-discuss@lists.cam.ac.uk>" <cl-capsicum-discuss@lists.cam.ac.uk>, "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Subject: Re: [capsicum] unlinkfd
Message-ID: <6E83EC9C-56C5-40E7-AED0-2692A15F7FD3@cl.cam.ac.uk>
In-Reply-To: <CAOtMX2h83wddDwcvgae-a02AuyYPyTYfmzqeJemMtKt7%2BL74YQ@mail.gmail.com>
References: <20180302183514.GA99279@x-wing> <CAK4o1Wyk54chHobhUkb2PBUtaWOF2rDv6tkX_bFGY6D331xUqw@mail.gmail.com> <17DE0BFF-42A2-4CD7-B09C-ABA2606C4041@cl.cam.ac.uk> <CAEeofcgLD%2BTjKswPexNDUfeeAxHgUOjsZUdD3g3Jc%2BQuyRu4OQ@mail.gmail.com> <CAOtMX2h83wddDwcvgae-a02AuyYPyTYfmzqeJemMtKt7%2BL74YQ@mail.gmail.com>

New _check() variants of the unlinkat(2) and rmdirat(2) system calls might do the trick -- e.g.,

    int unlinkat_check(dirfd, name, checkfd);
    int rmdirat_check(dirfd, name, checkfd);

The calls would succeed only if 'name' refers to the filesystem object passed via checkfd. This would retain UNIX-style directory behaviour but allow an atomic check that the object is as expected.

Of course, what you do about it if it turns out the check fails is another question... Better not to have a name at all, hence shm_open(SHM_ANON, ...) -- although just for file objects, and not directory hierarchies.

Robert

> On 3 Mar 2018, at 15:29, Alan Somers <asomers@freebsd.org> wrote:
>
> In fact, FreeBSD has that same unlinkat(2) system call. But it doesn't solve Mariusz's problem. He's concerned about race conditions. With either unlink(2) or unlinkat(2), there's no way to ensure that the directory entry you remove is for the file you think it is, because after reading/writing a file and before unlinking it, some other process could have unlinked it and created a new one with the same name. It's this race condition that Mariusz seeks to solve with unlinkfd.
> -Alan
>
> On Sat, Mar 3, 2018 at 5:46 AM, Alexander Richardson <Alexander.Richardson@cl.cam.ac.uk> wrote:
> Linux has an unlinkat() system call (https://linux.die.net/man/2/unlinkat)
> but it doesn't seem to have a flag that lets you unlink the fd itself.
> Possibly pathname == NULL and AT_EMPTY_PATH could mean unlink the fd, but I
> haven't tried whether that works.
> It also has an AT_REMOVEDIR flag to make it function as rmdirat().
>
> On 3 March 2018 at 10:41, Robert N. M. Watson <robert.watson@cl.cam.ac.uk>
> wrote:
>
> > FWIW, this is part of why we introduced anonymous POSIX shared memory objects with Capsicum in FreeBSD -- we allow shm_open(2) to be passed a SHM_ANON special name, which causes the creation of a swap-backed, mappable file-like object that can have I/O, memory mapping, etc, performed on it ... but never has any persistent state across reboots even in the event of a crash.
> >
> > With Capsicum you can then refine a file descriptor to the otherwise writable object to be read-only for the purposes of delegation. There is not, however, a mechanism to "freeze" the state of the object causing other outstanding writable descriptors to become read-only -- certainly something could be added, but some care regarding VM semantics would be required -- in particular, so that faults could not be experienced as a result of a memory store performed before the "freeze" but issued to VFS only later.
> >
> > I certainly have no objection to an unlinkat(2) system call -- it's unfortunate that a full suite of the at(2) APIs wasn't introduced in the first place. It would be worth checking whether anyone else (e.g., Solaris, Mac OS X, Linux) has already added an unlinkat(2) whose API semantics we can match. I think I take the view that for truly anonymous objects, shm_open(2) without a name (or the Linux equiv) is the right thing -- and hence unlinkat(2) is for more conventional use cases where the final pathname element is known.
> >
> > On directories: There, I find myself falling back on a Casper-like service, since GC'ing a single anonymous memory object is straightforward, but GC'ing a directory hierarchy is a more messy business.
> >
> > Robert
> >
> > > On 3 Mar 2018, at 09:53, Justin Cormack <justin@specialbusservice.com> wrote:
> > >
> > > I think it would make sense to have an unlinkfd() that unlinks the file from everywhere, so it does not need a name to be specified. This might be hard to implement.
> > >
> > > For temporary files, I really like Linux memfd_create(2), which opens an anonymous file without a name. These semantics are really useful. (Linux memfd also has additional options for sealing the file to make it immutable, which are very useful for safely passing files between processes.) Having a way to make unnamed temporary files solves a lot of deletion issues, as the file never needs to be unlinked.
> > >
> > >
> > > On 2 March 2018 at 18:35, Mariusz Zaborski <oshogbo@freebsd.org> wrote:
> > >> Hello,
> > >>
> > >> Today I would like to propose a new syscall called unlinkfd(2), which came up during a discussion with Ed Maste.
> > >>
> > >> Currently in UNIX we can’t remove files safely. If we try to do so, we always end up in a race condition. For example, when we open a file and check it with fstat, etc., then we want to unlink(2) it… but the file we are trying to unlink could be a different one than the one we were fstating just a moment ago.
> > >>
> > >> Another reason for implementing unlinkfd(2) came to us when we were trying to sandbox some applications like uudecode/b64decode or bspatch. It occurred to us that we don’t have a good way of removing single files. Of course we can try to determine which directory we are in, and then open this directory and remove a single file.
> > >>
> > >> It looks even more bizarre if we think about a program which operates on multiple files. If we analyze a situation with two totally different directories like `/tmp` and `/home/oshogbo`, we would end up either pre-opening the root directory or keeping open as many directories as we are working in. All of that effort only to remove two files. This makes it totally impractical!
> > >>
> > >> I think that opening directories also presents a wider attack vector, because we are keeping a descriptor to a directory only to remove one file. Unfortunately this means that an attacker can remove all files in that directory.
> > >>
> > >> I proposed this as well on the last Capsicum call. There was a suggestion that instead of adding a single syscall maybe we should have a Casper service that will allow us to remove files. Another idea was that we should perhaps redesign programs to create some subdirs, work in the subdirs, and then remove all files in the subdir. I don’t feel that creating a Casper service is a good idea because we still have exactly the same race condition. In my opinion creating subdirs is also a problem for us.
> > >>
> > >> First, we would need to redesign some of our tools, and I think we should simplify capsicumization of the process instead of making it harder.
> > >>
> > >> Secondly, we can create a temporary subdirectory, but what will remove it? We are going back to having a fd to the directory in which we just created a subdir. Another way would be to have a Casper service which would remove a directory, but with the risk of a race condition.
> > >>
> > >> In conclusion, I think we need a syscall like unlinkfd(2), which turns out to be easy to implement.
> > >> The only downside of this implementation is that we need to provide not only a fd but also a file path. This is because neither inodes nor vnodes contain filenames. We compare the vnodes of the fd and the given path; if they are exactly the same, we remove the file. The syscall uses a fd, so there is no ambient authority, because we are proving that we already have access to that file. Thanks to that, the syscall can be safely used with Capsicum. I have already discussed this with some people and they said `Hey I already had that idea a while ago…` so let’s do something with that idea!
> > >> If you are interested in the patch you can find it here:
> > >> https://reviews.freebsd.org/D14567
> > >>
> > >> Thanks,
> > >> --
> > >> Mariusz Zaborski
> > >> oshogbo//vx | http://oshogbo.vexillium.org
> > >> FreeBSD committer | https://freebsd.org
> > >> Software developer | http://wheelsystems.com
> > >> If it's not broken, let's fix it till it is!!1
> > >
> > >
> >
>
> _______________________________________________
> freebsd-hackers@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"