From: Mariusz Zaborski
Date: Sat, 3 Mar 2018 19:43:54 +0100
To: "Robert N. M. Watson"
Cc: Alan Somers, Alexander Richardson, "freebsd-hackers@freebsd.org"
Subject: Re: [capsicum] unlinkfd
Message-ID: <20180303184354.GA48406@x-wing>
In-Reply-To: <6E83EC9C-56C5-40E7-AED0-2692A15F7FD3@cl.cam.ac.uk>

I feel that there are two different things we can think about:
- What we would implement in the capability system if we were building it
  from scratch. Here shm_open(2) and SHM_ANON can be a solution to our
  problems.
- On the other hand, we have a working operating system, and we can't expect
  all of our existing programs to fit those assumptions, nor can we ask
  developers to rewrite many of them.

On Sat, Mar 03, 2018 at 05:16:38PM +0000, Robert N. M. Watson wrote:
> New _check() variants of the unlinkat(2) and rmdirat(2) system calls might
> do the trick -- e.g.,
>
> int unlinkat_check(dirfd, name, checkfd);
> int rmdirat_check(dirfd, name, checkfd);
>

A similar API was proposed in the review. It solves the race condition.
Unfortunately, it does not solve the problem of guessing which directory we
will be working in.
When I think about sandboxing rm(1), for example, we would need to preopen
the root directory or preopen every directory we will work in. Neither
solution feels right.

I'm not saying that unlinkfd is the right and only solution - I'm just
trying to solve a problem we identified while sandboxing applications. I'm
glad we started this discussion, and I hope we can work out a compromise
between all the challenges presented.

Thanks,
-- 
Mariusz Zaborski
oshogbo//vx        | http://oshogbo.vexillium.org
FreeBSD committer  | https://freebsd.org
Software developer | http://wheelsystems.com
If it's not broken, let's fix it till it is!!1

> The calls would succeed only if 'name' refers to the filesystem object
> passed via checkfd. This would retain UNIX-style directory behaviour but
> allow an atomic check that the object is as expected.
>
> Of course, what you do about it if it turns out the check fails is another
> question... Better not to have a name at all, hence shm_open(SHM_ANON, ...)
> -- although just for file objects, and not directory hierarchies.
>
> Robert
>
> > On 3 Mar 2018, at 15:29, Alan Somers wrote:
> >
> > In fact, FreeBSD has that same unlinkat(2) system call. But it doesn't
> > solve Mariusz's problem. He's concerned about race conditions.
> > With either unlink(2) or unlinkat(2), there's no way to ensure that the
> > directory entry you remove is for the file you think it is, because after
> > reading/writing a file and before unlinking it, some other process
> > could've unlinked it and created a new one with the same name. It's this
> > race condition that Mariusz seeks to solve with unlinkfd.
> > -Alan
> >
> > On Sat, Mar 3, 2018 at 5:46 AM, Alexander Richardson wrote:
> > Linux has a unlinkat() system call (https://linux.die.net/man/2/unlinkat)
> > but it doesn't seem to have a flag that lets you unlink the fd itself.
> > Possibly pathname == NULL and AT_EMPTY_PATH could mean unlink the fd but I
> > haven't tried whether that works.
> > It also has a AT_REMOVEDIR flag to make it function as rmdirat().
> >
> > On 3 March 2018 at 10:41, Robert N. M. Watson wrote:
> >
> > > FWIW, this is part of why we introduced anonymous POSIX shared memory
> > > objects with Capsicum in FreeBSD -- we allow shm_open(2) to be passed a
> > > SHM_ANON special name, which causes the creation of a swap-backed,
> > > mappable file-like object that can have I/O, memory mapping, etc,
> > > performed on it ... but never has any persistent state across reboots
> > > even in the event of a crash.
> > >
> > > With Capsicum you can then refine a file descriptor to the otherwise
> > > writable object to be read-only for the purposes of delegation. There is
> > > not, however, a mechanism to "freeze" the state of the object causing
> > > other outstanding writable descriptors to become read-only -- certainly
> > > something could be added, but some care regarding VM semantics would be
> > > required -- in particular, so that faults could not be experienced as a
> > > result of a memory store performed before the "freeze" but issued to VFS
> > > only later.
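
For readers who have not used it, the anonymous object described above can be
created roughly as follows. This is only a sketch, not code from the thread:
anon_file() is a hypothetical helper name, the FreeBSD branch uses
shm_open(2) with SHM_ANON, and the other branch substitutes Linux
memfd_create(2) purely so the example builds on both systems. Capsicum
rights limiting is left out.

```c
#define _GNU_SOURCE		/* for memfd_create(2) on Linux */
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: return a descriptor for a file-like object that
 * never has a name in the filesystem, so there is nothing to unlink.
 * Returns -1 with errno set on failure.
 */
int
anon_file(void)
{
#ifdef SHM_ANON
	/* FreeBSD: swap-backed anonymous POSIX shared memory object. */
	return (shm_open(SHM_ANON, O_RDWR, 0600));
#else
	/* Linux stand-in; the string is only a debugging label. */
	return (memfd_create("anon", MFD_CLOEXEC));
#endif
}
```

Because the object never gets a directory entry, closing the last descriptor
is all the cleanup there is, which is the property the thread is after for
temporary files.
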
> > >
> > > I certainly have no objection to an unlinkat(2) system call -- it's
> > > unfortunate that a full suite of the at(2) APIs wasn't introduced in
> > > the first place. It would be worth checking whether anyone else (e.g.,
> > > Solaris, Mac OS X, Linux) has already added an unlinkat(2) whose API
> > > semantics we can match. I think I take the view that for truly
> > > anonymous objects, shm_open(2) without a name (or the Linux equiv) is
> > > the right thing -- and hence unlinkat(2) is for more conventional use
> > > cases where the final pathname element is known.
> > >
> > > On directories: There, I find myself falling back on a Casper-like
> > > service, since GC'ing a single anonymous memory object is
> > > straightforward, but GC'ing a directory hierarchy is a messier
> > > business.
> > >
> > > Robert
> > >
> > > > On 3 Mar 2018, at 09:53, Justin Cormack wrote:
> > > >
> > > > I think it would make sense to have an unlinkfd() that unlinks the
> > > > file from everywhere, so it does not need a name to be specified.
> > > > This might be hard to implement.
> > > >
> > > > For temporary files, I really like Linux memfd_create(2), which opens
> > > > an anonymous file without a name. These semantics are really useful.
> > > > (Linux memfd also has additional options for sealing the file to make
> > > > it immutable, which are very useful for safely passing files between
> > > > processes.) Having a way to make unnamed temporary files solves a lot
> > > > of deletion issues, as the file never needs to be unlinked.
> > > >
> > > > On 2 March 2018 at 18:35, Mariusz Zaborski wrote:
> > > >> Hello,
> > > >>
> > > >> Today I would like to propose a new syscall called unlinkfd(2), which
> > > >> came up during a discussion with Ed Maste.
> > > >>
> > > >> Currently in UNIX we can't remove files safely. If we try to do so,
> > > >> we always end up in a race condition.
> > > >> For example, when we open a file and check it with fstat, etc., and
> > > >> then want to unlink(2) it... the file we are trying to unlink could
> > > >> be a different one than the one we were fstating just a moment ago.
> > > >>
> > > >> Another reason for implementing unlinkfd(2) came to us when we were
> > > >> trying to sandbox some applications like uudecode/b64decode or
> > > >> bspatch. It occurred to us that we don't have a good way of removing
> > > >> single files. Of course we can try to determine which directory we
> > > >> are in, and then open this directory and remove a single file.
> > > >>
> > > >> It looks even more bizarre if we think about a program which operates
> > > >> on multiple files. If we analyze a situation with two totally
> > > >> different directories like `/tmp` and `/home/oshogbo`, we would end
> > > >> up preopening the root directory or keeping open as many directories
> > > >> as we are working in. All of that effort only to remove two files.
> > > >> This makes it totally impractical!
> > > >>
> > > >> I think that opening directories also presents a wider attack vector,
> > > >> because we are keeping a descriptor to a directory only to remove one
> > > >> file. Unfortunately this means that an attacker can remove all files
> > > >> in that directory.
> > > >>
> > > >> I proposed this as well on the last Capsicum call. There was a
> > > >> suggestion that instead of adding a syscall maybe we should have a
> > > >> Casper service that will allow us to remove files. Another idea was
> > > >> that we should perhaps redesign programs to create some subdirs, work
> > > >> in the subdirs, and then remove all files in this subdir.
> > > >> I don't feel that creating a Casper service is a good idea, because
> > > >> we still have exactly the same race condition. In my opinion
> > > >> creating subdirs is also a problem for us.
> > > >>
> > > >> First, we would need to redesign some of our tools, and I think we
> > > >> should simplify the process of capsicumization instead of making it
> > > >> harder.
> > > >>
> > > >> Secondly, we can create a temporary subdirectory, but what will
> > > >> remove it? We are going back to having a fd to the directory in which
> > > >> we just created a subdir. Another way would be to have a Casper
> > > >> service which would remove a directory, but with the risk of a race
> > > >> condition.
> > > >>
> > > >> In conclusion, I think we need a syscall like unlinkfd(2), which
> > > >> turns out to be easy to implement. The only downside of this
> > > >> implementation is that we need to provide not only a fd but also a
> > > >> file path. This is because neither inodes nor vnodes contain
> > > >> filenames. We compare the vnodes of the fd and the given path; if
> > > >> they are exactly the same, we remove the file. The syscall uses a fd,
> > > >> so there is no Ambient Authority, because we are proving that we
> > > >> already have access to that file. Thanks to that, the syscall can be
> > > >> safely used with Capsicum. I have already discussed this with some
> > > >> people and they said `Hey I already had that idea a while ago...` so
> > > >> let's do something with that idea!
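
To make the comparison described above concrete, here is a minimal userspace
sketch in C. It is not the D14567 patch: unlink_if_same() is a hypothetical
helper name, the ESTALE error code is an arbitrary choice, and comparing
st_dev/st_ino from fstat(2) and fstatat(2) only approximates the in-kernel
vnode comparison. Unlike the proposed syscall, the check and the unlinkat(2)
call are two separate steps here, so a window for the race remains; the
sketch only shows the shape of the check.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of any libc): remove `name` from the
 * directory `dirfd` only if it still refers to the file open as `fd`.
 * Returns 0 on success, -1 with errno set on failure.
 * NOTE: the check and the unlink are not atomic in userspace.
 */
int
unlink_if_same(int dirfd, const char *name, int fd)
{
	struct stat by_fd, by_name;

	assert(name != NULL);

	if (fstat(fd, &by_fd) == -1)
		return (-1);
	/* Do not follow symlinks: we care about the directory entry itself. */
	if (fstatat(dirfd, name, &by_name, AT_SYMLINK_NOFOLLOW) == -1)
		return (-1);

	/* Same device and inode number means the same filesystem object. */
	if (by_fd.st_dev != by_name.st_dev || by_fd.st_ino != by_name.st_ino) {
		errno = ESTALE;	/* arbitrary error choice for this sketch */
		return (-1);
	}
	return (unlinkat(dirfd, name, 0));
}
```

A caller still has to hold a descriptor for the containing directory, which
is exactly the burden raised earlier in the thread; doing the comparison
inside the kernel is what would close the remaining race.
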
> > > >>
> > > >> If you are interested in the patch, you can find it here:
> > > >> https://reviews.freebsd.org/D14567
> > > >>
> > > >> Thanks,
> > > >> --
> > > >> Mariusz Zaborski