Date:      Sat, 3 Mar 2018 10:41:07 +0000
From:      "Robert N. M. Watson" <robert.watson@cl.cam.ac.uk>
To:        Justin Cormack <justin@specialbusservice.com>
Cc:        Mariusz Zaborski <oshogbo@freebsd.org>, "<cl-capsicum-discuss@lists.cam.ac.uk>" <cl-capsicum-discuss@lists.cam.ac.uk>,  freebsd-hackers@freebsd.org
Subject:   Re: [capsicum] unlinkfd
Message-ID:  <17DE0BFF-42A2-4CD7-B09C-ABA2606C4041@cl.cam.ac.uk>
In-Reply-To: <CAK4o1Wyk54chHobhUkb2PBUtaWOF2rDv6tkX_bFGY6D331xUqw@mail.gmail.com>
References:  <20180302183514.GA99279@x-wing> <CAK4o1Wyk54chHobhUkb2PBUtaWOF2rDv6tkX_bFGY6D331xUqw@mail.gmail.com>

FWIW, this is part of why we introduced anonymous POSIX shared memory
objects with Capsicum in FreeBSD -- we allow shm_open(2) to be passed a
SHM_ANON special name, which causes the creation of a swap-backed,
mappable file-like object that can have I/O, memory mapping, etc.,
performed on it, but that never has any persistent state across reboots,
even in the event of a crash.
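
As a rough illustration of the above, here is a minimal sketch of
creating such a nameless object and using it through both mmap(2) and
read(2). The helper names anon_object() and anon_roundtrip() are
hypothetical. The non-FreeBSD branches (memfd_create(2), or a temporary
file that is unlinked straight away) are assumptions added so the sketch
builds elsewhere; they only approximate SHM_ANON.

```c
#define _GNU_SOURCE		/* exposes memfd_create(2) on Linux */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: return a descriptor to a nameless object, or -1. */
static int
anon_object(void)
{
#if defined(SHM_ANON)
	/* FreeBSD: the special SHM_ANON name yields an unnamed object. */
	return (shm_open(SHM_ANON, O_RDWR, 0600));
#elif defined(MFD_CLOEXEC)
	/* Assumption: Linux stand-in; "scratch" is only a debugging label. */
	return (memfd_create("scratch", MFD_CLOEXEC));
#else
	/* Assumption: portable approximation; the file briefly has a name. */
	char path[] = "/tmp/anonXXXXXX";
	int fd = mkstemp(path);

	if (fd >= 0)
		(void)unlink(path);
	return (fd);
#endif
}

/*
 * Hypothetical helper: size the object, store msg through a shared
 * mapping, then read it back with read(2). Returns 0 if the bytes match.
 */
static int
anon_roundtrip(const char *msg)
{
	char buf[128];
	size_t len = strlen(msg) + 1;
	void *p;
	int fd, ok = -1;

	if (len > sizeof(buf) || (fd = anon_object()) < 0)
		return (-1);
	if (ftruncate(fd, (off_t)len) == 0) {
		p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
		if (p != MAP_FAILED) {
			memcpy(p, msg, len);
			/* The descriptor offset is still 0 at this point. */
			if (read(fd, buf, len) == (ssize_t)len &&
			    memcmp(buf, msg, len) == 0)
				ok = 0;
			(void)munmap(p, len);
		}
	}
	(void)close(fd);
	return (ok);
}
```

Nothing needs to be unlinked afterwards: once the last descriptor and
mapping go away, the object is gone.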

With Capsicum you can then refine a file descriptor to the otherwise
writable object to be read-only for the purposes of delegation. There is
not, however, a mechanism to "freeze" the state of the object, causing
other outstanding writable descriptors to become read-only -- certainly
something could be added, but some care regarding VM semantics would be
required -- in particular, so that faults could not be experienced as a
result of a memory store performed before the "freeze" but issued to
VFS only later.

I certainly have no objection to an unlinkat(2) system call -- it's
unfortunate that a full suite of the at(2) APIs wasn't introduced in the
first place. It would be worth checking whether anyone else (e.g., Solaris,
Mac OS X, Linux) has already added an unlinkat(2) whose API semantics we
can match. I think I take the view that for truly anonymous
objects, shm_open(2) without a name (or the Linux equivalent) is the right
thing -- and hence unlinkat(2) is for more conventional use cases where
the final pathname element is known.

On directories: There, I find myself falling back on a Casper-like
service, since GC'ing a single anonymous memory object is
straightforward, but GC'ing a directory hierarchy is a messier
business.

Robert

> On 3 Mar 2018, at 09:53, Justin Cormack <justin@specialbusservice.com> wrote:
>
> I think it would make sense to have an unlinkfd() that unlinks the file from
> everywhere, so it does not need a name to be specified. This might be
> hard to implement.
>
> For temporary files, I really like Linux memfd_create(2) that opens an anonymous
> file without a name. These semantics are really useful. (Linux memfd also has
> additional options for sealing the file to make it immutable, which are very
> useful for safely passing files between processes.) Having a way to make
> unnamed temporary files solves a lot of deletion issues as the file
> never needs to be unlinked.
>
>
> On 2 March 2018 at 18:35, Mariusz Zaborski <oshogbo@freebsd.org> wrote:
>> Hello,
>>
>> Today I would like to propose a new syscall called unlinkfd(2) which came up
>> during a discussion with Ed Maste.
>>
>> Currently in UNIX we can’t remove files safely. If we try to do so we
>> always end up in a race condition. For example, when we open a file and check
>> it with fstat, etc., and then want to unlink(2) it… the file we are trying to
>> unlink could be a different one than the one we were fstating just a moment ago.
>>
>> Another reason for implementing unlinkfd(2) came to us when we were trying
>> to sandbox some applications like uudecode/b64decode or bspatch. It occurred
>> to us that we don’t have a good way of removing single files. Of course we can
>> try to determine which directory we are in, and then open this directory and
>> remove a single file.
>>
>> It looks even more bizarre if we think about a program which operates on
>> multiple files. If we analyze a situation with two totally different
>> directories like `/tmp` and `/home/oshogbo`, we would end up pre-opening
>> a root directory or keeping open as many directories as we are working in.
>> All of that effort only to remove two files. This makes it totally impractical!
>>
>> I think that opening directories also presents a wider attack surface because
>> we are keeping a descriptor to a whole directory only to remove one file.
>> Unfortunately this means that an attacker can remove all files in that directory.
>>
>> I proposed this as well on the last Capsicum call. There was a suggestion that
>> instead of adding a single syscall maybe we should have a Casper service that
>> will allow us to remove files. Another idea was that we should perhaps redesign
>> programs to create some subdirs, work in the subdirs, and then remove all files in
>> those subdirs. I don’t feel that creating a Casper service is a good idea because
>> we still have exactly the same race condition. In my opinion creating
>> subdirs is also a problem for us.
>>
>> First, we would need to redesign some of our tools, and I think we should
>> simplify the Capsicumization of a process instead of making it harder.
>>
>> Secondly, we can create a temporary subdirectory, but what will remove it?
>> We are back to having an fd to the directory in which we just created the subdir.
>> Another way would be to have a Casper service which would remove the directory,
>> but with the risk of a race condition.
>>
>> In conclusion, I think we need a syscall like unlinkfd(2), which turns out to
>> be easy to implement. The only downside of this implementation is that we
>> need to provide not only an fd but also a file path. This is because neither
>> inodes nor vnodes contain filenames. We compare the vnodes of the fd and the given
>> path; if they are exactly the same, we remove the file. In the syscall we are using
>> an fd, so there is no ambient authority because we are proving that we already
>> have access to that file. Thanks to that, the syscall can be safely used with
>> Capsicum. I have already discussed this with some people and they said
>> `Hey I already had that idea a while ago…` so let’s do something with that idea!
>> If you are interested in the patch you can find it here:
>> https://reviews.freebsd.org/D14567
>>
>> Thanks,
>> --
>> Mariusz Zaborski
>> oshogbo//vx             | http://oshogbo.vexillium.org
>> FreeBSD commiter        | https://freebsd.org
>> Software developer      | http://wheelsystems.com
>> If it's not broken, let's fix it till it is!!1
>



