Date:      Sat, 3 Mar 2018 17:16:38 +0000
From:      "Robert N. M. Watson" <robert.watson@cl.cam.ac.uk>
To:        Alan Somers <asomers@freebsd.org>
Cc:        Alexander Richardson <Alexander.Richardson@cl.cam.ac.uk>, "<cl-capsicum-discuss@lists.cam.ac.uk>" <cl-capsicum-discuss@lists.cam.ac.uk>,  "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Subject:   Re: [capsicum] unlinkfd
Message-ID:  <6E83EC9C-56C5-40E7-AED0-2692A15F7FD3@cl.cam.ac.uk>
In-Reply-To: <CAOtMX2h83wddDwcvgae-a02AuyYPyTYfmzqeJemMtKt7%2BL74YQ@mail.gmail.com>
References:  <20180302183514.GA99279@x-wing> <CAK4o1Wyk54chHobhUkb2PBUtaWOF2rDv6tkX_bFGY6D331xUqw@mail.gmail.com> <17DE0BFF-42A2-4CD7-B09C-ABA2606C4041@cl.cam.ac.uk> <CAEeofcgLD%2BTjKswPexNDUfeeAxHgUOjsZUdD3g3Jc%2BQuyRu4OQ@mail.gmail.com> <CAOtMX2h83wddDwcvgae-a02AuyYPyTYfmzqeJemMtKt7%2BL74YQ@mail.gmail.com>

New _check() variants of the unlinkat(2) and rmdirat(2) system calls
might do the trick -- e.g.,

	int	unlinkat_check(int dirfd, const char *name, int checkfd);
	int	rmdirat_check(int dirfd, const char *name, int checkfd);

The calls would succeed only if 'name' refers to the filesystem object
passed via checkfd. This would retain UNIX-style directory behaviour but
allow an atomic check that the object is as expected.
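
To make the intended semantics concrete, here is a minimal userspace
sketch of the comparison such a call would perform. The function name
unlinkat_check_approx() and the ESTALE error code are assumptions made
for illustration; neither comes from the proposal. Unlike the proposed
system call, this version is not atomic: the directory entry can still
be replaced between the fstatat() and the unlinkat(), which is exactly
the window a kernel implementation would have to close.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical userspace approximation of the proposed unlinkat_check().
 * NOT atomic: the entry can still be swapped between fstatat() and
 * unlinkat(); only a kernel implementation could close that window.
 */
int
unlinkat_check_approx(int dirfd, const char *name, int checkfd)
{
	struct stat by_name, by_fd;

	assert(name != NULL);

	/* Identify the object the directory entry currently names. */
	if (fstatat(dirfd, name, &by_name, AT_SYMLINK_NOFOLLOW) == -1)
		return (-1);
	/* Identify the object the caller already holds open. */
	if (fstat(checkfd, &by_fd) == -1)
		return (-1);

	/* Same device and inode number means the same filesystem object. */
	if (by_name.st_dev != by_fd.st_dev ||
	    by_name.st_ino != by_fd.st_ino) {
		errno = ESTALE;	/* error code chosen for illustration only */
		return (-1);
	}
	return (unlinkat(dirfd, name, 0));
}
```

A caller that opened a file, examined it, and now wants to remove it
would pass the still-open descriptor as checkfd; if another process has
replaced the entry in the meantime, the call fails instead of removing
the wrong file.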

Of course, what you do about it if it turns out the check fails is
another question... Better not to have a name at all, hence
shm_open(SHM_ANON, ...) -- although just for file objects, and not
directory hierarchies.
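
As a rough illustration of the nameless approach, the sketch below
creates a scratch object with shm_open(SHM_ANON, ...) where that FreeBSD
extension is available. The helper name anon_scratch_fd() is
hypothetical. On systems without SHM_ANON the sketch substitutes the
classic mkstemp()-then-unlink() idiom purely so that it compiles
elsewhere; that stand-in does briefly expose a name and is therefore
weaker than the real thing.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Return a descriptor for a nameless, file-like scratch object of the
 * given size, or -1 on error. The helper name is hypothetical.
 */
int
anon_scratch_fd(off_t size)
{
	int fd;

	assert(size >= 0);

#ifdef SHM_ANON
	/* FreeBSD: the object never has a name, so nothing to unlink. */
	fd = shm_open(SHM_ANON, O_RDWR, 0600);
#else
	/* Stand-in elsewhere: a name exists briefly, then is removed. */
	char path[] = "/tmp/anon_scratch.XXXXXX";

	fd = mkstemp(path);
	if (fd != -1 && unlink(path) == -1) {
		close(fd);
		return (-1);
	}
#endif
	if (fd == -1)
		return (-1);
	/* Give the object its size so it can be mapped. */
	if (ftruncate(fd, size) == -1) {
		close(fd);
		return (-1);
	}
	return (fd);
}
```

Because the object is reclaimed when the last descriptor and mapping go
away, there is no later unlink step in which a race could occur.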

Robert

> On 3 Mar 2018, at 15:29, Alan Somers <asomers@freebsd.org> wrote:
>
> In fact, FreeBSD has that same unlinkat(2) system call. But it doesn't
> solve Mariusz's problem. He's concerned about race conditions. With
> either unlink(2) or unlinkat(2), there's no way to ensure that the
> directory entry you remove is for the file you think it is, because
> after reading/writing a file and before unlinking it, some other
> process could have unlinked it and created a new one with the same
> name. It's this race condition that Mariusz seeks to solve with
> unlinkfd.
> -Alan
>
> On Sat, Mar 3, 2018 at 5:46 AM, Alexander Richardson
> <Alexander.Richardson@cl.cam.ac.uk> wrote:
> Linux has an unlinkat() system call
> (https://linux.die.net/man/2/unlinkat)
> but it doesn't seem to have a flag that lets you unlink the fd itself.
> Possibly pathname == NULL and AT_EMPTY_PATH could mean unlink the fd,
> but I haven't tried whether that works.
> It also has an AT_REMOVEDIR flag to make it function as rmdirat().
>
> On 3 March 2018 at 10:41, Robert N. M. Watson
> <robert.watson@cl.cam.ac.uk> wrote:
>
> > FWIW, this is part of why we introduced anonymous POSIX shared memory
> > objects with Capsicum in FreeBSD -- we allow shm_open(2) to be passed a
> > SHM_ANON special name, which causes the creation of a swap-backed,
> > mappable file-like object that can have I/O, memory mapping, etc.,
> > performed on it, but never has any persistent state across reboots,
> > even in the event of a crash.
> >
> > With Capsicum you can then refine a file descriptor to the otherwise
> > writable object to be read-only for the purposes of delegation. There
> > is not, however, a mechanism to "freeze" the state of the object,
> > causing other outstanding writable descriptors to become read-only --
> > certainly something could be added, but some care regarding VM
> > semantics would be required -- in particular, so that faults could not
> > be experienced as a result of a memory store performed before the
> > "freeze" but issued to VFS only later.
> >
> > I certainly have no objection to an unlinkat(2) system call -- it's
> > unfortunate that a full suite of the at(2) APIs wasn't introduced in
> > the first place. It would be worth checking whether anyone else (e.g.,
> > Solaris, Mac OS X, Linux) has already added an unlinkat(2) whose API
> > semantics we can match. I think I take the view that for truly
> > anonymous objects, shm_open(2) without a name (or the Linux equiv) is
> > the right thing -- and hence unlinkat(2) is for more conventional use
> > cases where the final pathname element is known.
> >
> > On directories: There, I find myself falling back on a Casper-like
> > service, since GC'ing a single anonymous memory object is
> > straightforward, but GC'ing a directory hierarchy is a messier
> > business.
> >
> > Robert
> >
> > > On 3 Mar 2018, at 09:53, Justin Cormack
> > > <justin@specialbusservice.com> wrote:
> > >
> > > I think it would make sense to have an unlinkfd() that unlinks the
> > > file from everywhere, so it does not need a name to be specified.
> > > This might be hard to implement.
> > >
> > > For temporary files, I really like Linux memfd_create(2), which
> > > opens an anonymous file without a name. These semantics are really
> > > useful. (Linux memfd also has additional options for sealing the
> > > file to make it immutable, which are very useful for safely passing
> > > files between processes.) Having a way to make unnamed temporary
> > > files solves a lot of deletion issues, as the file never needs to
> > > be unlinked.
> > >
> > >
> > > On 2 March 2018 at 18:35, Mariusz Zaborski <oshogbo@freebsd.org>
> > > wrote:
> > >> Hello,
> > >>
> > >> Today I would like to propose a new syscall called unlinkfd(2),
> > >> which came up during a discussion with Ed Maste.
> > >>
> > >> Currently in UNIX we can’t remove files safely. If we try to do
> > >> so, we always end up in a race condition. For example, when we
> > >> open a file, check it with fstat, etc., and then want to unlink(2)
> > >> it… the file we are trying to unlink could be a different one than
> > >> the one we were fstating just a moment ago.
> > >>
> > >> Another reason for implementing unlinkfd(2) came to us when we
> > >> were trying to sandbox some applications like uudecode/b64decode
> > >> or bspatch. It occurred to us that we don’t have a good way of
> > >> removing single files. Of course we can try to determine which
> > >> directory we are in, and then open this directory and remove a
> > >> single file.
> > >>
> > >> It looks even more bizarre if we think about a program which
> > >> operates on multiple files. If we analyze a situation with two
> > >> totally different directories like `/tmp` and `/home/oshogbo`, we
> > >> would end up pre-opening a root directory or keeping open as many
> > >> directories as we are working on. All of that effort only to
> > >> remove two files. This makes it totally impractical!
> > >>
> > >> I think that opening directories also presents a wider attack
> > >> vector, because we are keeping a descriptor to a directory only
> > >> to remove one file. Unfortunately this means that an attacker can
> > >> remove all files in that directory.
> > >>
> > >> I proposed this as well on the last Capsicum call. There was a
> > >> suggestion that instead of doing a single syscall maybe we should
> > >> have a Casper service that will allow us to remove files. Another
> > >> idea was that we should perhaps redesign programs to create some
> > >> subdirs, work on the subdirs, and then remove all files in this
> > >> subdir. I don’t feel that creating a Casper service is a good idea
> > >> because we still have exactly the same race condition. In my
> > >> opinion creating subdirs is also a problem for us.
> > >>
> > >> First, we would need to redesign some of our tools, and I think we
> > >> should simplify capsicumization of the process instead of making
> > >> it harder.
> > >>
> > >> Secondly, we can create a temporary subdirectory, but what will
> > >> remove it? We are going back to having a fd to the directory in
> > >> which we just created a subdir. Another way would be to have a
> > >> Casper service which would remove a directory, but with the risk
> > >> of a race condition.
> > >>
> > >> In conclusion, I think we need a syscall like unlinkfd(2), which
> > >> turns out to be easy to implement. The only downside of this
> > >> implementation is that we need to provide not only a fd but also
> > >> a file path. This is because neither inodes nor vnodes contain
> > >> filenames. We compare the vnodes of the fd and the given path; if
> > >> they are exactly the same, we remove the file. In the syscall we
> > >> are using a fd, so there is no Ambient Authority, because we are
> > >> proving that we already have access to that file. Thanks to that,
> > >> the syscall can be safely used with Capsicum. I have already
> > >> discussed this with some people and they said
> > >> `Hey I already had that idea a while ago…` so let’s do something
> > >> with that idea!
> > >> If you are interested in the patch you can find it here:
> > >> https://reviews.freebsd.org/D14567
> > >>
> > >> Thanks,
> > >> --
> > >> Mariusz Zaborski
> > >> oshogbo//vx             | http://oshogbo.vexillium.org
> > >> FreeBSD committer       | https://freebsd.org
> > >> Software developer      | http://wheelsystems.com
> > >> If it's not broken, let's fix it till it is!!1
> > >
> >
> >
> >



