Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 12 Apr 2008 16:16:04 +0300
From:      Kostik Belousov <kostikbel@gmail.com>
To:        Pawel Jakub Dawidek <pjd@freebsd.org>
Cc:        Roman Divacky <rdivacky@freebsd.org>, rwatson@FreeBSD.garage.freebsd.pl, freebsd-arch@freebsd.org
Subject:   Re: final decision about *at syscalls
Message-ID:  <20080412131604.GX21209@deviant.kiev.zoral.com.ua>
In-Reply-To: <20080412112019.GI45299@garage.freebsd.pl>
References:  <20071218092222.GA9695@freebsd.org> <200712201138.56423.jhb@freebsd.org> <20080412112019.GI45299@garage.freebsd.pl>

next in thread | previous in thread | raw e-mail | index | archive | help

--lFRK9w0o3gensELN
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Sat, Apr 12, 2008 at 01:20:19PM +0200, Pawel Jakub Dawidek wrote:
> On Thu, Dec 20, 2007 at 11:38:55AM -0500, John Baldwin wrote:
> > On Tuesday 18 December 2007 04:22:22 am Roman Divacky wrote:
> > > Dear arch@
> > >=20
> > > Over this summer I was working (among other things) on *at family of =
syscalls
> > > kindly sponsored by Google (in their Summer of Code). The resulting p=
atch is=20
> > > almost finished but I need to decide one design question. If you are =
not interested=20
> > > in *at/namei feel free to skip this mail.
> > >=20
> > > The *at syscalls are a threads-oriented extension to basic file sysca=
lls (think
> > > of open(), fstat(), etc.) adding the possibility to specify from wher=
e the search
> > > for relative path should start.
> > >=20
> > > image that we have /tmp/foo/bar
> > >=20
> > > and CWD is set to "/tmp/", and the process has opened "foo" as dirfd.=
 with ordinary
> > > open() syscall you have to either
> > >=20
> > > 	chdir("/tmp/foo");open("./bar");
> > >=20
> > > or
> > >=20
> > > 	open("/tmp/foo/bar");
> > >=20
> > > The first approach is problematic because it changes CWD for all thre=
ads in the process,
> > > the second is prone to race-conditions as some of the components of t=
he path can
> > > change in parallel with the "open".
> > >=20
> > > So POSIX introduced a new API, called "Extended API set part 2, ISBN:=
 1-931624-67-4" (at
> > > least this was the latest when I looked last time), which solves that=
 by introducing "*at"
> > > syscalls that supply an fd of previously opened directory which is us=
ed instead of CWD
> > > for searching relative path, ie. the previous example becomes
> > >=20
> > >    dirfd =3D open("/tmp/foo"); openat("foo", dirfd);
> > >=20
> > > I implemented the whole API as native FreeBSD syscalls + in linuxulat=
or emulation layer.
> > > Here's the problem:
> > >=20
> > > There are two approaches to the name translation from "filedescriptor=
" to the "vnode".
> > >=20
> > > 1) we can do it in the kern_fooat() syscall and pass namei() the resu=
lting vnode
> > > 2) we can pass namei() the filedescriptor and do the translation there
> > >=20
> > > PROs of #1:
> > >=20
> > > 	o	namei() does not need to know about the curthread, you can use thi=
s *at
> > > 		ability for different purposes, it's cleaner (imho)
> > >=20
> > > PROs of #2
> > >=20
> > > 	o	raceless implementation
> > > 	o	no code duplication
> > >=20
> > > CONs of #1
> > >=20
> > > 	o	some very small code duplication (the translation is done in every=
=20
> > > 		kern_fooat() function)
> > > 	o	there is a race between the name translation and the actual use of=
 the result
> > > 		of the translation that needs to be handled, the "path_to_file" str=
ing is copied
> > > 		to the kernel space twice hence a race
> > >=20
> > > CONs of #2
> > >=20
> > > 	o	namei is made thread dependant	=09
> > >=20
> > > Please tell me what approach you like more. I personally favour #1 be=
cause I don't like namei()
> > > being thread dependant, Kostik Belousov prefers #2.
> >=20
> > Considering Robert's paper on security race problems in things like sys=
trace
> > stemming from when you copy parameters out of userland and into the ker=
nel
> > multiple times, I think #2 is definitely the better choice.  Also, name=
i() is
> > already thread aware AFAICT since 'struct componentname' already contai=
ns a
> > 'cnp_thread' member (was 'cnp_proc' in 4.x).
>=20
> It looks like I'm a bit too late, but anyway...
>=20
> From what you write John, #1 is a better choice than #2. If you want to
> avoid races, you can pass already locked vnode. In case of file
> descriptors, if p_fd is not locked another thread can close and open
> different directory under the same descriptor number.
This is the application imposed race, not the externally imposed one.
Moreover, I would argue that this is application error.

>=20
> I also need such functionality for recent ZFS and #2 makes it impossible
> to use it. NDINIT_AT() is kernel (VFS) API so it should operate on
> vnodes, not file descriptor numbers, IMHO.
Following the same arguments, NDINIT() shall not operate on the pathes
too.

>=20
> For completness can you Kostik and Robert provide your arguments against
> #1?

The #2 was already committed.
The #1 caused a code duplication that was quite error-prone.

What are your problems with the #2 ?

--lFRK9w0o3gensELN
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (FreeBSD)

iEYEARECAAYFAkgAthMACgkQC3+MBN1Mb4gDYwCgykUIzTo3qvA5x0Ur5om8U88Q
QcEAoNn6skWv0ohUmvvp8V+XrcV0Xv4b
=cBi2
-----END PGP SIGNATURE-----

--lFRK9w0o3gensELN--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20080412131604.GX21209>