Date: Sat, 12 Apr 2008 16:16:04 +0300 From: Kostik Belousov <kostikbel@gmail.com> To: Pawel Jakub Dawidek <pjd@freebsd.org> Cc: Roman Divacky <rdivacky@freebsd.org>, rwatson@FreeBSD.garage.freebsd.pl, freebsd-arch@freebsd.org Subject: Re: final decision about *at syscalls Message-ID: <20080412131604.GX21209@deviant.kiev.zoral.com.ua> In-Reply-To: <20080412112019.GI45299@garage.freebsd.pl> References: <20071218092222.GA9695@freebsd.org> <200712201138.56423.jhb@freebsd.org> <20080412112019.GI45299@garage.freebsd.pl>
next in thread | previous in thread | raw e-mail | index | archive | help
--lFRK9w0o3gensELN Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Sat, Apr 12, 2008 at 01:20:19PM +0200, Pawel Jakub Dawidek wrote: > On Thu, Dec 20, 2007 at 11:38:55AM -0500, John Baldwin wrote: > > On Tuesday 18 December 2007 04:22:22 am Roman Divacky wrote: > > > Dear arch@ > > >=20 > > > Over this summer I was working (among other things) on *at family of = syscalls > > > kindly sponsored by Google (in their Summer of Code). The resulting p= atch is=20 > > > almost finished but I need to decide one design question. If you are = not interested=20 > > > in *at/namei feel free to skip this mail. > > >=20 > > > The *at syscalls are a threads-oriented extension to basic file sysca= lls (think > > > of open(), fstat(), etc.) adding the possibility to specify from wher= e the search > > > for relative path should start. > > >=20 > > > image that we have /tmp/foo/bar > > >=20 > > > and CWD is set to "/tmp/", and the process has opened "foo" as dirfd.= with ordinary > > > open() syscall you have to either > > >=20 > > > chdir("/tmp/foo");open("./bar"); > > >=20 > > > or > > >=20 > > > open("/tmp/foo/bar"); > > >=20 > > > The first approach is problematic because it changes CWD for all thre= ads in the process, > > > the second is prone to race-conditions as some of the components of t= he path can > > > change in parallel with the "open". > > >=20 > > > So POSIX introduced a new API, called "Extended API set part 2, ISBN:= 1-931624-67-4" (at > > > least this was the latest when I looked last time), which solves that= by introducing "*at" > > > syscalls that supply an fd of previously opened directory which is us= ed instead of CWD > > > for searching relative path, ie. the previous example becomes > > >=20 > > > dirfd =3D open("/tmp/foo"); openat("foo", dirfd); > > >=20 > > > I implemented the whole API as native FreeBSD syscalls + in linuxulat= or emulation layer. > > > Here's the problem: > > >=20 > > > There are two approaches to the name translation from "filedescriptor= " to the "vnode". > > >=20 > > > 1) we can do it in the kern_fooat() syscall and pass namei() the resu= lting vnode > > > 2) we can pass namei() the filedescriptor and do the translation there > > >=20 > > > PROs of #1: > > >=20 > > > o namei() does not need to know about the curthread, you can use thi= s *at > > > ability for different purposes, it's cleaner (imho) > > >=20 > > > PROs of #2 > > >=20 > > > o raceless implementation > > > o no code duplication > > >=20 > > > CONs of #1 > > >=20 > > > o some very small code duplication (the translation is done in every= =20 > > > kern_fooat() function) > > > o there is a race between the name translation and the actual use of= the result > > > of the translation that needs to be handled, the "path_to_file" str= ing is copied > > > to the kernel space twice hence a race > > >=20 > > > CONs of #2 > > >=20 > > > o namei is made thread dependant =09 > > >=20 > > > Please tell me what approach you like more. I personally favour #1 be= cause I don't like namei() > > > being thread dependant, Kostik Belousov prefers #2. > >=20 > > Considering Robert's paper on security race problems in things like sys= trace > > stemming from when you copy parameters out of userland and into the ker= nel > > multiple times, I think #2 is definitely the better choice. Also, name= i() is > > already thread aware AFAICT since 'struct componentname' already contai= ns a > > 'cnp_thread' member (was 'cnp_proc' in 4.x). >=20 > It looks like I'm a bit too late, but anyway... >=20 > From what you write John, #1 is a better choice than #2. If you want to > avoid races, you can pass already locked vnode. In case of file > descriptors, if p_fd is not locked another thread can close and open > different directory under the same descriptor number. This is the application imposed race, not the externally imposed one. Moreover, I would argue that this is application error. >=20 > I also need such functionality for recent ZFS and #2 makes it impossible > to use it. NDINIT_AT() is kernel (VFS) API so it should operate on > vnodes, not file descriptor numbers, IMHO. Following the same arguments, NDINIT() shall not operate on the pathes too. >=20 > For completness can you Kostik and Robert provide your arguments against > #1? The #2 was already committed. The #1 caused a code duplication that was quite error-prone. What are your problems with the #2 ? --lFRK9w0o3gensELN Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.8 (FreeBSD) iEYEARECAAYFAkgAthMACgkQC3+MBN1Mb4gDYwCgykUIzTo3qvA5x0Ur5om8U88Q QcEAoNn6skWv0ohUmvvp8V+XrcV0Xv4b =cBi2 -----END PGP SIGNATURE----- --lFRK9w0o3gensELN--
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20080412131604.GX21209>