Date: Sat, 12 Apr 2008 13:20:19 +0200
From: Pawel Jakub Dawidek
To: John Baldwin
Cc: Roman Divacky, kib@FreeBSD.org, rwatson@FreeBSD.garage.freebsd.pl, freebsd-arch@FreeBSD.org
Subject: Re: final decision about *at syscalls
Message-ID: <20080412112019.GI45299@garage.freebsd.pl>
In-Reply-To: <200712201138.56423.jhb@freebsd.org>

On Thu, Dec 20, 2007 at 11:38:55AM -0500, John Baldwin wrote:
> On Tuesday 18 December 2007 04:22:22 am Roman Divacky wrote:
> > Dear arch@
> >
> > Over this summer I was working (among other things) on *at family of syscalls
> > kindly sponsored by Google (in their Summer of Code). The resulting patch is
> > almost finished but I need to decide one design question. If you are not interested
> > in *at/namei feel free to skip this mail.
> >
> > The *at syscalls are a threads-oriented extension to basic file syscalls (think
> > of open(), fstat(), etc.) adding the possibility to specify from where the search
> > for relative path should start.
> >
> > imagine that we have /tmp/foo/bar
> >
> > and CWD is set to "/tmp/", and the process has opened "foo" as dirfd. with ordinary
> > open() syscall you have to either
> >
> >     chdir("/tmp/foo");open("./bar");
> >
> > or
> >
> >     open("/tmp/foo/bar");
> >
> > The first approach is problematic because it changes CWD for all threads in the process,
> > the second is prone to race-conditions as some of the components of the path can
> > change in parallel with the "open".
> >
> > So POSIX introduced a new API, called "Extended API set part 2, ISBN: 1-931624-67-4" (at
> > least this was the latest when I looked last time), which solves that by introducing "*at"
> > syscalls that supply an fd of previously opened directory which is used instead of CWD
> > for searching relative path, ie. the previous example becomes
> >
> >     dirfd = open("/tmp/foo"); openat(dirfd, "bar");
> >
> > I implemented the whole API as native FreeBSD syscalls + in linuxulator emulation layer.
> >
> > Here's the problem:
> >
> > There are two approaches to the name translation from "filedescriptor" to the "vnode".
> >
> > 1) we can do it in the kern_fooat() syscall and pass namei() the resulting vnode
> > 2) we can pass namei() the filedescriptor and do the translation there
> >
> > PROs of #1:
> >
> >  o namei() does not need to know about the curthread, you can use this *at
> >    ability for different purposes, it's cleaner (imho)
> >
> > PROs of #2
> >
> >  o raceless implementation
> >  o no code duplication
> >
> > CONs of #1
> >
> >  o some very small code duplication (the translation is done in every
> >    kern_fooat() function)
> >  o there is a race between the name translation and the actual use of the result
> >    of the translation that needs to be handled, the "path_to_file" string is copied
> >    to the kernel space twice hence a race
> >
> > CONs of #2
> >
> >  o namei is made thread dependent
> >
> > Please tell me what approach you like more. I personally favour #1 because I don't like namei()
> > being thread dependent, Kostik Belousov prefers #2.
>
> Considering Robert's paper on security race problems in things like systrace
> stemming from when you copy parameters out of userland and into the kernel
> multiple times, I think #2 is definitely the better choice. Also, namei() is
> already thread aware AFAICT since 'struct componentname' already contains a
> 'cnp_thread' member (was 'cnp_proc' in 4.x).

It looks like I'm a bit too late, but anyway...

From what you write, John, #1 is a better choice than #2. If you want to
avoid races, you can pass an already locked vnode. In the case of file
descriptors, if p_fd is not locked, another thread can close the descriptor
and open a different directory under the same descriptor number. I also need
such functionality for recent ZFS, and #2 makes it impossible to use it.

NDINIT_AT() is a kernel (VFS) API, so it should operate on vnodes, not file
descriptor numbers, IMHO.

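The two userland-visible points in this thread can be sketched in C, assuming a POSIX.1-2008 system: openat() resolves a relative path against a directory descriptor without touching CWD, and a descriptor number by itself does not pin a directory, because the number is handed out again after close(). The helper names (open_relative_to_dirfd, descriptor_number_is_reused, openat_demo) and the /tmp/atdemo_* templates are made up for this illustration; this is not the kernel patch under discussion, and it says nothing about how namei() or NDINIT_AT() are implemented.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Create "bar" relative to dirfd and read it back. The process-wide CWD
 * is never changed, so other threads are unaffected.
 * Returns 0 on success, -1 on failure.
 */
int
open_relative_to_dirfd(int dirfd)
{
	char buf[8] = { 0 };
	ssize_t n;
	int fd;

	assert(dirfd >= 0);
	fd = openat(dirfd, "bar", O_CREAT | O_WRONLY | O_TRUNC, 0600);
	if (fd < 0)
		return (-1);
	n = write(fd, "hello", 5);
	close(fd);
	if (n != 5)
		return (-1);

	fd = openat(dirfd, "bar", O_RDONLY);
	if (fd < 0)
		return (-1);
	n = read(fd, buf, sizeof(buf) - 1);
	close(fd);
	return ((n == 5 && strcmp(buf, "hello") == 0) ? 0 : -1);
}

/*
 * Open dir_a, close it, then open dir_b. Returns 1 if both opens produced
 * the same descriptor number, 0 if not, -1 on error. In a single-threaded
 * process open() returns the lowest free number, so the number is reused
 * even though it now names a different directory.
 */
int
descriptor_number_is_reused(const char *dir_a, const char *dir_b)
{
	int first, second;

	first = open(dir_a, O_RDONLY | O_DIRECTORY);
	if (first < 0)
		return (-1);
	close(first);
	second = open(dir_b, O_RDONLY | O_DIRECTORY);
	if (second < 0)
		return (-1);
	close(second);
	return (first == second);
}

/* Run both checks in scratch directories; returns 0 if both behave as described. */
int
openat_demo(void)
{
	char dir_a[] = "/tmp/atdemo_a.XXXXXX";	/* hypothetical scratch names */
	char dir_b[] = "/tmp/atdemo_b.XXXXXX";
	int dirfd, rc, reused;

	if (mkdtemp(dir_a) == NULL)
		return (-1);
	if (mkdtemp(dir_b) == NULL) {
		rmdir(dir_a);
		return (-1);
	}
	dirfd = open(dir_a, O_RDONLY | O_DIRECTORY);
	rc = (dirfd >= 0) ? open_relative_to_dirfd(dirfd) : -1;
	if (dirfd >= 0) {
		unlinkat(dirfd, "bar", 0);	/* clean up the test file */
		close(dirfd);
	}
	reused = descriptor_number_is_reused(dir_a, dir_b);
	rmdir(dir_a);
	rmdir(dir_b);
	return ((rc == 0 && reused == 1) ? 0 : -1);
}
```

Calling openat_demo() from a main() is expected to return 0 in a single-threaded process. In a multi-threaded process the same reuse can happen between a lookup by number and its use, which is the window described above for a p_fd that is not locked.
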
For completeness, can you, Kostik and Robert, provide your arguments against #1?

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!