From owner-freebsd-arch@FreeBSD.ORG Sat Apr 12 13:16:09 2008 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E1F66106566C; Sat, 12 Apr 2008 13:16:09 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from relay03.kiev.sovam.com (relay03.kiev.sovam.com [62.64.120.201]) by mx1.freebsd.org (Postfix) with ESMTP id 79EB68FC12; Sat, 12 Apr 2008 13:16:09 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from [212.82.216.226] (helo=skuns.kiev.zoral.com.ua) by relay03.kiev.sovam.com with esmtps (TLSv1:AES256-SHA:256) (Exim 4.67) (envelope-from ) id 1JkfaR-000Ffh-Rf; Sat, 12 Apr 2008 16:16:08 +0300 Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by skuns.kiev.zoral.com.ua (8.14.2/8.14.2) with ESMTP id m3CDGDQk005667 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 12 Apr 2008 16:16:13 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.2/8.14.2) with ESMTP id m3CDG4uH082249; Sat, 12 Apr 2008 16:16:04 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.2/8.14.2/Submit) id m3CDG4CD082248; Sat, 12 Apr 2008 16:16:04 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Sat, 12 Apr 2008 16:16:04 +0300 From: Kostik Belousov To: Pawel Jakub Dawidek Message-ID: <20080412131604.GX21209@deviant.kiev.zoral.com.ua> References: <20071218092222.GA9695@freebsd.org> <200712201138.56423.jhb@freebsd.org> <20080412112019.GI45299@garage.freebsd.pl> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="lFRK9w0o3gensELN" Content-Disposition: inline In-Reply-To: <20080412112019.GI45299@garage.freebsd.pl> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: ClamAV version 0.91.2, clamav-milter version 0.91.2 on skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-4.4 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.2.4 X-Spam-Checker-Version: SpamAssassin 3.2.4 (2008-01-01) on skuns.kiev.zoral.com.ua X-Scanner-Signature: a14d1af8d19e5cbf0f3a8f9bfbb6792f X-DrWeb-checked: yes X-SpamTest-Envelope-From: kostikbel@gmail.com X-SpamTest-Group-ID: 00000000 X-SpamTest-Header: Not Detected X-SpamTest-Info: Profiles 2617 [Apr 11 2008] X-SpamTest-Info: helo_type=3 X-SpamTest-Method: none X-SpamTest-Rate: 0 X-SpamTest-Status: Not detected X-SpamTest-Status-Extended: not_detected X-SpamTest-Version: SMTP-Filter Version 3.0.0 [0278], KAS30/Release Cc: Roman Divacky , rwatson@FreeBSD.garage.freebsd.pl, freebsd-arch@freebsd.org Subject: Re: final decision about *at syscalls X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 12 Apr 2008 13:16:10 -0000 --lFRK9w0o3gensELN Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Sat, Apr 12, 2008 at 01:20:19PM +0200, Pawel Jakub Dawidek wrote: > On Thu, Dec 20, 2007 at 11:38:55AM -0500, John Baldwin wrote: > > On Tuesday 18 December 2007 04:22:22 am Roman Divacky wrote: > > > Dear arch@ > > >=20 > > > Over this summer I was working (among other things) on *at family of = syscalls > > > kindly sponsored by Google (in their Summer of Code). The resulting p= atch is=20 > > > almost finished but I need to decide one design question. If you are = not interested=20 > > > in *at/namei feel free to skip this mail. > > >=20 > > > The *at syscalls are a threads-oriented extension to basic file sysca= lls (think > > > of open(), fstat(), etc.) adding the possibility to specify from wher= e the search > > > for relative path should start. > > >=20 > > > image that we have /tmp/foo/bar > > >=20 > > > and CWD is set to "/tmp/", and the process has opened "foo" as dirfd.= with ordinary > > > open() syscall you have to either > > >=20 > > > chdir("/tmp/foo");open("./bar"); > > >=20 > > > or > > >=20 > > > open("/tmp/foo/bar"); > > >=20 > > > The first approach is problematic because it changes CWD for all thre= ads in the process, > > > the second is prone to race-conditions as some of the components of t= he path can > > > change in parallel with the "open". > > >=20 > > > So POSIX introduced a new API, called "Extended API set part 2, ISBN:= 1-931624-67-4" (at > > > least this was the latest when I looked last time), which solves that= by introducing "*at" > > > syscalls that supply an fd of previously opened directory which is us= ed instead of CWD > > > for searching relative path, ie. the previous example becomes > > >=20 > > > dirfd =3D open("/tmp/foo"); openat("foo", dirfd); > > >=20 > > > I implemented the whole API as native FreeBSD syscalls + in linuxulat= or emulation layer. > > > Here's the problem: > > >=20 > > > There are two approaches to the name translation from "filedescriptor= " to the "vnode". > > >=20 > > > 1) we can do it in the kern_fooat() syscall and pass namei() the resu= lting vnode > > > 2) we can pass namei() the filedescriptor and do the translation there > > >=20 > > > PROs of #1: > > >=20 > > > o namei() does not need to know about the curthread, you can use thi= s *at > > > ability for different purposes, it's cleaner (imho) > > >=20 > > > PROs of #2 > > >=20 > > > o raceless implementation > > > o no code duplication > > >=20 > > > CONs of #1 > > >=20 > > > o some very small code duplication (the translation is done in every= =20 > > > kern_fooat() function) > > > o there is a race between the name translation and the actual use of= the result > > > of the translation that needs to be handled, the "path_to_file" str= ing is copied > > > to the kernel space twice hence a race > > >=20 > > > CONs of #2 > > >=20 > > > o namei is made thread dependant =09 > > >=20 > > > Please tell me what approach you like more. I personally favour #1 be= cause I don't like namei() > > > being thread dependant, Kostik Belousov prefers #2. > >=20 > > Considering Robert's paper on security race problems in things like sys= trace > > stemming from when you copy parameters out of userland and into the ker= nel > > multiple times, I think #2 is definitely the better choice. Also, name= i() is > > already thread aware AFAICT since 'struct componentname' already contai= ns a > > 'cnp_thread' member (was 'cnp_proc' in 4.x). >=20 > It looks like I'm a bit too late, but anyway... >=20 > From what you write John, #1 is a better choice than #2. If you want to > avoid races, you can pass already locked vnode. In case of file > descriptors, if p_fd is not locked another thread can close and open > different directory under the same descriptor number. This is the application imposed race, not the externally imposed one. Moreover, I would argue that this is application error. >=20 > I also need such functionality for recent ZFS and #2 makes it impossible > to use it. NDINIT_AT() is kernel (VFS) API so it should operate on > vnodes, not file descriptor numbers, IMHO. Following the same arguments, NDINIT() shall not operate on the pathes too. >=20 > For completness can you Kostik and Robert provide your arguments against > #1? The #2 was already committed. The #1 caused a code duplication that was quite error-prone. What are your problems with the #2 ? --lFRK9w0o3gensELN Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.8 (FreeBSD) iEYEARECAAYFAkgAthMACgkQC3+MBN1Mb4gDYwCgykUIzTo3qvA5x0Ur5om8U88Q QcEAoNn6skWv0ohUmvvp8V+XrcV0Xv4b =cBi2 -----END PGP SIGNATURE----- --lFRK9w0o3gensELN--