From: John Baldwin
To: freebsd-arch@FreeBSD.org
Cc: arch@FreeBSD.org, Roman Divacky
Date: Thu, 20 Dec 2007 11:38:55 -0500
Subject: Re: final decision about *at syscalls

On Tuesday 18 December 2007 04:22:22 am Roman Divacky wrote:
> Dear arch@
>
> Over this summer I was working (among other things) on the *at family of syscalls,
> kindly sponsored by Google (in their Summer of Code). The resulting patch is
> almost finished, but I need to decide one design question. If you are not interested
> in *at/namei, feel free to skip this mail.
>
> The *at syscalls are a thread-oriented extension to the basic file syscalls (think
> of open(), fstat(), etc.) that add the possibility to specify where the search
> for a relative path should start.
>
> Imagine that we have /tmp/foo/bar
>
> and CWD is set to "/tmp/", and the process has opened "foo" as dirfd. With the ordinary
> open() syscall you have to either
>
> chdir("/tmp/foo"); open("./bar");
>
> or
>
> open("/tmp/foo/bar");
>
> The first approach is problematic because it changes the CWD for all threads in the process;
> the second is prone to race conditions, as some of the components of the path can
> change in parallel with the open().
>
> So POSIX introduced a new API, called "Extended API set part 2, ISBN: 1-931624-67-4" (at
> least this was the latest when I looked last time), which solves that by introducing "*at"
> syscalls that take an fd of a previously opened directory, which is used instead of the CWD
> as the starting point for a relative path, i.e. the previous example becomes
>
> dirfd = open("/tmp/foo", O_RDONLY); openat(dirfd, "bar", O_RDONLY);
>
> I implemented the whole API as native FreeBSD syscalls and in the linuxulator emulation layer.
> Here's the problem:
>
> There are two approaches to the name translation from "file descriptor" to "vnode".
>
> 1) we can do it in the kern_fooat() syscall and pass namei() the resulting vnode
> 2) we can pass namei() the file descriptor and do the translation there
>
> PROs of #1:
>
> o namei() does not need to know about curthread, so you can use this *at
>   ability for different purposes; it's cleaner (imho)
>
> PROs of #2:
>
> o raceless implementation
> o no code duplication
>
> CONs of #1:
>
> o some very small code duplication (the translation is done in every
>   kern_fooat() function)
> o there is a race between the name translation and the actual use of the result
>   of the translation that needs to be handled; the "path_to_file" string is copied
>   into kernel space twice, hence a race
>
> CONs of #2:
>
> o namei() is made thread-dependent
>
> Please tell me which approach you like more. I personally favour #1 because I don't like namei()
> being thread-dependent; Kostik Belousov prefers #2.

Considering Robert's paper on security races in system call wrappers like
systrace, which stem from copying parameters from userland into the kernel
more than once, I think #2 is definitely the better choice.

Also, namei() is already thread-aware AFAICT, since 'struct componentname'
already contains a 'cnp_thread' member (it was 'cnp_proc' in 4.x).

-- 
John Baldwin
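For readers new to the *at API, the openat() usage in Roman's quoted example can be sketched in userland C. This is a minimal sketch, assuming a POSIX.1-2008 system that provides openat(), mkdirat(), unlinkat() and mkdtemp(); the helper names roundtrip_at() and openat_demo() and the /tmp scratch-directory template are hypothetical, chosen only for illustration. After the first open(), every lookup is resolved relative to a directory descriptor rather than the CWD, so no chdir() is needed and only the final component ("bar") is looked up. It says nothing about the kernel-side namei() question debated in the thread.

```c
/*
 * Userland sketch of the *at idea: resolve paths relative to an already
 * open directory fd instead of the process CWD.  Assumes POSIX.1-2008
 * (openat, mkdirat, unlinkat, mkdtemp).  roundtrip_at() and openat_demo()
 * are hypothetical helpers written only for this illustration.
 */
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Under `topfd`, create foo/bar containing `msg`, read it back into `buf`,
 * then remove foo/bar again.  No chdir() and no absolute path is used.
 * Returns the number of bytes read, or -1 on error.
 */
ssize_t roundtrip_at(int topfd, const char *msg, char *buf, size_t buflen)
{
    ssize_t n = -1;
    size_t len = strlen(msg);
    int dirfd, fd;

    assert(buflen > 0);
    if (mkdirat(topfd, "foo", 0700) < 0)
        return -1;
    /* Plays the role of dirfd = open("/tmp/foo", ...) in the mail. */
    dirfd = openat(topfd, "foo", O_RDONLY | O_DIRECTORY);
    if (dirfd < 0)
        goto out_rmdir;

    /* "bar" is resolved relative to dirfd, not to the process CWD. */
    fd = openat(dirfd, "bar", O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        goto out_close;
    if (write(fd, msg, len) != (ssize_t)len) {
        close(fd);
        goto out_unlink;
    }
    close(fd);

    /* Reopen through the same directory fd and read the data back. */
    fd = openat(dirfd, "bar", O_RDONLY);
    if (fd >= 0) {
        n = read(fd, buf, buflen - 1);
        if (n >= 0)
            buf[n] = '\0';
        close(fd);
    }
out_unlink:
    unlinkat(dirfd, "bar", 0);
out_close:
    close(dirfd);
out_rmdir:
    unlinkat(topfd, "foo", AT_REMOVEDIR);
    return n;
}

/* Create a scratch directory, run roundtrip_at() against it, clean up. */
ssize_t openat_demo(const char *msg, char *buf, size_t buflen)
{
    char tmpl[] = "/tmp/openat-demo.XXXXXX";
    ssize_t n = -1;
    int topfd;

    if (mkdtemp(tmpl) == NULL)
        return -1;
    topfd = open(tmpl, O_RDONLY | O_DIRECTORY);
    if (topfd >= 0) {
        n = roundtrip_at(topfd, msg, buf, buflen);
        close(topfd);
    }
    rmdir(tmpl);
    return n;
}
```

For example, with a writable /tmp, `openat_demo("hello", buf, sizeof buf)` returns 5 and leaves "hello" in `buf`. Passing the directory fd around, rather than rebuilding an absolute path string, is the userland counterpart of what the *at syscalls give threaded programs.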