From owner-freebsd-arch@FreeBSD.ORG Tue Jul 8 15:26:22 2008 Return-Path: Delivered-To: arch@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C231A106564A; Tue, 8 Jul 2008 15:26:20 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42]) by mx1.freebsd.org (Postfix) with ESMTP id 4A3858FC0C; Tue, 8 Jul 2008 15:26:20 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [209.31.154.41]) by cyrus.watson.org (Postfix) with ESMTP id 636D346BA5; Tue, 8 Jul 2008 11:26:19 -0400 (EDT) Date: Tue, 8 Jul 2008 16:26:19 +0100 (BST) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Sergey Babkin In-Reply-To: <20080708001929.E63144@fledge.watson.org> Message-ID: <20080708161802.N89342@fledge.watson.org> References: <9484951.340521215467447990.JavaMail.root@vms126.mailsrvcs.net> <20080708001929.E63144@fledge.watson.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: arch@FreeBSD.ORG, David Schultz , Poul-Henning Kamp Subject: Re: Re: Re: Proposal: a revoke() system call X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 Jul 2008 15:26:23 -0000 On Tue, 8 Jul 2008, Robert Watson wrote: > Which can be accomplished by calling dup2(2) to replace the file descriptor > with another file descriptor, perhaps one to /dev/null. It would be worth > carefully reviewing the implementation of dup2(2) to make sure that the > close->replace there is atomic with respect to other threads simultaneously > allocating file descriptors, such as with pipe(2). BTW, on a similar note to the above: I've noticed there are several spots of relative non-atomicity in the Linux emulation code, where rather than just wrapping existing system calls with binary conversion of arguments and return values, we do a semantic wrapping that is necessarily non-atomic with respect to the native code. For example, consider the Linuxulator open code in linux_common_open(): 134 error = kern_openat(td, dirfd, path, UIO_SYSSPACE, bsd_flags, mode); 135 136 if (!error) { 137 fd = td->td_retval[0]; 138 /* 139 * XXX In between kern_open() and fget(), another process 140 * having the same filedesc could use that fd without 141 * checking below. 142 */ 143 error = fget(td, fd, &fp); 144 if (!error) { 145 sx_slock(&proctree_lock); 146 PROC_LOCK(p); 147 if (!(bsd_flags & O_NOCTTY) && 148 SESS_LEADER(p) && !(p->p_flag & P_CONTROLT)) { 149 PROC_UNLOCK(p); 150 sx_unlock(&proctree_lock); 151 if (fp->f_type == DTYPE_VNODE) 152 (void) fo_ioctl(fp, TIOCSCTTY, (caddr_t) 0, 153 td->td_ucred, td); 154 } else { 155 PROC_UNLOCK(p); 156 sx_sunlock(&proctree_lock); 157 } 158 if (l_flags & LINUX_O_DIRECTORY) { 159 if (fp->f_type != DTYPE_VNODE || 160 fp->f_vnode->v_type != VDIR) { 161 error = ENOTDIR; 162 } 163 } 164 fdrop(fp, td); 165 /* 166 * XXX as above, fdrop()/kern_close() pair is racy. 167 */ 168 if (error) 169 kern_close(td, fd); 170 } 171 } I think that comment is mine, or at least, got there because of a comment I made to Roman or the like. The fd has not yet been explicitly returned to userspace, since the open system call hasn't actually returned, but other threads could use the file descriptor in a system call that could lead to unexpected races. For example, if you dup2() on top of the file descriptor between the return of kern_openat() and the invocation of fget(), fo_ioctl() might be called on the wrong file, or the kern_close() in the error case might get invoked on the "wrong" file descriptor. In these cases, the races are mostly harmless since they involve incorrectly using a file descriptor from a second thread -- since it hasn't been returned yet, it isn't valid yet and the results will be undefined. However, there may well be cases where similar races exist that do affect the semantics of multi-threaded Linux applications, such as having a main event (open()) and an associated event (fo_ioctl()) be non-atomic and allowing a race between them that does have a semantically problematic result. These sorts of edge cases, btw, are one reason why I would *strongly* discourage application writers from doing things like calling close(2) on a file descriptor while still using it from another thread. :-) Robert N M Watson Computer Laboratory University of Cambridge