From owner-freebsd-fs@FreeBSD.ORG Thu Mar 8 20:39:10 2012 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 980691065675; Thu, 8 Mar 2012 20:39:10 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 7044F8FC14; Thu, 8 Mar 2012 20:39:10 +0000 (UTC) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [96.47.65.170]) by cyrus.watson.org (Postfix) with ESMTPSA id 2933446B95; Thu, 8 Mar 2012 15:39:10 -0500 (EST) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 88355B965; Thu, 8 Mar 2012 15:39:09 -0500 (EST) From: John Baldwin To: freebsd-fs@freebsd.org Date: Thu, 8 Mar 2012 15:39:07 -0500 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p10; KDE/4.5.5; amd64; ; ) References: <201203071318.08241.jhb@freebsd.org> In-Reply-To: <201203071318.08241.jhb@freebsd.org> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201203081539.07711.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Thu, 08 Mar 2012 15:39:09 -0500 (EST) Cc: pho@freebsd.org, kib@freebsd.org, fs@freebsd.org Subject: Re: close() of an flock'd file is not atomic X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 08 Mar 2012 20:39:10 -0000 On Wednesday, March 07, 2012 1:18:07 pm John Baldwin wrote: > So I ran into this problem at work. Suppose you have a process that opens a > read-write file descriptor with O_EXLOCK (so it has an flock()). It then > writes out a binary into that file. Another process wants to execve() the > file when it is ready, so it opens the file with O_EXLOCK (or O_SHLOCK), and > will call execve() once it has locked the file. In theory, what should happen > is that the second process should wait until the first process has finished > and called close(). In practice what happens is that I occasionally see the > second process fail with ETXTBUSY. > > The bug is that the vn_closefile() does the VOP_ADVLOCK() to unlock the file > separately from the call to vn_close() which drops the writecount. Thus, the > second process can do an open() and flock() of the file and subsequently call > execve() after the first process has done the VOP_ADVLOCK(), but before it > calls into vn_close(). In fact, since vn_close() requires a write lock on the > vnode, this turns out to not be too hard to reproduce at all. Below is a > simple test program that reproduces this constantly. To use, copy /bin/test > to some other file (e.g. /tmp/foo) and make it writable (chmod a+w), then run > ./flock_close_race /tmp/foo. > > The "fix" I came up with is to defer calling VOP_ADVLOCK() to release the lock > until after vn_close() executes. However, even with that fix applied, my test > case still fails. Now it is because open() with a given lock flag is > non-atomic in that the open(O_RDWR) will call vn_open() and bump v_writecount > before it blocks on the lock due to O_EXLOCK, so even though the 'exec_child' > process has the fd locked, the writecount can still be bumped. One gross hack > would be to defer the bump of the writecount to the caller of vn_open() if the > caller passes in O_EXLOCK or O_SHLOCK, but that's a really gross kludge, plus > it doesn't actually work. I ended up moving acquiring the lock into > vn_open_cred(). The current patch I'm testing has both of these approaches, > but the first one is #if 0'd out, and the second is #if 1'd. > > http://www.freebsd.org/~jhb/patches/flock_open_close.patch Based on some feedback from Konstantin, I've fixed some issues in the failure path handling for VOP_ADVLOCK(). I've also removed the #if 0'd code mentioned above, so the patch is now the actual change that I'm testing. So far it handles both my workload at work and my test program without any issues. -- John Baldwin