Date: Fri, 9 Mar 2012 00:39:19 +0200
From: Konstantin Belousov
To: John Baldwin
Cc: freebsd-fs@freebsd.org, pho@freebsd.org, fs@freebsd.org
Subject: Re: close() of an flock'd file is not atomic
Message-ID: <20120308223919.GU75778@deviant.kiev.zoral.com.ua>
In-Reply-To: <201203081539.07711.jhb@freebsd.org>
References: <201203071318.08241.jhb@freebsd.org> <201203081539.07711.jhb@freebsd.org>

On Thu, Mar 08, 2012 at 03:39:07PM -0500, John Baldwin wrote:
> On Wednesday, March 07, 2012 1:18:07 pm John Baldwin wrote:
> > So I ran into this problem at work.  Suppose you have a process that opens a
> > read-write file descriptor with O_EXLOCK (so it has an flock()).  It then
> > writes out a binary into that file.  Another process wants to execve() the
> > file when it is ready, so it opens the file with O_EXLOCK (or O_SHLOCK), and
> > will call execve() once it has locked the file.  In theory, what should happen
> > is that the second process should wait until the first process has finished
> > and called close().  In practice what happens is that I occasionally see the
> > second process fail with ETXTBSY.
> >
> > The bug is that the vn_closefile() does the VOP_ADVLOCK() to unlock the file
> > separately from the call to vn_close() which drops the writecount.  Thus, the
> > second process can do an open() and flock() of the file and subsequently call
> > execve() after the first process has done the VOP_ADVLOCK(), but before it
> > calls into vn_close().  In fact, since vn_close() requires a write lock on the
> > vnode, this turns out to not be too hard to reproduce at all.  Below is a
> > simple test program that reproduces this constantly.  To use, copy /bin/test
> > to some other file (e.g. /tmp/foo) and make it writable (chmod a+w), then run
> > ./flock_close_race /tmp/foo.
> >
> > The "fix" I came up with is to defer calling VOP_ADVLOCK() to release the lock
> > until after vn_close() executes.
> > However, even with that fix applied, my test
> > case still fails.  Now it is because open() with a given lock flag is
> > non-atomic in that the open(O_RDWR) will call vn_open() and bump v_writecount
> > before it blocks on the lock due to O_EXLOCK, so even though the 'exec_child'
> > process has the fd locked, the writecount can still be bumped.  One gross hack
> > would be to defer the bump of the writecount to the caller of vn_open() if the
> > caller passes in O_EXLOCK or O_SHLOCK, but that's a really gross kludge, plus
> > it doesn't actually work.  I ended up moving acquiring the lock into
> > vn_open_cred().  The current patch I'm testing has both of these approaches,
> > but the first one is #if 0'd out, and the second is #if 1'd.
> >
> > http://www.freebsd.org/~jhb/patches/flock_open_close.patch
>
> Based on some feedback from Konstantin, I've fixed some issues in the failure
> path handling for VOP_ADVLOCK().  I've also removed the #if 0'd code mentioned
> above, so the patch is now the actual change that I'm testing.  So far it
> handles both my workload at work and my test program without any issues.

I think a comment is needed explaining why vn_writechk() is called a second
time.

Could you please point me to where FHASLOCK is set for the O_EXLOCK | O_SHLOCK
case in the patched kernel?
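
For reference, the locking protocol John describes (the writer holds an
exclusive flock() until close(); the consumer blocks on its own lock before
using the file) can be sketched in userland C.  This is only a minimal sketch
under stated assumptions, not John's test program, which is not reproduced in
this message.  flock_handoff() is a hypothetical helper name; it uses open()
followed by flock() instead of O_EXLOCK/O_SHLOCK so that it also builds on
systems lacking those flags, and it reads the file back where a real consumer
would call execve().  It shows the intended hand-off only and does not attempt
to reproduce the race.

```c
#include <sys/types.h>
#include <sys/file.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original thread).
 * Parent = "writer": holds LOCK_EX while it writes, releases it via close().
 * Child  = "consumer": blocks in flock(LOCK_SH) until the writer is done.
 * Returns 0 if the consumer saw the complete contents, -1 otherwise.
 */
int
flock_handoff(const char *path)
{
    static const char payload[] = "complete";
    const size_t len = sizeof(payload) - 1;
    char buf[sizeof(payload)];
    int fd, status, wrote_ok;
    pid_t pid;

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0755);
    if (fd < 0)
        return (-1);
    /* Take the exclusive lock before the consumer exists. */
    if (flock(fd, LOCK_EX) != 0) {
        close(fd);
        return (-1);
    }

    pid = fork();
    if (pid < 0) {
        close(fd);
        return (-1);
    }
    if (pid == 0) {
        int cfd;
        ssize_t n;

        /*
         * Drop the inherited descriptor: it shares the writer's open
         * file (and therefore its lock), so keeping it open would stop
         * the writer's close() from releasing the lock.
         */
        close(fd);
        cfd = open(path, O_RDONLY);
        if (cfd < 0 || flock(cfd, LOCK_SH) != 0)
            _exit(2);
        n = read(cfd, buf, len);
        /* A real consumer would execve(path, ...) at this point. */
        _exit(n == (ssize_t)len && memcmp(buf, payload, len) == 0 ? 0 : 1);
    }

    /* Writer: fill the file, then close(), which drops the flock(). */
    wrote_ok = write(fd, payload, len) == (ssize_t)len;
    close(fd);
    if (waitpid(pid, &status, 0) != pid)
        return (-1);
    if (!wrote_ok || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return (-1);
    return (0);
}
```

Calling flock_handoff("/tmp/foo") from a small main() exercises the hand-off
once.  In the kernel case discussed above, the consumer's execve() can still
fail even though this userland ordering looks correct, because the unlock and
the writecount drop happen in separate steps inside close().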