Date:      Fri, 9 Mar 2012 00:39:19 +0200
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        John Baldwin <jhb@freebsd.org>
Cc:        freebsd-fs@freebsd.org, pho@freebsd.org, fs@freebsd.org
Subject:   Re: close() of an flock'd file is not atomic
Message-ID:  <20120308223919.GU75778@deviant.kiev.zoral.com.ua>
In-Reply-To: <201203081539.07711.jhb@freebsd.org>
References:  <201203071318.08241.jhb@freebsd.org> <201203081539.07711.jhb@freebsd.org>


On Thu, Mar 08, 2012 at 03:39:07PM -0500, John Baldwin wrote:
> On Wednesday, March 07, 2012 1:18:07 pm John Baldwin wrote:
> > So I ran into this problem at work.  Suppose you have a process that opens a
> > read-write file descriptor with O_EXLOCK (so it has an flock()).  It then
> > writes out a binary into that file.  Another process wants to execve() the
> > file when it is ready, so it opens the file with O_EXLOCK (or O_SHLOCK), and
> > will call execve() once it has locked the file.  In theory, what should happen
> > is that the second process should wait until the first process has finished
> > and called close().  In practice what happens is that I occasionally see the
> > second process fail with ETXTBSY.
> >
> > The bug is that the vn_closefile() does the VOP_ADVLOCK() to unlock the file
> > separately from the call to vn_close() which drops the writecount.  Thus, the
> > second process can do an open() and flock() of the file and subsequently call
> > execve() after the first process has done the VOP_ADVLOCK(), but before it
> > calls into vn_close().  In fact, since vn_close() requires a write lock on the
> > vnode, this turns out to not be too hard to reproduce at all.  Below is a
> > simple test program that reproduces this constantly.  To use, copy /bin/test
> > to some other file (e.g. /tmp/foo) and make it writable (chmod a+w), then run
> > ./flock_close_race /tmp/foo.
> >
> > The "fix" I came up with is to defer calling VOP_ADVLOCK() to release the lock
> > until after vn_close() executes.  However, even with that fix applied, my test
> > case still fails.  Now it is because open() with a given lock flag is
> > non-atomic in that the open(O_RDWR) will call vn_open() and bump v_writecount
> > before it blocks on the lock due to O_EXLOCK, so even though the 'exec_child'
> > process has the fd locked, the writecount can still be bumped.  One gross hack
> > would be to defer the bump of the writecount to the caller of vn_open() if the
> > caller passes in O_EXLOCK or O_SHLOCK, but that's a really gross kludge, plus
> > it doesn't actually work.  I ended up moving acquiring the lock into
> > vn_open_cred().  The current patch I'm testing has both of these approaches,
> > but the first one is #if 0'd out, and the second is #if 1'd.
> >
> > http://www.freebsd.org/~jhb/patches/flock_open_close.patch
>=20
> Based on some feedback from Konstantin, I've fixed some issues in the fai=
lure
> path handling for VOP_ADVLOCK().  I've also removed the #if 0'd code ment=
ioned
> above, so the patch is now the actual change that I'm testing.  So far it
> handles both my workload at work and my test program without any issues.

I think a comment is needed to explain why vn_writechk() is called a second
time.

Could you please point me to where FHASLOCK is set for the
O_EXLOCK | O_SHLOCK case in the patched kernel?
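
For readers following the thread, the userland side of the scenario in the
quoted message can be sketched roughly as below. This is not the
flock_close_race program referred to above, which is not reproduced here. The
function names (open_locked, writer_update, exec_when_ready) are made up for
illustration, and the sketch calls open() followed by flock() instead of
passing O_EXLOCK / O_SHLOCK to open(), purely so that it also builds on
systems without those flags.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Hypothetical helper: open `path` and take an flock()-style lock on it.
 * Returns the descriptor, or -1 with errno set.  On FreeBSD a single
 * open(path, oflags | O_EXLOCK) would request the lock as part of open().
 */
int
open_locked(const char *path, int oflags, int lockop)
{
	int fd, saved;

	assert(path != NULL);
	fd = open(path, oflags);
	if (fd == -1)
		return (-1);
	if (flock(fd, lockop) == -1) {
		saved = errno;
		close(fd);
		errno = saved;
		return (-1);
	}
	return (fd);
}

/*
 * Writer side: hold the exclusive lock for as long as the file is open
 * for writing.  close() both releases the lock and drops the write
 * reference; the thread is about those two steps not being atomic.
 */
int
writer_update(const char *path, const void *buf, size_t len)
{
	ssize_t n;
	int fd;

	fd = open_locked(path, O_RDWR, LOCK_EX);
	if (fd == -1)
		return (-1);
	n = write(fd, buf, len);
	if (close(fd) == -1 || n != (ssize_t)len)
		return (-1);	/* short writes are treated as failure */
	return (0);
}

/*
 * Exec side: block until the lock is available, then exec the file.
 * Only returns on failure; errno is then the open/flock/execv error,
 * for example ETXTBSY if the file is still open for writing.
 */
int
exec_when_ready(const char *path, char *const argv[])
{
	int fd, saved;

	fd = open_locked(path, O_RDONLY, LOCK_SH);
	if (fd == -1)
		return (-1);
	execv(path, argv);
	saved = errno;
	close(fd);
	errno = saved;
	return (-1);
}
```

In a reproduction one process would loop over writer_update() while another
calls exec_when_ready() in a forked child and checks whether it failed with
ETXTBSY.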



