Date: Fri, 9 May 2008 16:07:22 +1000
From: Paul Koch <paul.koch@statseeker.com>
To: Doug Rabson <dfr@rabson.org>
Cc: freebsd-stable@freebsd.org
Subject: Re: flock incorrectly detects deadlock on 7-stable and current
Message-ID: <200805091607.23035.paul.koch@statseeker.com>
In-Reply-To: <1B6FCF23-413B-452A-B66D-3CCD6257F7BD@rabson.org>
References: <200805081812.24692.paul.koch@statseeker.com> <1B6FCF23-413B-452A-B66D-3CCD6257F7BD@rabson.org>

On Thu, 8 May 2008 06:37:00 pm Doug Rabson wrote:
> On 8 May 2008, at 09:12, Paul Koch wrote:
> > Hi,
> >
> > We have been trying to track down a problem with one of our apps
> > which does a lot of flock(2) calls.  flock returns errno 11
> > (Resource deadlock avoided) under certain scenarios.  Our app works
> > fine on 7-Release, but fails on 7-stable and -current.
> >
> > The problem appears to be when we have at least three processes
> > doing flock() on a file, and one is trying to upgrade a shared lock
> > to an exclusive lock but fails with a deadlock avoided.
> >
> > Attached is a simple flock() test program.
> >
> >  a. Process 1 requests and gets a shared lock
> >  b. Process 2 requests and blocks for an exclusive lock
> >  c. Process 3 requests and gets a shared lock
> >  d. Process 3 requests an upgrade to an exclusive lock but fails
> >     (errno 11)
> >
> > If we change 'd' to
> >     Process 3 requests unlock, then requests exclusive lock, it
> >     works.
>
> Could you possibly try this patch and tell me if it helps:
>
> ==== //depot/user/dfr/lockd/sys/kern/kern_lockf.c#57 -
> /tank/projects/lockd/src/sys/kern/kern_lockf.c ====
> @@ -1370,6 +1370,18 @@
>                 }
>
>                 /*
> +                * For flock type locks, we must first remove
> +                * any shared locks that we hold before we sleep
> +                * waiting for an exclusive lock.
> +                */
> +               if ((lock->lf_flags & F_FLOCK) &&
> +                   lock->lf_type == F_WRLCK) {
> +                       lock->lf_type = F_UNLCK;
> +                       lf_activate_lock(state, lock);
> +                       lock->lf_type = F_WRLCK;
> +               }
> +
> +               /*
>                  * We are blocked. Create edges to each blocking lock,
>                  * checking for deadlock using the owner graph. For
>                  * simplicity, we run deadlock detection for all
> @@ -1389,17 +1401,6 @@
>                 }
>
>                 /*
> -                * For flock type locks, we must first remove
> -                * any shared locks that we hold before we sleep
> -                * waiting for an exclusive lock.
> -                */
> -               if ((lock->lf_flags & F_FLOCK) &&
> -                   lock->lf_type == F_WRLCK) {
> -                       lock->lf_type = F_UNLCK;
> -                       lf_activate_lock(state, lock);
> -                       lock->lf_type = F_WRLCK;
> -               }
> -               /*
>                  * We have added edges to everything that blocks
>                  * us. Sleep until they all go away.
>                  */

Manually applied the patch to stable kern_lockf.c 1.57.2.1.  Ran the
flock_test program on many of our architectures and it works fine.

Have also been testing our app on a single core i386 machine today
with no locking problems.  Just setup a quad core -stable amd64 build
and it also appears to be running fine now.

Thanks

	Paul.
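
The test program attached to the original report is not preserved in this archive. The following is only a minimal sketch of how steps a to d from the quoted report could be reproduced. It is not Paul's flock_test program. The names flock_upgrade_result and open_lockfile, and the sleep-based timing, are assumptions made for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: every process calls open() itself so that it
 * owns a separate open file description, and so a separate flock lock. */
static int open_lockfile(const char *path)
{
    return open(path, O_RDWR | O_CREAT, 0600);
}

/*
 * Hypothetical driver for steps a-d.  Returns 0 if the shared-to-exclusive
 * upgrade in step d succeeded, the errno value if it failed (EDEADLK is
 * the failure described in the report), or -1 on a setup error.
 */
int flock_upgrade_result(const char *path)
{
    int ready[2], fd, result = -1;
    char byte = 0;
    pid_t p1, p2;

    if (pipe(ready) == -1)
        return -1;

    p1 = fork();
    if (p1 == -1)
        return -1;
    if (p1 == 0) {
        /* a. Process 1 requests and gets a shared lock. */
        close(ready[0]);
        fd = open_lockfile(path);
        if (fd == -1 || flock(fd, LOCK_SH) == -1)
            _exit(1);
        if (write(ready[1], &byte, 1) != 1)
            _exit(1);
        sleep(2);   /* hold the lock while the others run */
        _exit(0);   /* exit closes fd, which releases the lock */
    }

    /* Wait until process 1 reports that it holds its lock. */
    close(ready[1]);
    if (read(ready[0], &byte, 1) != 1) {
        close(ready[0]);
        waitpid(p1, NULL, 0);
        return -1;
    }
    close(ready[0]);

    p2 = fork();
    if (p2 == -1) {
        waitpid(p1, NULL, 0);
        return -1;
    }
    if (p2 == 0) {
        /* b. Process 2 requests an exclusive lock and blocks. */
        fd = open_lockfile(path);
        if (fd == -1 || flock(fd, LOCK_EX) == -1)
            _exit(1);
        _exit(0);
    }
    sleep(1);       /* crude timing: give process 2 time to block */

    /* c. Process 3 (this process) requests and gets a shared lock. */
    fd = open_lockfile(path);
    if (fd != -1 && flock(fd, LOCK_SH) == 0) {
        /* d. Process 3 requests an upgrade to an exclusive lock. */
        result = (flock(fd, LOCK_EX) == 0) ? 0 : errno;
    }
    if (fd != -1)
        close(fd);

    waitpid(p1, NULL, 0);
    waitpid(p2, NULL, 0);
    return result;
}
```

Calling flock_upgrade_result("/tmp/flock_test.lock") from main() and printing the return value should be enough to see whether the upgrade fails with EDEADLK. To try the workaround from the report, step d could be changed to call flock(fd, LOCK_UN) before flock(fd, LOCK_EX).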