Date: Fri, 9 May 2008 16:07:22 +1000
From: Paul Koch <paul.koch@statseeker.com>
To: Doug Rabson <dfr@rabson.org>
Cc: freebsd-stable@freebsd.org
Subject: Re: flock incorrectly detects deadlock on 7-stable and current
Message-ID: <200805091607.23035.paul.koch@statseeker.com>
In-Reply-To: <1B6FCF23-413B-452A-B66D-3CCD6257F7BD@rabson.org>
References: <200805081812.24692.paul.koch@statseeker.com> <1B6FCF23-413B-452A-B66D-3CCD6257F7BD@rabson.org>

On Thu, 8 May 2008 06:37:00 pm Doug Rabson wrote:
> On 8 May 2008, at 09:12, Paul Koch wrote:
> > Hi,
> >
> > We have been trying to track down a problem with one of our apps
> > which does a lot of flock(2) calls. flock returns errno 11
> > (Resource deadlock avoided) under certain scenarios. Our app works
> > fine on 7-Release, but fails on 7-stable and -current.
> >
> > The problem appears to be when we have at least three processes
> > doing flock() on a file, and one is trying to upgrade a shared lock
> > to an exclusive lock but fails with a deadlock avoided.
> >
> > Attached is a simple flock() test program.
> >
> > a. Process 1 requests and gets a shared lock
> > b. Process 2 requests and blocks for an exclusive lock
> > c. Process 3 requests and gets a shared lock
> > d. Process 3 requests an upgrade to an exclusive lock but fails
> > (errno 11)
> >
> > If we change 'd' to
> > Process 3 requests unlock, then requests exclusive lock, it
> > works.
>
> Could you possibly try this patch and tell me if it helps:
>
> ==== //depot/user/dfr/lockd/sys/kern/kern_lockf.c#57 - /tank/projects/lockd/src/sys/kern/kern_lockf.c ====
> @@ -1370,6 +1370,18 @@
> }
>
> /*
> + * For flock type locks, we must first remove
> + * any shared locks that we hold before we sleep
> + * waiting for an exclusive lock.
> + */
> + if ((lock->lf_flags & F_FLOCK) &&
> + lock->lf_type == F_WRLCK) {
> + lock->lf_type = F_UNLCK;
> + lf_activate_lock(state, lock);
> + lock->lf_type = F_WRLCK;
> + }
> +
> + /*
> * We are blocked. Create edges to each blocking lock,
> * checking for deadlock using the owner graph. For
> * simplicity, we run deadlock detection for all
> @@ -1389,17 +1401,6 @@
> }
>
> /*
> - * For flock type locks, we must first remove
> - * any shared locks that we hold before we sleep
> - * waiting for an exclusive lock.
> - */
> - if ((lock->lf_flags & F_FLOCK) &&
> - lock->lf_type == F_WRLCK) {
> - lock->lf_type = F_UNLCK;
> - lf_activate_lock(state, lock);
> - lock->lf_type = F_WRLCK;
> - }
> - /*
> * We have added edges to everything that blocks
> * us. Sleep until they all go away.
> */
Manually applied the patch to -stable kern_lockf.c 1.57.2.1. Ran the
flock_test program on many of our architectures and it works fine.
Have also been testing our app on a single-core i386 machine today with
no locking problems. Just set up a quad-core -stable amd64 build and it
also appears to be running fine now.
Thanks
Paul.
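
The flock_test program mentioned in the original report is not reproduced
in this message. As a rough illustration only, steps a to d of the quoted
scenario could be sketched as below. The names run_scenario and
spawn_locker, the one-second sleep() delays used to order the requests,
and the lock file path are all assumptions made for this sketch; they are
not taken from the attachment.

```c
#include <sys/types.h>
#include <sys/file.h>
#include <sys/wait.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child that opens the file itself (flock
 * locks belong to the open file description, so each process needs its
 * own open), takes the `first` lock, optionally requests `second`,
 * holds for hold_secs, then exits. The exit status is 0 on success or
 * the errno of the flock() call that failed.
 */
static pid_t spawn_locker(const char *path, int first, int second,
                          unsigned int hold_secs)
{
    pid_t pid = fork();
    if (pid != 0)
        return pid;                 /* parent, or -1 if fork failed */

    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        _exit(255);
    if (flock(fd, first) != 0)
        _exit(errno);
    if (second != 0 && flock(fd, second) != 0)
        _exit(errno);               /* e.g. EDEADLK on the upgrade */
    sleep(hold_secs);
    _exit(0);                       /* exiting drops the lock */
}

/*
 * Hypothetical driver for steps a to d. Returns 0 if process 3's
 * upgrade succeeded, the errno it failed with, or -1 on setup failure.
 * Ordering relies on sleep() delays, which is crude but keeps it short.
 */
int run_scenario(const char *path)
{
    int status = 0;

    pid_t p1 = spawn_locker(path, LOCK_SH, 0, 3);        /* a */
    sleep(1);
    pid_t p2 = spawn_locker(path, LOCK_EX, 0, 0);        /* b: blocks */
    sleep(1);
    pid_t p3 = spawn_locker(path, LOCK_SH, LOCK_EX, 0);  /* c, then d */

    if (p3 > 0)
        waitpid(p3, &status, 0);
    if (p1 > 0)
        waitpid(p1, NULL, 0);
    if (p2 > 0)
        waitpid(p2, NULL, 0);

    if (p1 < 0 || p2 < 0 || p3 < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Calling run_scenario() on a scratch file from main() and passing a nonzero
result to strerror() would show whether the upgrade in step d was refused.
On a kernel with the behaviour described in the report, the expectation is
EDEADLK from process 3; with the patch applied, the upgrade should simply
wait until process 1 releases its shared lock.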
