Date:      Fri, 9 May 2008 16:07:22 +1000
From:      Paul Koch <paul.koch@statseeker.com>
To:        Doug Rabson <dfr@rabson.org>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: flock incorrectly detects deadlock on 7-stable and current
Message-ID:  <200805091607.23035.paul.koch@statseeker.com>
In-Reply-To: <1B6FCF23-413B-452A-B66D-3CCD6257F7BD@rabson.org>
References:  <200805081812.24692.paul.koch@statseeker.com> <1B6FCF23-413B-452A-B66D-3CCD6257F7BD@rabson.org>

On Thu, 8 May 2008 06:37:00 pm Doug Rabson wrote:
> On 8 May 2008, at 09:12, Paul Koch wrote:
> > Hi,
> >
> > We have been trying to track down a problem with one of our apps
> > which does a lot of flock(2) calls.  flock returns errno 11
> > (Resource deadlock avoided) under certain scenarios.  Our app works
> > fine on 7-Release, but fails on 7-stable and -current.
> >
> > The problem appears to be when we have at least three processes
> > doing flock() on a file, and one is trying to upgrade a shared lock
> > to an exclusive lock but fails with a deadlock avoided.
> >
> > Attached is a simple flock() test program.
> >
> > a. Process 1 requests and gets a shared lock
> > b. Process 2 requests and blocks for an exclusive lock
> > c. Process 3 requests and gets a shared lock
> > d. Process 3 requests an upgrade to an exclusive lock but fails
> > (errno 11)
> >
> > If we change 'd' to
> >   Process 3 requests unlock, then requests exclusive lock, it
> > works.
>
> Could you possibly try this patch and tell me if it helps:
>
> ==== //depot/user/dfr/lockd/sys/kern/kern_lockf.c#57 - /tank/projects/lockd/src/sys/kern/kern_lockf.c ====
> @@ -1370,6 +1370,18 @@
>   		}
>
>   		/*
> +		 * For flock type locks, we must first remove
> +		 * any shared locks that we hold before we sleep
> +		 * waiting for an exclusive lock.
> +		 */
> +		if ((lock->lf_flags & F_FLOCK) &&
> +		    lock->lf_type == F_WRLCK) {
> +			lock->lf_type = F_UNLCK;
> +			lf_activate_lock(state, lock);
> +			lock->lf_type = F_WRLCK;
> +		}
> +
> +		/*
>   		 * We are blocked. Create edges to each blocking lock,
>   		 * checking for deadlock using the owner graph. For
>   		 * simplicity, we run deadlock detection for all
> @@ -1389,17 +1401,6 @@
>   		}
>
>   		/*
> -		 * For flock type locks, we must first remove
> -		 * any shared locks that we hold before we sleep
> -		 * waiting for an exclusive lock.
> -		 */
> -		if ((lock->lf_flags & F_FLOCK) &&
> -		    lock->lf_type == F_WRLCK) {
> -			lock->lf_type = F_UNLCK;
> -			lf_activate_lock(state, lock);
> -			lock->lf_type = F_WRLCK;
> -		}
> -		/*
>   		 * We have added edges to everything that blocks
>   		 * us. Sleep until they all go away.
>   		 */

I manually applied the patch to kern_lockf.c 1.57.2.1 on -stable, then ran
the flock_test program on many of our architectures; it works fine.
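
For anyone following along without the attachment, the a-d scenario quoted
above can be sketched roughly as below. This is not the attached flock_test
program. The function names (open_lockfile, upgrade_lock, run_scenario), the
FLOCK_SKETCH_MAIN build switch, the /tmp path and the sleep-based ordering are
all assumptions made for illustration.

```c
/* Rough sketch of the three-process flock(2) scenario from this thread.
 * NOT the attached flock_test program; all names here are hypothetical. */
#include <sys/types.h>
#include <sys/file.h>
#include <sys/wait.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Each process opens the file itself: flock locks belong to the open
 * file description, so every contender needs its own open(). */
int open_lockfile(const char *path)
{
    return open(path, O_RDWR | O_CREAT, 0600);
}

/* Step d: turn a held shared lock into an exclusive one.
 * With unlock_first set, use the workaround (explicit unlock first).
 * Returns 0 on success, otherwise the errno from flock(). */
int upgrade_lock(int fd, int unlock_first)
{
    if (unlock_first && flock(fd, LOCK_UN) == -1)
        return errno;
    if (flock(fd, LOCK_EX) == -1)
        return errno;
    return 0;
}

/* Runs steps a-d, with the caller acting as process 3.
 * Ordering relies on sleep(), which is crude but enough for a sketch. */
int run_scenario(const char *path, int unlock_first)
{
    pid_t p1, p2;
    int fd, rc;

    if ((p1 = fork()) == 0) {            /* a. process 1: shared lock */
        fd = open_lockfile(path);
        if (fd >= 0 && flock(fd, LOCK_SH) == 0)
            sleep(3);                    /* hold it for a while */
        _exit(0);                        /* exit drops the lock */
    }
    sleep(1);
    if ((p2 = fork()) == 0) {            /* b. process 2: blocks for exclusive */
        fd = open_lockfile(path);
        if (fd >= 0)
            flock(fd, LOCK_EX);
        _exit(0);
    }
    sleep(1);

    fd = open_lockfile(path);            /* c. process 3: shared lock */
    if (fd < 0)
        rc = errno;
    else if (flock(fd, LOCK_SH) == -1)
        rc = errno;
    else
        rc = upgrade_lock(fd, unlock_first);   /* d. upgrade */

    if (fd >= 0)
        close(fd);
    if (p1 > 0)
        waitpid(p1, NULL, 0);
    if (p2 > 0)
        waitpid(p2, NULL, 0);
    return rc;
}

#ifdef FLOCK_SKETCH_MAIN
int main(void)
{
    int rc = run_scenario("/tmp/flock_sketch.lock", 0);

    printf("upgrade: %s\n", rc == 0 ? "ok" : strerror(rc));
    return rc != 0;
}
#endif
```

Building with cc -DFLOCK_SKETCH_MAIN gives a standalone program. Going by the
reports in this thread, step d would be expected to fail with errno 11 on an
unpatched 7-stable kernel and to succeed once the patch is applied; passing a
non-zero unlock_first exercises the unlock-then-exclusive workaround instead.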

I have also been testing our app on a single-core i386 machine today with
no locking problems.  I just set up a quad-core amd64 -stable build and it
also appears to be running fine now.

Thanks

	Paul.


