From owner-freebsd-current  Sun Jun 27 15:47:20 1999
Delivered-To: freebsd-current@freebsd.org
Received: from overcee.netplex.com.au (overcee.netplex.com.au [202.12.86.7])
	by hub.freebsd.org (Postfix) with ESMTP id C36EC151D3
	for <current@FreeBSD.ORG>; Sun, 27 Jun 1999 15:47:14 -0700 (PDT)
	(envelope-from peter@netplex.com.au)
Received: from netplex.com.au (localhost [127.0.0.1])
	by overcee.netplex.com.au (Postfix) with ESMTP
	id A61D781; Mon, 28 Jun 1999 06:47:13 +0800 (WST)
	(envelope-from peter@netplex.com.au)
X-Mailer: exmh version 2.0.2 2/24/98
To: Matthew Dillon <dillon@apollo.backplane.com>
Cc: current@FreeBSD.ORG, mckusick@mckusick.com
Subject: Re: BUF_LOCK() related panic.. 
In-reply-to: Your message of "Sun, 27 Jun 1999 13:06:13 MST."
             <199906272006.NAA15499@apollo.backplane.com> 
Date: Mon, 28 Jun 1999 06:47:13 +0800
From: Peter Wemm <peter@netplex.com.au>
Message-Id: <19990627224713.A61D781@overcee.netplex.com.au>
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Matthew Dillon wrote:
> :
> :But that doesn't fix the UP problem where cluster_wbuild() tries to
> :recursively re-lock a buf that the current process already owns.  I have a
> :few ideas about that one though, I just don't understand the clustering
> :well enough yet to fix it.
> 
>     Ok, I just hit this testing the lockmgr changes.
> 
>     I think the problem is that cluster_wbuild's algorithm was polluted
>     a little by Kirk's commit.
> 
>     Previously it tested for B_BUSY to determine if it could then lock it
>     to include in the cluster.

Yep, I was aware of this before, but I didn't rip it out since I didn't
know what Kirk's intentions were.  I'm assuming he's got his experience
with making the BSD/OS vfs reentrant in mind, so I don't want to break
anything that gets closer to that.  A separate test and then lock would
not be reentrant.  (Sure, there are far bigger problems than this, but
every bit helps when we get there)
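
To illustrate the reentrancy point above, here is a minimal user-space sketch. It is not code from this thread or from the FreeBSD tree: `race_buf`, `race_test_then_lock()` and `race_try_lock()` are hypothetical names, and the owner field is a toy stand-in for the real buffer lock. It contrasts a two-step "test, then lock" with a single atomic attempt, assuming C11 atomics are available.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define RACE_NO_OWNER 0   /* hypothetical "unlocked" marker */

/* Toy stand-in for a buffer with a lock; not the real struct buf. */
struct race_buf {
    atomic_int owner;     /* RACE_NO_OWNER, or the id of the holder */
};

void race_buf_init(struct race_buf *bp)
{
    atomic_init(&bp->owner, RACE_NO_OWNER);
}

/*
 * Old style: test the flag, then take the lock.  These are two separate
 * steps, so on a multiprocessor a second CPU can pass the same test in
 * the window between them and both callers believe they own the buffer.
 */
bool race_test_then_lock(struct race_buf *bp, int pid)
{
    assert(pid != RACE_NO_OWNER);
    if (atomic_load(&bp->owner) != RACE_NO_OWNER)   /* step 1: test */
        return false;
    /* window: another CPU may run step 1 here and also see "unlocked" */
    atomic_store(&bp->owner, pid);                  /* step 2: lock */
    return true;
}

/*
 * New style: a single compare-and-swap.  The test and the acquisition
 * happen as one atomic operation, so there is no window.
 */
bool race_try_lock(struct race_buf *bp, int pid)
{
    int expected = RACE_NO_OWNER;

    assert(pid != RACE_NO_OWNER);
    return atomic_compare_exchange_strong(&bp->owner, &expected, pid);
}
```

On a uniprocessor both variants behave the same way when called one after another; the difference only shows up once two CPUs can be between the test and the store at the same time.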

>     Kirk changed this to actually attempt a lock, and then include it
>     if the lock succeeded and not include it if the lock failed.
> 
>     The problem is that if the buffer was already locked by the same process,
>     this change results in a panic instead of a simple failure to obtain the
>     lock.

>     The solution is to re-tool the code to use the original algorithm ( test
>     the lock before trying to get it, rather than simply trying to get it ),
>     but with the new locks.  I do not have time today to do this but I believe
>     I have given sufficient information for Peter, Kirk, or Alan to make the
>     fix.

Actually, I think there is another set of missing BUF_KERNPROC() calls:
cluster_callback() frees buffers, so all buffers submitted with it had
better be reassigned.  This is (I think) part of the problem that
cluster_wbuild() is hitting - things were supposed to have been reassigned
but are still hanging onto the current process.
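
The ownership problem described above can be sketched in user space. This is a toy model under stated assumptions, not the lockmgr implementation: `own_buf`, `own_lock_nowait()`, `own_kernproc()` and `own_unlock()` are hypothetical names, and `OWN_RECURSIVE` stands in for the case where the real kernel would panic. It assumes that reassigning a buffer hands the lock to a kernel placeholder owner so that a different context can release it later.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical owner ids for the toy model. */
enum { OWN_UNLOCKED = 0, OWN_KERNPROC = -1 };

/* Toy stand-in for a locked buffer; not the real struct buf. */
struct own_buf {
    int owner;   /* OWN_UNLOCKED, OWN_KERNPROC, or a process id */
};

enum own_result {
    OWN_LOCKED,     /* lock acquired */
    OWN_BUSY,       /* held by someone else: a simple failure */
    OWN_RECURSIVE   /* held by the caller: the panic case in the thread */
};

/* Non-blocking lock attempt that distinguishes "busy" from "recursive". */
enum own_result own_lock_nowait(struct own_buf *bp, int pid)
{
    if (bp->owner == OWN_UNLOCKED) {
        bp->owner = pid;
        return OWN_LOCKED;
    }
    if (bp->owner == pid)
        return OWN_RECURSIVE;
    return OWN_BUSY;
}

/* Hand a held lock over to the placeholder owner before async I/O. */
void own_kernproc(struct own_buf *bp)
{
    assert(bp->owner != OWN_UNLOCKED);   /* only a held lock can be handed off */
    bp->owner = OWN_KERNPROC;
}

/*
 * Release the lock.  Succeeds if the caller is the owner, or if the lock
 * was handed to the placeholder owner, in which case any context (for
 * example an I/O completion callback) may release it.
 */
bool own_unlock(struct own_buf *bp, int pid)
{
    if (bp->owner != pid && bp->owner != OWN_KERNPROC)
        return false;
    bp->owner = OWN_UNLOCKED;
    return true;
}
```

In this model, a process that locks a buffer, skips the handoff, and later tries the lock again gets `OWN_RECURSIVE`; with the handoff in place the same attempt gets `OWN_BUSY`, and the completion path can still release the buffer.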

>     I believe there are two or three areas in Kirk's patchset where he 
>     replaced an explicit test with an attempt to actually gain the lock where
>     this sort of panic can occur.  I think Kirk was trying to optimize the
>     code :-)  Heh heh.  Just goes to show that combining functional 
>     replacements with optimizations all in one go does not always work.
> 
> 					-Matt
> 					Matthew Dillon 
> 					<dillon@backplane.com>
> 
> :Cheers,
> :-Peter
> :--
> :Peter Wemm - peter@FreeBSD.org; peter@yahoo-inc.com; peter@netplex.com.au
> 
> 
> 

Cheers,
-Peter
--
Peter Wemm - peter@FreeBSD.org; peter@yahoo-inc.com; peter@netplex.com.au

