Date:      Sat, 27 Oct 2001 21:19:21 -0400
From:      Jules Gilbert <jules@aasp.net>
To:        freebsd-questions@freebsd.org, freebsd-developers@freebsd.org
Cc:        mckusick@mckusick.com, green@freebsd.org, roberto@eurocontrol.fr
Subject:   [Fwd: panic: bqrelse: multiple ref .. thought fixed a year ago]
Message-ID:  <3BDB5D19.1E5C55C3@aasp.net>




Jules Gilbert wrote:

> Hello folks:
>
> We have a serious problem that is interfering with a lot of things.
> We are running FreeBSD 4.3 and are seeing the infamous
> "bqrelse: multiple refs" panic.  The machine panics, then dumps
> ("syncing disks...").  We thought this was fixed over a year ago in
> vfs_bio.c.
>
> This problem occurred most recently when I exited a remote ssh session.
> The exit took several seconds, which made me suspect something was
> wrong.  I then logged back in and, sure enough, there was a 'sh.core'
> dump file (of zero size) and my running jobs had died.  (The machine
> had dumped.)
>
> Later I could not get in at all (of course, the machine was dead at
> that point), and the other machines doing NFS writes failed as well.
> The NFS structure is:
>
> The machine with the problem, call it PRIME1, NFS-serves 6 other
> FreeBSD 4.3 machines as clients.  They ALL mount PRIME1's /mnt/public,
> and all 6 write their own files into this directory.
>
> So this morning, searching the net, we found several references to
> "bqrelse", but none of them clearly identified a fix.  Does a fix
> exist?  By the way, I maintain multiple FreeBSD boxes and do lots of
> NFS activity, in addition to my occasional SSH login.
>
> I am willing to make queues larger, change parameters, or do whatever
> else it takes to make this work.
>
> Please help us.
>
> ===================================================================
> Our search turned up the following, from July 2000:
>
> Search Result 1
> From: Kirk McKusick (mckusick@mckusick.com)
> Subject: Re: Panic: bqrelse: multiple refs
> Newsgroups: mailing.freebsd.current
> Date: 2000/07/26
>
> Date: Tue, 25 Jul 2000 11:47:03 -0400 (EDT)
>  From: Brian Fundakowski Feldman <green@FreeBSD.org>
>  To: Ollivier Robert <roberto@eurocontrol.fr>
>  Cc: "FreeBSD Current Users' list" <freebsd-current@FreeBSD.org>,
>   mckusick@mckusick.com
>  Subject: Panic: lockmgr: pid 5, not exclusive lock holder 0 unlocking
>  In-Reply-To: <20000725170455.F636@caerdonn.eurocontrol.fr>
>
>  On Tue, 25 Jul 2000, Ollivier Robert wrote:
>
>  > According to Brian Fundakowski Feldman:
>  > > Actually, I'm pretty certain this is the fix:
>  >
>  > Well, it won't panic, but isn't that sweeping the problem under the
>  > carpet?  I agree the panic seems to be there only temporarily, but...
>
>  No, I'm really certain this isn't the case.  You see, struct buf has
>  a b_lock that until recently was a plain, exclusive lockmgr lock.  In
>  Kirk's last round of changes, he converted b_lock to be LK_CANRECURSE,
>  which means that the lock, while still an exclusive lock, may be
>  relocked multiple times by the same caller.
>
>  The panics are plain wrong.  What's left is to determine what is the
>  proper thing to do in each of these cases, which I'm sure many
>  people already know (you see, I'm still a bit green ;). What I
>  am _almost_ sure about is that the right thing is just to remove one
>  of the locks and let it get freed back up the call chain.  I'm almost
>  certain this is the case because if you are grabbing exclusive locks
>  and recursing upon them, your call chain is the only consumer and in
>  a recursive-locking-callchain, you will have multiple symmetric lock
>  and unlock pairs.  Anything else would horribly complicate things, and
>  this makes me a good 95% certain that this is exactly the right fix and
>  not one that sweeps true bugs under the carpet.
>
>  Allowing recursive locks is pretty much the only way to solve many of
>  the problems here because it's simply not possible to support all code
>  paths without allowing for this recursion.  The code would either be
>  horribly complicated or non-functional.  I'm sure Kirk can back me
>  up here.  It seems the cleanup is meant to make the locks recursive
>  mostly to support correct call chains, and that's consistent with
>  my understanding, at least :)
>
>  Indeed, if you look at the comment in brelse() from the delta, you
>  will see that allowing this very situation to occur, and simply
>  calling BUF_UNLOCK(), was planned for, and that the panic()s were for
>  debugging during the earlier period when b_locks weren't LK_CANRECURSE.
>
>  As always, take what I say with a grain of salt since I'm definitely
>  not a VFS guru in any manner; I just happen to think I understand this
>  one :)
>
>  > --
>  > Ollivier ROBERT -=- Eurocontrol EEC/ITM -=- Ollivier.Robert@eurocontrol.fr
>  > The Postman hits! The Postman hits! You have new mail.
>
>  --
>   Brian Fundakowski Feldman           \  FreeBSD: The Power to Serve!  /
>   green@FreeBSD.org                    `------------------------------'
>
> The above explanation is correct. When I made the change to allow
> recursive buffer locks, I should have removed that panic (but forgot
> that I had put it in there, sigh). I have just made the change on
> freefall. Sorry for the problems caused by that change.
>
>  Kirk McKusick
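To make Brian's argument concrete, here is a minimal single-threaded, user-space sketch in C of a recursive exclusive lock. Everything in it is hypothetical: `struct rlock`, `rlock_acquire()`, `rlock_release()`, `old_check_would_panic()` and `nested_chain()` are illustrative names, not FreeBSD's lockmgr(9), BUF_LOCK() or BUF_UNLOCK() code, and there is no real concurrency. It models the idea behind LK_CANRECURSE: the lock stays exclusive toward other owners, but its holder may relock it, and each lock is paired with exactly one unlock. Under that rule a correct nested call chain holds the lock more than once, so a release-time check that treats a count above one as fatal fires on correct code, while simply dropping one level lets the outermost caller free the lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical owner id standing in for the lock holder
 * (a process, in the kernel case). */
typedef int owner_t;
#define NO_OWNER (-1)

/* Hypothetical recursive exclusive lock: one owner at a time, but that
 * owner may acquire it again (the idea behind LK_CANRECURSE). */
struct rlock {
    owner_t owner; /* current holder, or NO_OWNER when free */
    int depth;     /* acquisitions not yet released */
};

void rlock_init(struct rlock *l)
{
    l->owner = NO_OWNER;
    l->depth = 0;
}

/* Non-blocking acquire: returns false if a different owner holds the
 * lock (a kernel lock would sleep here instead of failing). */
bool rlock_acquire(struct rlock *l, owner_t who)
{
    if (l->owner == NO_OWNER) {
        l->owner = who;
        l->depth = 1;
        return true;
    }
    if (l->owner == who) { /* same caller relocking: allowed */
        l->depth++;
        return true;
    }
    return false;          /* still exclusive toward everyone else */
}

/* Undo exactly one acquisition. With symmetric lock/unlock pairs the
 * lock becomes free only when the outermost caller releases it. */
void rlock_release(struct rlock *l, owner_t who)
{
    assert(l->owner == who && l->depth > 0);
    if (--l->depth == 0)
        l->owner = NO_OWNER;
}

/* A debugging check of the kind discussed in the thread: treating a
 * depth above one at release time as fatal. Once recursion is legal,
 * it fires on perfectly correct nested call chains. */
bool old_check_would_panic(const struct rlock *l)
{
    return l->depth > 1;
}

/* Nested call chain: an outer routine locks, then an inner routine
 * relocks the same object. Returns the depth seen inside the inner
 * routine, or -1 if another owner already holds the lock. */
int nested_chain(struct rlock *l, owner_t who)
{
    int inner_depth;

    if (!rlock_acquire(l, who))      /* outer lock */
        return -1;
    (void)rlock_acquire(l, who);     /* inner relock: cannot fail, we hold it */
    inner_depth = l->depth;
    rlock_release(l, who);           /* inner unlock: caller still holds it */
    rlock_release(l, who);           /* outer unlock */
    return inner_depth;
}
```

Calling `nested_chain()` on a free lock returns 2 and leaves the lock free again; at the inner point `old_check_would_panic()` would report true even though the call chain is correct, which is the situation the thread describes.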

Message-ID: <3BDB09F8.F6E0D3A0@aasp.net>
Date: Sat, 27 Oct 2001 15:24:40 -0400
From: Jules Gilbert <jules@aasp.net>
To: freebsd-questions@freebsd.org
CC: pg@eth1.com, wfaxon@gis.net, david@catwhisker.org,
 	green@FreeBSD.org, mckusick@mckusick.com
Subject: panic: bqrelse: multiple ref .. thought fixed a year ago








To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-questions" in the body of the message



