Date:      Thu, 16 Aug 2001 16:36:18 -0400
From:      Michael Lucas <mwlucas@blackhelicopters.org>
To:        John Baldwin <jhb@FreeBSD.org>
Cc:        current@FreeBSD.org
Subject:   Re: devfs and Vinum (was: any -current && vinum problems?)
Message-ID:  <20010816163617.A52310@blackhelicopters.org>
In-Reply-To: <XFMail.010815164221.jhb@FreeBSD.org>; from jhb@FreeBSD.org on Wed, Aug 15, 2001 at 04:42:21PM -0700
References:  <20010815181100.A48748@blackhelicopters.org> <XFMail.010815164221.jhb@FreeBSD.org>

[cc's trimmed]

John,

Thanks for the suggestion; I appreciate it.  I did as you suggested
(diff below).

It panicked again, but this time savecore said "dump time is unreasonable."

The short panic message was:

panicstr: bremfree: bp 0xcc2a1ae4 not locked

Looks like the same thing to me, sorry.  Any other suggestions?

magpire/sys/kern;diff subr_witness.c subr_witness.c-dist 
392a393
>       mtx_lock(&all_mtx);
395d395
<       mtx_lock(&all_mtx);
magpire/sys/kern;diff -c subr_witness.c subr_witness.c-dist
*** subr_witness.c      Thu Aug 16 16:16:06 2001
--- subr_witness.c-dist Thu Aug 16 16:15:20 2001
***************
*** 390,398 ****
                mtx_unlock_spin(&w_mtx);
        }
  
        lock_cur_cnt--;
        STAILQ_REMOVE(&all_locks, lock, lock_object, lo_list);
-       mtx_lock(&all_mtx);
        lock->lo_flags &= ~LO_INITIALIZED;
        mtx_unlock(&all_mtx);
  }
--- 390,398 ----
                mtx_unlock_spin(&w_mtx);
        }
  
+       mtx_lock(&all_mtx);
        lock_cur_cnt--;
        STAILQ_REMOVE(&all_locks, lock, lock_object, lo_list);
        lock->lo_flags &= ~LO_INITIALIZED;
        mtx_unlock(&all_mtx);
  }
magpire/sys/kern;



On Wed, Aug 15, 2001 at 04:42:21PM -0700, John Baldwin wrote:
> 
> On 15-Aug-01 Michael Lucas wrote:
> > On Wed, Aug 15, 2001 at 10:21:39AM +0930, Greg Lehey wrote:
> >> To help localize this problem, could you please try this same thing on
> >> a kernel without devfs?  The dump you sent me did not look like a
> >> Vinum bug, as I said in my reply.
> > 
> > Sorry, it happens on a non-devfs kernel as well.  Since it doesn't
> > appear to be a Vinum bug, I'm taking the liberty of sending the whole
> > thing to -current.  (I sent my first dump to Greg in particular, since
> > a Vinum command triggered whatever this is.)
> > 
> 
> > Script started on Wed Aug 15 17:57:48 2001
> > magpire/var/crash;file /boot/kernel/vinum.ko 
> > /boot/kernel/vinum.ko: ELF 32-bit LSB shared object, Intel 80386, version 1
> > (FreeBSD), not stripped
> > magpire/var/crash;file kernel.debug.nodevfs 
> > kernel.debug.nodevfs: ELF 32-bit LSB executable, Intel 80386, version 1
> > (FreeBSD), dynamically linked (uses shared libs), not stripped
> > magpire/var/crash;gdb -k kernel.debug.nodevfs vmcore.3 
> > GNU gdb 4.18
> > Copyright 1998 Free Software Foundation, Inc.
> > GDB is free software, covered by the GNU General Public License, and you are
> > welcome to change it and/or distribute copies of it under certain conditions.
> > Type "show copying" to see the conditions.
> > There is absolutely no warranty for GDB.  Type "show warranty" for details.
> > This GDB was configured as "i386-unknown-freebsd"...
> > IdlePTD 4284416
> > initial pcb at 34b860
> > panicstr: bremfree: bp 0xcc2a1ae4 not locked
> 
> Unfortunately this is the panic message from later on during the syncing disks
> stage, not the real panic. :(
> 
> > #15 0xc01f0783 in witness_destroy (lock=0xc1ec4e68) at ../../../kern/subr_witness.c:395
> 
> This is the real problem:
> 
>         mtx_lock(&all_mtx);
>         lock_cur_cnt--;
>         STAILQ_REMOVE(&all_locks, lock, lock_object, lo_list);
>         lock->lo_flags &= ~LO_INITIALIZED;
>         mtx_unlock(&all_mtx);
> 
> It panics in the STAILQ_REMOVE().  I've seen this a couple of times but have no
> idea how that list pointer is getting corrupted.  My guess is that a mutex is
> being destroyed twice or something dumb like that; however, I'm not sure how.
> The LO_INITIALIZED flags and checks are supposed to catch that case.  I suppose
> there is a chance we could preempt in between the LO_INITIALIZED check and the
> actual removal and then free it and get in trouble that way.  Hmm.  Try moving
> the mtx_lock of &all_mtx before the check for LO_INITIALIZED and see if you can
> get a different panic.  It may be a bug in the ucred stuff.  (At least several
> other panics of this type have been the result of crfree() calls.)
> 
> -- 
> 
> John Baldwin <jhb@FreeBSD.org> -- http://www.FreeBSD.org/~jhb/
> PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
> "Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/
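
For readers following the race John describes: if the LO_INITIALIZED test runs before all_mtx is held, a thread could be preempted between that test and the STAILQ_REMOVE(), and a second destroy (or a free) of the same lock could slip in and corrupt the list. The following is a minimal userland sketch, assuming a pthread mutex in place of the kernel mtx and a hand-rolled singly linked list in place of the STAILQ macros. The simplified struct lock_object, lock_init_sketch() and lock_destroy_sketch() are hypothetical names for illustration, not the actual subr_witness.c code. It shows the ordering John suggests: take all_mtx first, then test the flag, unlink, and clear the flag inside one critical section.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userland stand-ins for the kernel's lock_object,
 * all_locks, all_mtx and lock_cur_cnt; this is NOT subr_witness.c. */
#define LO_INITIALIZED 0x1u

struct lock_object {
    const char *name;
    unsigned int flags;
    struct lock_object *next;       /* singly linked, like an STAILQ entry */
};

static struct lock_object *all_locks;   /* head of the global lock list */
static pthread_mutex_t all_mtx = PTHREAD_MUTEX_INITIALIZER;
static int lock_cur_cnt;

/* Register a lock on the global list. */
static void lock_init_sketch(struct lock_object *lock, const char *name)
{
    pthread_mutex_lock(&all_mtx);
    lock->name = name;
    lock->flags = LO_INITIALIZED;
    lock->next = all_locks;
    all_locks = lock;
    lock_cur_cnt++;
    pthread_mutex_unlock(&all_mtx);
}

/*
 * Unregister a lock.  all_mtx is acquired BEFORE the LO_INITIALIZED
 * test, so the test, the unlink and the flag update form a single
 * critical section: no other thread can destroy the same lock between
 * the check and the removal.  Returns 0 on success, -1 on a second
 * destroy (where the kernel would panic instead of returning).
 */
static int lock_destroy_sketch(struct lock_object *lock)
{
    struct lock_object **pp;

    pthread_mutex_lock(&all_mtx);
    if ((lock->flags & LO_INITIALIZED) == 0) {
        pthread_mutex_unlock(&all_mtx);
        return -1;
    }
    /* Find the link that points at lock and splice lock out. */
    for (pp = &all_locks; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == lock) {
            *pp = lock->next;
            break;
        }
    }
    assert(lock_cur_cnt > 0);
    lock_cur_cnt--;
    lock->flags &= ~LO_INITIALIZED;
    pthread_mutex_unlock(&all_mtx);
    return 0;
}
```

With this ordering, a second lock_destroy_sketch() on the same object sees the cleared flag and returns -1 instead of walking and rewriting the list a second time. If the flag were tested before all_mtx is taken, two callers could both see LO_INITIALIZED set and both try to unlink the same entry.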

-- 
Michael Lucas
mwlucas@blackhelicopters.org
http://www.blackhelicopters.org/~mwlucas/
Big Scary Daemons: http://www.oreillynet.com/pub/q/Big_Scary_Daemons
