Date:      Wed, 13 Nov 1996 20:58:13 -0800
From:      David Greenman <dg@root.com>
To:        Terry Lambert <terry@lambert.org>
Cc:        michaelh@cet.co.jp, ponds!rivers@dg-rtp.dg.com, Hackers@FreeBSD.org
Subject:   Re: Even more info on daily panics... 
Message-ID:  <199611140458.UAA09784@root.com>
In-Reply-To: Your message of "Wed, 13 Nov 1996 20:49:56 MST." <199611140349.UAA23432@phaeton.artisoft.com> 

>>    Terry, the problem has nothing to do with functional abstractions,
>> automatons, layering errors, execution contexts, interface boundaries,
>> race conditions, or little green men from Alpha Centauri.
>
>>    Vnodes on the free list are not allowed to have non-zero v_usecount's.
>
>That's not what the code says.
>
>The code inserts them on list wrap.

   What "list wrap"? What are you talking about?

>The problem is clearly a bogus list insertion.

   No, it isn't.

>Equally clearly, a bogus list insertion is possible in the current
>code because of the lock race potentially causing a list wrap during
>the allocation to extend sleeping through a valid vrele.

   What "lock race potential"? The only one I know about is fixed by the
mutexes I added to the various filesystems 1.5 years ago. ...and it doesn't
matter anyway.

>This is possible because the lock is not a lock on list access, it is
>a lock across interface access... the lock is held in vclean above
>the VOP layer and in valloc below the VOP layer.

   Look, Terry, none of this matters!!! The panic is caused by a vnode on
the freelist having a non-zero v_usecount. That *can't* happen when vnode
references are gained via vget and released via vrele. It can *only* (other
than memory corruption of course) happen if vnode_pager_alloc() is passed a
vnode without a reference (v_usecount == 0). If that is the case, then we
have a simple reference count problem and the traceback will point to it.
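
The invariant described above can be sketched with a toy model. This is only
an illustration under stated assumptions, not the kernel code: struct
toy_vnode, the toy_* functions, and the on_freelist flag are hypothetical
names invented for the example, and the real vget/vrele do considerably more.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a vnode; NOT the kernel's struct vnode. */
struct toy_vnode {
    int  v_usecount;   /* number of active references */
    bool on_freelist;  /* hypothetical flag: node is on the free list */
};

/* Gain a reference the sanctioned way: leave the free list first. */
static void toy_vget(struct toy_vnode *vp)
{
    if (vp->v_usecount == 0)
        vp->on_freelist = false;
    vp->v_usecount++;
}

/* Drop a reference; the last release puts the node on the free list. */
static void toy_vrele(struct toy_vnode *vp)
{
    assert(vp->v_usecount > 0);   /* releasing an unheld node is a bug */
    vp->v_usecount--;
    if (vp->v_usecount == 0)
        vp->on_freelist = true;
}

/*
 * Hypothetical buggy consumer: it bumps the count on the assumption that
 * the caller already holds a reference, so it never touches the free list.
 */
static void toy_ref_without_check(struct toy_vnode *vp)
{
    vp->v_usecount++;
}

/* The invariant: a node on the free list must have a zero use count. */
static bool toy_invariant_holds(const struct toy_vnode *vp)
{
    return !(vp->on_freelist && vp->v_usecount != 0);
}
```

In this model, any sequence of toy_vget and toy_vrele calls keeps
toy_invariant_holds() true. Calling toy_ref_without_check() on a node whose
count is zero leaves it on the free list with a count of one, which is the
state the panic reports.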

>The *correct* solution is to panic the bogus list insertion at the
>time of the insertion attempt (in vrele) so that the stack reflects
>who is responsible for the bogus insertion.  With an obviously bad
>parameter, it is apparent where the parameter originated... from the
>insertion/list expansion race.

   We already do that. Look at the code.

>It is a bogus layering abstraction that permits such code to be
>written in the first place.  If the abstraction were correct, the
>race window would not exist.

   Fine. IT DOESN'T MATTER!!! I don't like spelling errors in comments, either.
Someday we'll fix it, and when that day happens, we'll have a whole new set of
vnode reference count bugs to deal with.
   Look, I encourage comments when they are constructive and work toward a
fix. David's machine is crashing and he's looking for a fix. We're right on
the heels of a release and I have neither the time nor the energy to get caught
up in Terry's grand VFS design model. For what it's worth, I agree with you
that the current model of vnode reference/release is wrong, and we'll likely
change it at some point. Doing so is very difficult, however, and arguing
about it now doesn't stop David's machine from panicking. These are long-term
design goals that span the next few years and are not germane to finding
and fixing this problem.

-DG

David Greenman
Core-team/Principal Architect, The FreeBSD Project



