From owner-freebsd-hackers Wed Nov 13 21:00:15 1996
Return-Path: owner-hackers
Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id VAA13441 for hackers-outgoing; Wed, 13 Nov 1996 21:00:15 -0800 (PST)
Received: from root.com (implode.root.com [198.145.90.17]) by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id VAA13431 for ; Wed, 13 Nov 1996 21:00:06 -0800 (PST)
Received: from localhost (localhost [127.0.0.1]) by root.com (8.7.6/8.6.5) with SMTP id UAA09784; Wed, 13 Nov 1996 20:58:13 -0800 (PST)
Message-Id: <199611140458.UAA09784@root.com>
X-Authentication-Warning: implode.root.com: Host localhost [127.0.0.1] didn't use HELO protocol
To: Terry Lambert
cc: michaelh@cet.co.jp, ponds!rivers@dg-rtp.dg.com, Hackers@FreeBSD.org
Subject: Re: Even more info on daily panics...
In-reply-to: Your message of "Wed, 13 Nov 1996 20:49:56 MST." <199611140349.UAA23432@phaeton.artisoft.com>
From: David Greenman
Reply-To: dg@root.com
Date: Wed, 13 Nov 1996 20:58:13 -0800
Sender: owner-hackers@FreeBSD.org
X-Loop: FreeBSD.org
Precedence: bulk

>> Terry, the problem has nothing to do with functional abstractions,
>> automatons, layering errors, execution contexts, interface boundaries,
>> race conditions, or little green men from Alpha Centauri.
>
>> Vnodes on the free list are not allowed to have non-zero v_usecount's.
>
>That's not what the code says.
>
>The code inserts them on list wrap.

What "list wrap"? What are you talking about?

>The problem is clearly a bogus list insertion.

No, it isn't.

>Equally as clearly, a bogus list insertion is possible in the current
>code because of the lock race potentially causing a list wrap during
>the allocation to extend sleeping through a valid vrele.

What "lock race potential"? The only one I know about is fixed by the
mutexes I added to the various filesystems 1.5 years ago. ...and it
doesn't matter anyway.

>This is possible because the lock is not a lock on list access, it is
>a lock across interface access...
>the lock is held in vclean above
>the VOP layer and valloc below the VOP layer.

Look, Terry, none of this matters!!! The panic is caused by a vnode on the
freelist having a non-zero v_usecount. That *can't* happen when vnode
references are gained via vget and released via vrele. It can *only* (other
than memory corruption, of course) happen if vnode_pager_alloc() is passed a
vnode without a reference (v_usecount == 0). If that is the case, then we
have a simple reference count problem and the traceback will point to it.

>The *correct* solution is to panic the bogus list insertion at the
>time of the insertion attempt (in vrele) so that the stack reflects
>who is responsible for the bogus insertion. With an obviously bad
>parameter, it is apparent where the parameter originated... from the
>insertion/list expansion race.

We already do that. Look at the code.

>It is a bogus layering abstraction that permits such code to be
>written in the first place. If the abstraction were correct, the
>race window would not exist.

Fine. IT DOESN'T MATTER!!! I don't like spelling errors in comments, either.
Someday we'll fix it, and when that day happens, we'll have a whole new set
of vnode reference count bugs to deal with.

Look, I encourage comments when they are constructive and work toward a
fix. David's machine is crashing and he's looking for a fix. We're right on
the heels of a release and I have neither the time nor the energy to get
caught up in Terry's grand VFS design model. For what it's worth, I agree
with you that the current model of vnode reference/release is wrong, and
we'll likely change it at some point. Doing so is very difficult, however,
and arguing about it now doesn't stop David's machine from panicking. These
are long-term design goals that span the next few years and are not
germane to finding and fixing this problem.

-DG

David Greenman
Core-team/Principal Architect, The FreeBSD Project