From owner-freebsd-hackers Thu Nov 14 11:40:09 1996 Return-Path: owner-hackers Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id LAA29208 for hackers-outgoing; Thu, 14 Nov 1996 11:40:09 -0800 (PST) Received: from phaeton.artisoft.com ([198.17.250.211]) by freefall.freebsd.org (8.7.5/8.7.3) with SMTP id LAA29139 for ; Thu, 14 Nov 1996 11:40:02 -0800 (PST) Received: (from terry@localhost) by phaeton.artisoft.com (8.6.11/8.6.9) id MAA24570; Thu, 14 Nov 1996 12:28:04 -0700 From: Terry Lambert Message-Id: <199611141928.MAA24570@phaeton.artisoft.com> Subject: Re: Even more info on daily panics... To: dg@root.com Date: Thu, 14 Nov 1996 12:28:03 -0700 (MST) Cc: terry@lambert.org, michaelh@cet.co.jp, ponds!rivers@dg-rtp.dg.com, Hackers@FreeBSD.org In-Reply-To: <199611140458.UAA09784@root.com> from "David Greenman" at Nov 13, 96 08:58:13 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-hackers@FreeBSD.org X-Loop: FreeBSD.org Precedence: bulk > >> Vnodes on the free list are not allowed to have non-zero v_usecount's. > > > >That's not what the code says. > > > >The code inserts them on list wrap. > > What "list wrap"? What are you talking about? I've explained this 4 or 5 times now. I'm not going to explain it again. Your feigned incredulity is simply an ill-disguised ad hominum attack to make my statements appear less credible than they are. If you want to attack them, attack them on logical grounds. > >The problem is clearly a bogus list insertion. > > No, it isn't. Let's see... IF Bad data is on the list AND Data gets on the list via insertion THEN Bad data was inserted on the list > >Equally as clearly, a bogus list insertion in possible in the current > >code because of the lock race potentially causing a list wrap during > >the allocation to extend sleeping through a valid vrele. > > What "lock race potential"? The only one I know about is fixed by the > mutexes I added to the various filesystems 1.5 years ago. ...and it doesn't > matter anyway. Then explain the crashes. Explain why the panic stack tracebak didn't lead you to say "oh, just change this line of code in this function in the traceback". > >This is possible because the lock is not a lock on list access, it is > >a lock across interface access... the lock is held in vclean above > >the VOP layer and in valloc below the VOP layer. > > Look, Terry, none of this matters!!! The panic is caused by a vnode on > the freelist having a non-zero v_usecount. That *can't* happen when vnode > references are gained via vget and released via vrele. It can *only* (other > than memory corruption of course) happen if vnode_pager_alloc() is passed a > vnode without a reference (v_usecount == 0). If that is the case, then we > have a simple reference count problem and the traceback will point to it. >From getnewvnode() in vfs_subr.c: if (freevnodes < (numvnodes >> 2) || numvnodes < desiredvnodes || vnode_free_list.tqh_first == NULL) { simple_unlock(&vnode_free_list_slock); ************* ************* vp = (struct vnode *) malloc((u_long) sizeof *vp, M_VNODE, M_WAITOK); ******** ******** bzero((char *) vp, sizeof *vp); numvnodes++; } else { > >The *correct* soloution is to panic the bogus list insertion at the > >time of the insertion attempt (in vrele) so that the stack reflects > >who is responsible for the bogus insertion. With an obviously bad > >parameter, it is apparent where the parameter orginated... from the > >insertion/list expansion race. > > We already do that. Look at the code. That is a reactive panic, and will not identify the cause of the problem (hence your inability to fix the problem for David with a simple and obvious-from-context hack). It is still possible to have a cascade failure hide the real problem. This is what I am claiming is happening. > Someday we'll fix it, and when that day happens, we'll have a whole new set of > vnode reference count bugs to deal with. > Look, I encourage comments when they are constructive and work toward a > fix. David's machine is crashing and he's looking for a fix. We're right on > the heals of a release and I don't have the time nor the energy to get caught > up in Terry's grand VFS design model. If that were all it was, David's problems would not have gone away for 4+ days now with my patch. There are at least three people running the patch over the past 15 months, besides myself, and all of them have reported (to me) that the problem has gone away. Like it or not, I have identified a problem and provided a patch. Maybe what is upsetting you is that I labelled the patch a kludge? > For what's it's worth, I agree with you that the current model of > vnode reference/release is wrong, and we'll likely change it at some > point. Doing so is very difficult, however, and arguing about it now > doesn't stop David's machine from panicing. My patch does. And I'm not arguing about it. I argued in June of 1995, when I first published patches. Now I'm simply explaining why the existing code acts like it does. I've provided a kludge patch to get around the problem, without requiring that you actually fix it to use the patch. In my book, that's bending over backwards: "here's something that will stop the pain, but you really should consider corrective surgery". At worst, I'm editorializing about a problem that has existed as recognized but unaddressed (for no justifiable reason) for two years. This is very different from "demanding" you actually do something about it... I am only raising its visibility. Of course, you *could* do something about it, and take the wind right out of my sails, and force me to look elsewhere for unaddressed problems... but I'm hardly "arguing about it". > These are long-term design goals that span over the next few years > and are not germane to finding and fixing this problem. The "problem" of NULL pointer dereferences causing a core dump could also be "addressed" by mapping page 0. Better to address the actual problem. You have already addressed the FS problems that would result in a nonzero reference count (don't latch bogus data that could cause the black box to malfunction onto the inputs of the black box). The only thing left that could be the cause is a race condition (the black box is internally malfunctioning at a boundry condition). Regards, Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers.