From owner-freebsd-hackers Tue Nov 14 10:15:01 1995
Return-Path: owner-hackers
Received: (from root@localhost) by freefall.freebsd.org (8.6.12/8.6.6) id KAA28692 for hackers-outgoing; Tue, 14 Nov 1995 10:15:01 -0800
Received: from phaeton.artisoft.com (phaeton.Artisoft.COM [198.17.250.211]) by freefall.freebsd.org (8.6.12/8.6.6) with ESMTP id KAA28683 for ; Tue, 14 Nov 1995 10:14:54 -0800
Received: (from terry@localhost) by phaeton.artisoft.com (8.6.11/8.6.9) id LAA20459; Tue, 14 Nov 1995 11:06:20 -0700
From: Terry Lambert
Message-Id: <199511141806.LAA20459@phaeton.artisoft.com>
Subject: Re: Odd crash after inode depletion..
To: davidg@root.com
Date: Tue, 14 Nov 1995 11:06:20 -0700 (MST)
Cc: terry@lambert.org, rashid@rk.ios.com, jgreco@brasil.moneng.mei.com, hackers@freebsd.org
In-Reply-To: <199511141756.JAA00503@corbin.Root.COM> from "David Greenman" at Nov 14, 95 09:56:07 am
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Length: 4456
Sender: owner-hackers@freebsd.org
Precedence: bulk

> >> Do you know the difference of appearance of a circular queue in our
> >> sources?  Based on your comments, I must conclude that you do not.
> >> tailq's are NOT circular queues.  The vnode free list is NOT a circular
> >> queue and it never has been in FreeBSD.
> >
> >First of all, I correctly identified the code that was screwing up from
> >a written description of the problem.  I wrote the part about the circular
> >queues before going to the code (I was going to not attempt a cursor fix
> >precisely because it doesn't matter what I post, I get attacked).
>
>    No you did not identify the problem.  vnodes on the free list are not
> supposed to *ever* have non-zero usecounts.  Something gained a reference to
> the vnode without pulling it off the freelist.  THAT is the problem.

int
vget(vp, lockflag)
	register struct vnode *vp;
	int lockflag;
{

	/*
	 * If the vnode is in the process of being cleaned out for another
	 * use, we wait for the cleaning to finish and then return failure.
	 * Cleaning is determined either by checking that the VXLOCK flag is
	 * set, or that the use count is zero with the back pointer set to
	 * show that it has been removed from the free list by getnewvnode.
	 * The VXLOCK flag may not have been set yet because vclean is blocked
	 * in the VOP_LOCK call waiting for the VOP_INACTIVE to complete.
	 */
	if ((vp->v_flag & VXLOCK) ||
	    (vp->v_usecount == 0 &&
	     vp->v_freelist.tqe_prev == (struct vnode **) 0xdeadb)) {
		vp->v_flag |= VXWANT;
		(void) tsleep((caddr_t) vp, PINOD, "vget", 0);
		return (1);
	}
******
****** Next compare assumes a node from the freelist can have a non-0
****** usecount
******
	if (vp->v_usecount == 0) {
		TAILQ_REMOVE(&vnode_free_list, vp, v_freelist);
		freevnodes--;
	}
******
****** This ordering is evil.  To make it less evil would require
****** abstracting the vnode locking from the underlying file systems
****** into a file system independent layer using an owner and a counting
****** semaphore.  Like SunOS or SVR4 or Dynix.
******
	vp->v_usecount++;
	if (lockflag)
		VOP_LOCK(vp);
	return (0);
}
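To make the suggestion concrete, here is a minimal compilable sketch of
the owner-plus-counting-semaphore idea.  It is modeled in userland with
POSIX threads instead of kernel sleep/wakeup so it can actually be built
and run; the "vnlock" type and function names are inventions of mine for
illustration, not anything that exists in the tree:

#include <pthread.h>

/*
 * Sketch only: a file system independent vnode lock in the SunOS/SVR4
 * style -- an owner plus a count, protected by an interlock.
 */
struct vnlock {
	pthread_mutex_t	vl_interlock;	/* protects owner and count */
	pthread_cond_t	vl_wait;	/* where contenders sleep */
	pthread_t	vl_owner;	/* valid only when vl_count > 0 */
	int		vl_count;	/* recursion count; 0 == free */
};

void
vnlock_init(struct vnlock *vl)
{
	pthread_mutex_init(&vl->vl_interlock, NULL);
	pthread_cond_init(&vl->vl_wait, NULL);
	vl->vl_count = 0;
}

void
vnlock_lock(struct vnlock *vl)
{
	pthread_mutex_lock(&vl->vl_interlock);
	if (vl->vl_count > 0 && pthread_equal(vl->vl_owner, pthread_self())) {
		vl->vl_count++;			/* recursive acquisition */
	} else {
		while (vl->vl_count > 0)	/* sleep until it is free */
			pthread_cond_wait(&vl->vl_wait, &vl->vl_interlock);
		vl->vl_owner = pthread_self();
		vl->vl_count = 1;
	}
	pthread_mutex_unlock(&vl->vl_interlock);
}

void
vnlock_unlock(struct vnlock *vl)
{
	pthread_mutex_lock(&vl->vl_interlock);
	if (--vl->vl_count == 0)	/* last release wakes one waiter */
		pthread_cond_signal(&vl->vl_wait);
	pthread_mutex_unlock(&vl->vl_interlock);
}

With something like this in a file system independent layer, vget()
could take the lock before it ever touched the use count, and the
per-fs VOP_LOCK implementations could collapse into calls to it.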
> >Second, it was a change to the queue management that caused the screwup,
> >since even you must admit that it didn't happen in the past, and the
> >change was to add more vnodes, and the failure mode was in the queue
> >head check that guards the allocation code.
>
>    We did !!!NOT!!! change the queue management for vnodes! How many times do
> I have to say this?

Until you change the comments.

/*
 * Return the next vnode from the free list.
 */
int
getnewvnode(tag, mp, vops, vpp)
	enum vtagtype tag;
	struct mount *mp;
	vop_t **vops;
	struct vnode **vpp;
{
	register struct vnode *vp;

	vp = vnode_free_list.tqh_first;
	/*
	 * we allocate a new vnode if				  ********
	 * 1. we don't have any free				  ********
	 *    Pretty obvious, we actually used to panic, but that ********
	 *    is a silly thing to do.
	 * 2. we haven't filled our pool yet
	 *    We don't want to trash the incore (VM-)vnodecache.
	 * 3. if less than 1/4th of our vnodes are free.
	 *    We don't want to trash the namei cache either.
	 */
	if (freevnodes < (numvnodes >> 2) ||
	    numvnodes < desiredvnodes ||
	    vp == NULL) {
		vp = (struct vnode *) malloc((u_long) sizeof *vp,
		    M_VNODE, M_WAITOK);
		bzero((char *) vp, sizeof *vp);
		numvnodes++;
	} else {
		TAILQ_REMOVE(&vnode_free_list, vp, v_freelist);
		freevnodes--;

"used to panic" == "no longer panic" == "changed code".

> >Third, I identified two potential quick hack fixes.
>
>    Which are wrong.  You don't understand the problem or the code.

Which is why I used that little adjective "potential" and didn't claim
them as fixes in my post.

> >> The change to a circular queue for the mountlist was NOT made for aesthetic
> >> reasons.  It was made because it is required to traverse the mount queue in the
> >> reverse direction in order to properly dismount filesystems (because of mount
> >> point dependencies).
> >
> >This is why god invented the stack and function recursion.  It's called a
> >depth first traversal.
>
>    I had considered doing the mount insertion at the head.  "df" then comes out
> backwards.  circleq's are the correct solution.

It would have been equally valid to do a recursive descent, unmounting
on ascent instead of on descent, to ensure proper ordering; a sketch of
what I mean follows my signature.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
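Here is that sketch: a toy, compilable illustration of unmounting on
ascent.  The "struct mnt" tree and the dounmount() stub are stand-ins
of my own for the real kernel structures, which hang the mounts off a
list rather than an explicit tree:

#include <stdio.h>

/*
 * Toy mount tree: each mount knows the mounts stacked beneath it.
 * Illustration-only stand-ins for the real kernel types.
 */
struct mnt {
	const char	*m_path;	/* mount point, for the demo */
	struct mnt	*m_child;	/* first mount under this one */
	struct mnt	*m_sibling;	/* next mount at the same level */
};

/* Stand-in for the real unmount primitive. */
static int
dounmount(struct mnt *mp)
{
	printf("unmounting %s\n", mp->m_path);
	return (0);
}

/*
 * Depth first traversal: descend to the leaves, unmount on the way
 * back up, so a mount is never torn down before the mounts stacked
 * on top of it.
 */
static int
unmount_tree(struct mnt *mp)
{
	struct mnt *child, *next;
	int error;

	for (child = mp->m_child; child != NULL; child = next) {
		next = child->m_sibling;
		if ((error = unmount_tree(child)) != 0)
			return (error);
	}
	return (dounmount(mp));
}

int
main(void)
{
	struct mnt c = { "/usr/local", NULL, NULL };
	struct mnt b = { "/usr", &c, NULL };
	struct mnt a = { "/", &b, NULL };

	return (unmount_tree(&a));	/* deepest mounts come out first */
}

The point is that the C stack remembers the path back up, so no reverse
traversal of the mount queue is needed to get the teardown order right.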