From owner-freebsd-hackers Tue Nov 14 10:15:01 1995
Return-Path: owner-hackers
Received: (from root@localhost) by freefall.freebsd.org (8.6.12/8.6.6) id KAA28692 for hackers-outgoing; Tue, 14 Nov 1995 10:15:01 -0800
Received: from phaeton.artisoft.com (phaeton.Artisoft.COM [198.17.250.211]) by freefall.freebsd.org (8.6.12/8.6.6) with ESMTP id KAA28683 for ; Tue, 14 Nov 1995 10:14:54 -0800
Received: (from terry@localhost) by phaeton.artisoft.com (8.6.11/8.6.9) id LAA20459; Tue, 14 Nov 1995 11:06:20 -0700
From: Terry Lambert
Message-Id: <199511141806.LAA20459@phaeton.artisoft.com>
Subject: Re: Odd crash after inode depletion..
To: davidg@root.com
Date: Tue, 14 Nov 1995 11:06:20 -0700 (MST)
Cc: terry@lambert.org, rashid@rk.ios.com, jgreco@brasil.moneng.mei.com, hackers@freebsd.org
In-Reply-To: <199511141756.JAA00503@corbin.Root.COM> from "David Greenman" at Nov 14, 95 09:56:07 am
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Length: 4456
Sender: owner-hackers@freebsd.org
Precedence: bulk

> >> Do you know the difference of appearance of a circular queue in our
> >> sources?  Based on your comments, I must conclude that you do not.
> >> tailq's are NOT circular queues.  The vnode free list is NOT a circular
> >> queue and it never has been in FreeBSD.
> >
> >First of all, I correctly identified the code that was screwing up from
> >a written description of the problem.  I wrote the part about the circular
> >queues before going to the code (I was going to not attempt a cursor fix
> >precisely because it doesn't matter what I post, I get attacked).
>
>    No you did not identify the problem.  vnodes on the free list are not
> supposed to *ever* have non-zero usecounts.  Something gained a reference to
> the vnode without pulling it off the freelist.  THAT is the problem.

int
vget(vp, lockflag)
	register struct vnode *vp;
	int lockflag;
{

	/*
	 * If the vnode is in the process of being cleaned out for another
	 * use, we wait for the cleaning to finish and then return failure.
	 * Cleaning is determined either by checking that the VXLOCK flag is
	 * set, or that the use count is zero with the back pointer set to
	 * show that it has been removed from the free list by getnewvnode.
	 * The VXLOCK flag may not have been set yet because vclean is blocked
	 * in the VOP_LOCK call waiting for the VOP_INACTIVE to complete.
	 */
	if ((vp->v_flag & VXLOCK) ||
	    (vp->v_usecount == 0 &&
	     vp->v_freelist.tqe_prev == (struct vnode **) 0xdeadb)) {
		vp->v_flag |= VXWANT;
		(void) tsleep((caddr_t) vp, PINOD, "vget", 0);
		return (1);
	}
******
****** Next compare assumes a node from the freelist can have a non-0
****** usecount
******
	if (vp->v_usecount == 0) {
		TAILQ_REMOVE(&vnode_free_list, vp, v_freelist);
		freevnodes--;
	}
******
****** This ordering is evil.  To make it less evil would require
****** abstracting the vnode locking from the underlying file systems
****** into a file system independent layer using an owner and a counting
****** semaphore.  Like SunOS or SVR4 or Dynix.
******
	vp->v_usecount++;
	if (lockflag)
		VOP_LOCK(vp);
	return (0);
}
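To make the suggestion concrete, here is a minimal compilable sketch of
the owner-plus-counting-semaphore idea.  It is modeled in userland with
POSIX threads instead of kernel sleep/wakeup so it can actually be built
and run; the "vnlock" type and function names are inventions of mine for
illustration, not anything that exists in the tree:

#include <pthread.h>

/*
 * Sketch only: a file system independent vnode lock in the SunOS/SVR4
 * style -- an owner plus a count, protected by an interlock.
 */
struct vnlock {
	pthread_mutex_t	vl_interlock;	/* protects owner and count */
	pthread_cond_t	vl_wait;	/* where contenders sleep */
	pthread_t	vl_owner;	/* valid only when vl_count > 0 */
	int		vl_count;	/* recursion count; 0 == free */
};

void
vnlock_init(struct vnlock *vl)
{
	pthread_mutex_init(&vl->vl_interlock, NULL);
	pthread_cond_init(&vl->vl_wait, NULL);
	vl->vl_count = 0;
}

void
vnlock_lock(struct vnlock *vl)
{
	pthread_mutex_lock(&vl->vl_interlock);
	if (vl->vl_count > 0 && pthread_equal(vl->vl_owner, pthread_self())) {
		vl->vl_count++;			/* recursive acquisition */
	} else {
		while (vl->vl_count > 0)	/* sleep until it is free */
			pthread_cond_wait(&vl->vl_wait, &vl->vl_interlock);
		vl->vl_owner = pthread_self();
		vl->vl_count = 1;
	}
	pthread_mutex_unlock(&vl->vl_interlock);
}

void
vnlock_unlock(struct vnlock *vl)
{
	pthread_mutex_lock(&vl->vl_interlock);
	if (--vl->vl_count == 0)	/* last release wakes one waiter */
		pthread_cond_signal(&vl->vl_wait);
	pthread_mutex_unlock(&vl->vl_interlock);
}

With something like this in a file system independent layer, vget()
could take the lock before it ever touched the use count, and the
per-fs VOP_LOCK implementations could collapse into calls to it.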
> >Second, it was a change to the queue management that caused the screwup,
> >since even you must admit that it didn't happen in the past, and the
> >change was to add more vnodes, and the failure mode was in the queue
> >head check that guards the allocation code.
>
>    We did !!!NOT!!! change the queue management for vnodes! How many times do
> I have to say this?

Until you change the comments.

/*
 * Return the next vnode from the free list.
 */
int
getnewvnode(tag, mp, vops, vpp)
	enum vtagtype tag;
	struct mount *mp;
	vop_t **vops;
	struct vnode **vpp;
{
	register struct vnode *vp;

	vp = vnode_free_list.tqh_first;
	/*
	 * we allocate a new vnode if				  ********
	 * 1. we don't have any free				  ********
	 *    Pretty obvious, we actually used to panic, but that ********
	 *    is a silly thing to do.
	 * 2. we haven't filled our pool yet
	 *    We don't want to trash the incore (VM-)vnodecache.
	 * 3. if less than 1/4th of our vnodes are free.
	 *    We don't want to trash the namei cache either.
	 */
	if (freevnodes < (numvnodes >> 2) ||
	    numvnodes < desiredvnodes ||
	    vp == NULL) {
		vp = (struct vnode *) malloc((u_long) sizeof *vp,
		    M_VNODE, M_WAITOK);
		bzero((char *) vp, sizeof *vp);
		numvnodes++;
	} else {
		TAILQ_REMOVE(&vnode_free_list, vp, v_freelist);
		freevnodes--;

"used to panic" == "no longer panic" == "changed code".

> >Third, I identified two potential quick hack fixes.
>
>    Which are wrong.  You don't understand the problem or the code.

Which is why I used that little adjective "potential" and didn't claim
them as fixes in my post.

> >> The change to a circular queue for the mountlist was NOT made for aesthetic
> >> reasons.  It was made because it is required to traverse the mount queue in the
> >> reverse direction in order to properly dismount filesystems (because of mount
> >> point dependencies).
> >
> >This is why god invented the stack and function recursion.  It's called a
> >depth first traversal.
>
>    I had considered doing the mount insertion at the head.  "df" then comes out
> backwards.  circleq's are the correct solution.

It would have been equally valid to do a recursive descent, unmounting
on ascent instead of on descent, to ensure proper ordering; a sketch of
what I mean follows my signature.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
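Here is that sketch: a toy, compilable illustration of unmounting on
ascent.  The "struct mnt" tree and the dounmount() stub are stand-ins
of my own for the real kernel structures, which hang the mounts off a
list rather than an explicit tree:

#include <stdio.h>

/*
 * Toy mount tree: each mount knows the mounts stacked beneath it.
 * Illustration-only stand-ins for the real kernel types.
 */
struct mnt {
	const char	*m_path;	/* mount point, for the demo */
	struct mnt	*m_child;	/* first mount under this one */
	struct mnt	*m_sibling;	/* next mount at the same level */
};

/* Stand-in for the real unmount primitive. */
static int
dounmount(struct mnt *mp)
{
	printf("unmounting %s\n", mp->m_path);
	return (0);
}

/*
 * Depth first traversal: descend to the leaves, unmount on the way
 * back up, so a mount is never torn down before the mounts stacked
 * on top of it.
 */
static int
unmount_tree(struct mnt *mp)
{
	struct mnt *child, *next;
	int error;

	for (child = mp->m_child; child != NULL; child = next) {
		next = child->m_sibling;
		if ((error = unmount_tree(child)) != 0)
			return (error);
	}
	return (dounmount(mp));
}

int
main(void)
{
	struct mnt c = { "/usr/local", NULL, NULL };
	struct mnt b = { "/usr", &c, NULL };
	struct mnt a = { "/", &b, NULL };

	return (unmount_tree(&a));	/* deepest mounts come out first */
}

The point is that the C stack remembers the path back up, so no reverse
traversal of the mount queue is needed to get the teardown order right.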