FreeBSD Mail Archives

Date:      Wed, 9 Apr 1997 10:09:09 -0700 (MST)
From:      Terry Lambert <terry@lambert.org>
To:        ponds!rivers@dg-rtp.dg.com (Thomas David Rivers)
Cc:        hackers@freebsd.org
Subject:   Re: Some insight on "dup alloc" problems.....
Message-ID:  <199704091709.KAA07150@phaeton.artisoft.com>
In-Reply-To: <199704090059.UAA07429@lakes.water.net> from "Thomas David Rivers" at Apr 8, 97 08:59:07 pm




>  I have it on good authority that there is a mutex lock there to
> prevent this problem... can you point me to the 2.1.7 source you're
> talking about (with say, some arrows labeling "<<< right here", no
> need to be less than obvious...)  and we can verify that.
> 
>  Ummm... besides; the location I'm dealing with doesn't sleep,
> it could only be interrupted by something higher than splbio().
> [like a hardclock, for instance, which I'm not even sure FreeBSD
> has - I haven't read that source yet.]

kern/vfs_subr.c:
| int
| getnewvnode(tag, mp, vops, vpp)
| 	enum vtagtype tag;
| 	struct mount *mp;
| 	vop_t **vops;
| 	struct vnode **vpp;
| {
| 	register struct vnode *vp;
| 
| retry:
| 	vp = vnode_free_list.tqh_first;
| 	if (freevnodes < (numvnodes >> 2) ||
| 	    numvnodes < desiredvnodes ||
| 	    vp == NULL) {
| 		vp = (struct vnode *) malloc((u_long) sizeof *vp,
| 		    M_VNODE, M_WAITOK);
                *************************************************
| 		bzero((char *) vp, sizeof *vp);
| 		numvnodes++;
| 	} else {

kern/kern_malloc.c:
| void *
| malloc(size, type, flags)
  *************************
### malloc( sizeof(struct vnode), M_VNODE, M_WAITOK)
| 	unsigned long size;
| 	int type, flags;
| {
| 	register struct kmembuckets *kbp;
| 	register struct kmemusage *kup;
| 	register struct freelist *freep;
| 	long indx, npg, allocsize;
| 	int s;
| 	caddr_t va, cp, savedlist;
| 	register struct kmemstats *ksp = &kmemstats[type];
        **************************************************
### ksp = &kmemstats[ M_VNODE];
| 
| 	if (((unsigned long)type) > M_LAST)
| 		panic("malloc - bogus type");
| 	indx = BUCKETINDX(size);
| 	kbp = &bucket[indx];
| 	s = splhigh();
| 	while (ksp->ks_memuse >= ksp->ks_limit) {
| 		if (flags & M_NOWAIT) {
| 			splx(s);
| 			return ((void *) NULL);
| 		}
| 		if (ksp->ks_limblocks < 65535)
| 			ksp->ks_limblocks++;
| 		tsleep((caddr_t)ksp, PSWP+2, memname[type], 0);
                ***********************************************
### tsleep((caddr_t)&kmemstats[ M_VNODE], PSWP+2, memname[M_VNODE], 0);
###
### All getnewvnode callers sleep on this same address if the soft limit
###      is reached.
### All getnewvnode callers sleeping on this address wake up at the
###      same time (only ksp->ks_limblocks is manipulated inside the
###      loop, and the loop is conditional on a variable modified
###      later in the function.
###
| 	}
| 	ksp->ks_size |= 1 << indx;
| 	if (kbp->kb_next == NULL) {
            ********************
### Say kbp->kb_next is not NULL; say that while we were sleeping, we
### had one or more processes returning stuff.
###             [ ... elided code ... ]
| 	}
| 	va = kbp->kb_next;
| 	kbp->kb_next = ((struct freelist *)va)->next;
        *********************************************
###
### Terry thinks the ->next is not correctly set because the buffer
###     is not really free at the time that the freeing context sleeps,
###     is interrupted, etc..
###
### How can this happen?  Well, it takes three processes: one to go
###     in and block on the bucket sleep, another to go in and block
###     on the bucket sleep, and another to go in and do a free to
###     split the continuation path for the first two, and do a
###     wakeup which wakes up both, when only one should be awakened.
###     Then the first one that wakes up has to go block in the
###     pager so that the second one can "catch up" and screw it.
###
### How do I test this theory?  Well, look at free() in kern/kern_malloc.c
###     ...can the diagnostic for multiple frees be triggered even if
###     there are not multiple explicit calls to free()?
###
| 	kup = btokup(va);
| 	if (kup->ku_indx != indx)
| 		panic("malloc: wrong bucket");
| 	if (kup->ku_freecnt == 0)
| 		panic("malloc: lost data");
| 	kup->ku_freecnt--;
| 	kbp->kb_totalfree--;
| 	ksp->ks_memuse += 1 << indx;
| out:
| 	kbp->kb_calls++;
| 	ksp->ks_inuse++;
| 	ksp->ks_calls++;
| 	if (ksp->ks_memuse > ksp->ks_maxused)
| 		ksp->ks_maxused = ksp->ks_memuse;
| 	splx(s);
        ********
###
### The interrupt theory, shot to hell...
###
| 	return ((void *) va);
| }



I'm *NOT* talking about the vm_map_lock() in kmem_malloc() in vm/vm_kern.c;
*that* IS a mutex that isn't reentered (l->want_write is tested, and
l->want_write is set, with no chance of reentrancy in lock_write()).


How can I test this theory and fix the bug?  Protect getnewvnode() so that
it can not be reentered.  If the problem still occurs, you need to use
the same lock address to protect other vnode manipulation routines in
vfs_subr.c (do not use the lock twoce in the same call graph); eventually,
you will isolate the list corruption and the problem will go away.


Other things to try:

1)	Turn off KMEMSTATS.  This will be hard to do; the code would
	not be unconditionally on if it could be safely turned off.

2)	Turn on DIAGNOSTIC and see if you get a different panic than
	the "dup alloc" panic.


					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?199704091709.KAA07150>

Header And Logo

Peripheral Links

Site Navigation

Header And Logo

Peripheral Links

Search

Site Navigation