From owner-freebsd-hackers@FreeBSD.ORG  Sat Apr 19 16:51:46 2003
Message-ID: <3EA1E0C4.F097ADBA@mindspring.com>
Date: Sat, 19 Apr 2003 16:50:28 -0700
From: Terry Lambert <tlambert2@mindspring.com>
X-Mailer: Mozilla 4.79 [en] (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Don Lewis <truckman@FreeBSD.org>
References: <200304192156.h3JLuDXB019980@gw.catspoiler.org>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
cc: hackers@FreeBSD.org
Subject: Re: Repeated similar panics on -STABLE

Don Lewis wrote:
> > Take an interrupt somewhere around here, and have the available
> > entries removed from the freelist by an interrupt level driver.
> >
> > Or take a page fault, and have the same thing happen with
> > page-related metadata coming from the freelist in question.
> 
> How can an interrupt or another process touch the freelist while we're
> protected by splmem()?  If that were possible, the block could be stolen
> out from under us in the code below between the assignment to va and the
> update of kbb->kb_next, allocating the same block of memory to two
> different consumers.

Personally, I think it's a page fault.

In any case, the stack traces were posted about 02 Apr 2003, and
the patch fixes the problem empirically, so we can argue about why,
or we can fix the problem for everyone.


> >                         if (cp <= va)
> >                                 break;
> >                         cp -= allocsize;
> >
> > ?  The "<=" saves you.
> 
> It only works because allocsize evenly divides npg*PAGE_SIZE.

Yes.

> If there was a heavy consumer of 129 byte blocks, someone might get
> the bright idea to allocate a special bucket for them because a lot
> more 129 byte blocks fit in a page than 256 byte blocks.  As the
> "for" loop iterated, we'd get to the point where va < cp < va +
> allocsize.  The "<=" test would pass, we'd decrement cp, causing it
> to be less than va, do the
>         freep->next = cp;
> assignment, return to the top of the "for" loop, do the
>         freep = (struct freelist *)cp;
> assignment, so that freep now points outside the block of memory
> allocated by kmem_malloc().  Now the "<=" test will kick us out of the
> loop and we'll do the
>         freep->next = savedlist;
> assignment, stomping on someone else's memory.

Yes.  8-).  I did exactly this, in fact, at Clickarray, though I
rounded to an 8 byte alignment boundary.  The way I did it was to
figure out the minimal number of pages to allocate at one time that
held a whole number of structures, with nothing left over.

This actually saved a hell of a lot of RAM.  This is the method I
described in my first response to you.  8-).
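
The page-count calculation described above can be sketched roughly as
follows.  This is only an illustration, not the code used at Clickarray:
the function names, the 4096-byte page size, and the use of a gcd to
find the smallest multiple are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size (4K) */
#define SKETCH_ALIGN     8u     /* 8 byte alignment boundary from the text */

/* Round size up to the next multiple of SKETCH_ALIGN. */
size_t sketch_round_up(size_t size)
{
    return (size + SKETCH_ALIGN - 1) / SKETCH_ALIGN * SKETCH_ALIGN;
}

/* Greatest common divisor (Euclid). */
size_t sketch_gcd(size_t a, size_t b)
{
    while (b != 0) {
        size_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/*
 * Smallest page count npg such that npg * SKETCH_PAGE_SIZE is an exact
 * multiple of the rounded structure size, so no partial structure is
 * left at the end of the allocation.
 */
size_t sketch_min_pages(size_t size)
{
    size_t rounded;

    assert(size > 0);
    rounded = sketch_round_up(size);
    return rounded / sketch_gcd(rounded, SKETCH_PAGE_SIZE);
}
```

Under these assumptions a 129 byte structure rounds up to 136 bytes, and
sketch_min_pages(129) gives 17 pages, which hold exactly 512 structures.
The same 512 structures in a 256 byte bucket would take 32 pages.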

> A safer, but slightly
> more expensive test would be
>         cp < va + allocsize

This is really painful, actually.  You can lose a full object per
page doing this.  It also makes the coalesce logic almost
incomprehensible (at least in the implementation I tried).
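
For anyone who wants to see the two loop termination tests side by
side, here is a small user-space simulation.  It is a sketch, not the
kernel code: it tracks offsets relative to va instead of real pointers,
and the names carve_sim and carve_result are made up for the example.

```c
#include <assert.h>

/* Result of one simulated pass; offsets are relative to va (va == 0). */
struct carve_result {
    long lowest_write;  /* lowest offset treated as a freelist entry */
    long blocks;        /* number of entries visited by the loop */
};

/*
 * Walk cp downward from the top of the region the way the quoted loop
 * does.  safer_test == 0 uses "cp <= va"; nonzero uses
 * "cp < va + allocsize".
 */
struct carve_result carve_sim(long region, long allocsize, int safer_test)
{
    struct carve_result r;
    long va = 0;
    long cp;

    assert(allocsize > 0 && region >= allocsize);
    cp = va + region - allocsize;
    r.blocks = 0;
    for (;;) {
        r.lowest_write = cp;    /* freep = (struct freelist *)cp; */
        r.blocks++;
        if (safer_test ? (cp < va + allocsize) : (cp <= va))
            break;
        cp -= allocsize;        /* real loop then does freep->next = cp; */
    }
    /* The real code would now store freep->next = savedlist at
     * r.lowest_write; a negative value means a write below va. */
    return r;
}
```

With a 4096 byte region and 256 byte blocks, both tests stop at offset
0 after 16 entries.  With 129 byte blocks the "<=" test ends at offset
-32 after 32 entries, which is the out-of-bounds write described in the
quoted text, while the "cp < va + allocsize" test stops at offset 97
after 31 entries.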

-- Terry