From owner-freebsd-hackers@FreeBSD.ORG  Sun Apr 20 04:04:53 2003
Date: Sun, 20 Apr 2003 15:04:42 +0400
From: Dmitry Sivachenko <demon@FreeBSD.org>
To: Don Lewis <truckman@FreeBSD.org>
Message-ID: <20030420110442.GA85172@fling-wing.demos.su>
References: <20030420093628.GA76333@fling-wing.demos.su>
	<200304201018.h3KAI5XB021318@gw.catspoiler.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=koi8-r
Content-Disposition: inline
In-Reply-To: <200304201018.h3KAI5XB021318@gw.catspoiler.org>
WWW-Home-Page: http://mitya.pp.ru/
X-PGP-Key: http://mitya.pp.ru/mitya.asc
User-Agent: Mutt/1.5.4i
cc: hackers@FreeBSD.org
Subject: Re: Repeated similar panics on -STABLE

On Sun, Apr 20, 2003 at 03:18:05AM -0700, Don Lewis wrote:
> On 20 Apr, Dmitry Sivachenko wrote:
> > On Sun, Apr 20, 2003 at 01:16:16AM -0700, Don Lewis wrote:
> 
> >> If kbp is pointing to a non-existent page, why does Terry's patch seem
> >> to fix the problem for you?
> > 
> > Well, there is probably a misunderstanding here.
> > We did NOT apply Terry's patch.  Let me quote a bit from my e-mail to Terry:
> > 
> > TL> Did my patch fix your problem?
> > TL>
> > TL> Or did you tune your kernel, as I suggested, to fix your problem?
> > TL>
> > TL> Or is it still a problem?
> > 
> > DS>We changed maxusers from 512 to 0 and decreased the number of
> > DS>NMBCLUSTERS.  Now everything is working fine, but since these panics occurred
> > DS>about once a week I can't say for sure they are completely gone.
> > DS>Let's wait at least one more week...
> > 
> > Thus I wanted to say that we only tuned maxusers and NMBCLUSTERS.  We
> > run a virgin -STABLE kernel without any patches.  Probably my English leaves much
> > to be desired ;-((
> 
> Your English seems just fine to me.
> 
> I just got the impression from Terry that the patch is what fixed the
> problem for you.
> 
> 
> >> I wonder if things are getting further munged after the trap occurs?
> >> That would make it more difficult to track down the problem from the
> >> core file.
> >> 
> >> Something else of interest to print is
> >> 	bucket[7]
> >> bucket[7].kb_next and bucket[7].kb_last might shed some light.
> >> 
> > 
> > (kgdb) up 22
> > #22 0xc015daff in malloc (size=72, type=0xc029fee0, flags=0)
> >     at /mnt/se3/releng_4/src/sys/kern/kern_malloc.c:243
> > 243             va = kbp->kb_next;
> > (kgdb) p bucket[7]
> > $1 = {kb_next = 0x5cdd8000 <Address 0x5cdd8000 out of bounds>,
> >   kb_last = 0xc8fcb000 "", kb_calls = 2127276, kb_total = 4256,
> >   kb_elmpercl = 32, kb_totalfree = 1264, kb_highwat = 160, kb_couldfree = 5497}
> > (kgdb) p bucket[7].kb_next
> > $2 = 0x5cdd8000 <Address 0x5cdd8000 out of bounds>
> > (kgdb) p bucket[7].kb_last
> > $3 = 0xc8fcb000 ""
> > (kgdb)
> 
> That explains quite a bit.  The free list is somehow getting
> corrupted. That's why the 0x5cdd8000 value shows up in both stack
> frames. The value of kb_last looks ok, though.  Because kb_next is not
> NULL, we skip the "if" block that allocates more memory and proceed to
> line 243. Gdb is lying a bit, though: the trap isn't happening on line
> 243, va is just getting the garbage value there.  The trap is actually
> happening on the next line when we try to dereference this garbage
> pointer:
> 	kbp->kb_next = ((struct freelist *)va)->next;
> 
> It sure would be nice to know the source of this weird value.  It's
> obviously not a pointer, but it's not obvious to me what it might be.
> 
> It sure looks to me like something is writing to memory that has already
> been put back on the free list and is stomping on the next pointer in
> one of the memory blocks on the list.  When this block gets allocated
> again, malloc() does:
> 	va = kbp->kb_next;
> 	kbp->kb_next = ((struct freelist *)va)->next;
> and stores the garbage in kb_next, where we trip over it on the next
> allocation.
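
The sequence described above can be reproduced in a small user-space
sketch.  This is not the kern_malloc.c code: sim_bucket, sim_malloc,
sim_free and sim_stale_write are hypothetical names, the list is a
simplified LIFO with no kb_last, and the garbage value is just the one
from the crash dump used as an example.  It only shows why the
allocation that picks up the stomped block succeeds and the failure is
deferred to the following one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLKSZ 128                       /* block size of one bucket (illustrative) */

struct sim_bucket { char *kb_next; };   /* head of the free list, like kbp->kb_next */

/* A free block keeps the link to the next free block in its first bytes. */
static void set_next(char *va, char *next) { memcpy(va, &next, sizeof next); }
static char *get_next(const char *va) { char *n; memcpy(&n, va, sizeof n); return n; }

/* Put a block back at the head of the list (simplifying assumption). */
static void sim_free(struct sim_bucket *kbp, char *va)
{
    set_next(va, kbp->kb_next);
    kbp->kb_next = va;
}

static char *sim_malloc(struct sim_bucket *kbp)
{
    char *va = kbp->kb_next;            /* like line 243: only copies the head */
    if (va == NULL)
        return NULL;
    kbp->kb_next = get_next(va);        /* next line: reads the link out of the block */
    return va;
}

/* Returns the value left in kb_next after a stale write and one allocation. */
static uintptr_t sim_stale_write(void)
{
    static char pool[2][BLKSZ];
    struct sim_bucket b = { NULL };
    uintptr_t garbage = (uintptr_t)0x5cdd8000u;
    char *first;

    sim_free(&b, pool[0]);
    sim_free(&b, pool[1]);              /* list is now pool[1] -> pool[0] */

    /* The bug: something still writes through a pointer to the freed block. */
    memcpy(pool[1], &garbage, sizeof garbage);

    first = sim_malloc(&b);             /* succeeds, but loads garbage into kb_next */
    assert(first == pool[1]);
    return (uintptr_t)b.kb_next;        /* the following sim_malloc() would dereference this */
}
```

Calling sim_stale_write() from a main() and printing the result shows
the garbage sitting in kb_next, while the first allocation after the
stale write returned a perfectly valid block.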
> 
> 
> >> One other question ... is your kernel compiled with INVARIANTS?  That
> >> changes the definition of struct freelist.
> > 
> > Without.
> 
> This could potentially be difficult to track down.  Probably the best
> bet is to go back to the previous configuration and compile with
> INVARIANTS and hope that this will catch the problem a bit closer to the
> source.
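
A rough sketch of why a debugging allocator can catch this closer to
the source: fill a block with a known pattern when it is freed, and
verify the pattern when the block is handed out again.  This only
illustrates the general poisoning idea and is not the actual INVARIANTS
code; POISON, poison_block, check_block and demo_poison_check are
made-up names.  Note that this sketch leaves the link word itself
unchecked, so it would only notice a stale write that lands elsewhere
in the block.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLKSZ   64                      /* illustrative block size */
#define POISON  0xdeadc0deu             /* illustrative fill pattern */

/* Fill a freed block with the pattern, leaving room for the free-list link. */
static void poison_block(unsigned char *blk)
{
    uint32_t p = POISON;
    size_t off;

    for (off = sizeof(void *); off + sizeof p <= BLKSZ; off += sizeof p)
        memcpy(blk + off, &p, sizeof p);
}

/* Return the offset of the first modified word, or -1 if the block is intact. */
static long check_block(const unsigned char *blk)
{
    uint32_t p;
    size_t off;

    for (off = sizeof(void *); off + sizeof p <= BLKSZ; off += sizeof p) {
        memcpy(&p, blk + off, sizeof p);
        if (p != POISON)
            return (long)off;
    }
    return -1;
}

/* Poison a block, simulate a stale write at offset 16, and run the check. */
static long demo_poison_check(void)
{
    static unsigned char blk[BLKSZ];
    uint32_t garbage = 0x5cdd8000u;

    poison_block(blk);
    if (check_block(blk) != -1)
        return -2;                      /* a freshly poisoned block must be intact */

    memcpy(blk + 16, &garbage, sizeof garbage);   /* write after free */
    return check_block(blk);            /* offset of the damaged word */
}
```

The check runs at the allocation that picks up the damaged block, one
allocation earlier than the crash seen without it, and the modified
offset and value give a hint about who did the writing.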
> 

OK, I'll restore our previous configuration tomorrow and add INVARIANTS.

I'll let you know when a fresh crash dump is ready.