Date: Sun, 20 Apr 2003 15:04:42 +0400
From: Dmitry Sivachenko
To: Don Lewis
cc: hackers@FreeBSD.org
Subject: Re: Repeated similar panics on -STABLE
Message-ID: <20030420110442.GA85172@fling-wing.demos.su>
In-Reply-To: <200304201018.h3KAI5XB021318@gw.catspoiler.org>
References: <20030420093628.GA76333@fling-wing.demos.su> <200304201018.h3KAI5XB021318@gw.catspoiler.org>
List-Id: Technical Discussions relating to FreeBSD

On Sun, Apr 20, 2003 at 03:18:05AM -0700, Don Lewis wrote:
> On 20 Apr, Dmitry Sivachenko wrote:
> > On Sun, Apr 20, 2003 at 01:16:16AM -0700, Don Lewis wrote:
>
> >> If kbp is pointing to a non-existent page, why does Terry's patch seem
> >> to fix the problem for you?
> >
> > Well, there is probably a misunderstanding.
> > We did NOT apply Terry's patch.  Let me quote a bit from my e-mail to Terry:
> >
> > TL> Did my patch fix your problem?
> > TL>
> > TL> Or did you tune your kernel, as I suggested, to fix your problem?
> > TL>
> > TL> Or is it still a problem?
> >
> > DS>We changed maxusers from 512 to 0 and decreased the number of
> > DS>NMBCLUSTERS.  Now everything is working fine, but since these panics
> > DS>occurred about once a week I can't say for sure they are completely gone.
> > DS>Let's wait at least one more week...
> >
> > Thus I wanted to say that we only tuned maxusers and NMBCLUSTERS.  We
> > run a virgin -STABLE kernel without any patches.  Probably my English
> > leaves much to be desired ;-((
>
> Your English seems just fine to me.
>
> I just got the impression from Terry that the patch is what fixed the
> problem for you.
>
> >> I wonder if things are getting further munged after the trap occurs?
> >> That would make it more difficult to track down the problem from the
> >> core file.
> >>
> >> Something else of interest to print is
> >>     bucket[7]
> >> bucket[7].kb_next and bucket[7].kb_last might shed some light.
> >>
> >
> > (kgdb) up 22
> > #22 0xc015daff in malloc (size=72, type=0xc029fee0, flags=0)
> >     at /mnt/se3/releng_4/src/sys/kern/kern_malloc.c:243
> > 243             va = kbp->kb_next;
> > (kgdb) p bucket[7]
> > $1 = {kb_next = 0x5cdd8000,
> >   kb_last = 0xc8fcb000 "", kb_calls = 2127276, kb_total = 4256,
> >   kb_elmpercl = 32, kb_totalfree = 1264, kb_highwat = 160, kb_couldfree = 5497}
> > (kgdb) p bucket[7].kb_next
> > $2 = 0x5cdd8000
> > (kgdb) p bucket[7].kb_last
> > $3 = 0xc8fcb000 ""
> > (kgdb)
>
> That explains quite a bit.  The free list is somehow getting
> corrupted.  That's why the 0x5cdd8000 value shows up in both stack
> frames.  The value of kb_last looks ok, though.  Because kb_next is not
> NULL, we skip the "if" block that allocates more memory and proceed to
> line 243.  Gdb is lying a bit though, the trap isn't happening on line
> 243, va is just getting the garbage value there.  The trap is actually
> happening on the next line when we try to dereference this garbage
> pointer:
>     kbp->kb_next = ((struct freelist *)va)->next;
>
> It sure would be nice to know the source of this weird value.  It's
> obviously not a pointer, but it's not obvious to me what it might be.
>
> It sure looks to me like something is writing to memory that has already
> been put back on the free list and is stomping on the next pointer in
> one of the memory blocks on the list.  When this block gets allocated
> again, malloc() does:
>     va = kbp->kb_next;
>     kbp->kb_next = ((struct freelist *)va)->next;
> and stores the garbage in kb_next, where we trip over it on the next
> allocation.
>
> >> One other question ... is your kernel compiled with INVARIANTS?  That
> >> changes the definition of struct freelist.
> >
> > Without.
>
> This could potentially be difficult to track down.  Probably the best
> bet is to go back to the previous configuration and compile with
> INVARIANTS and hope that this will catch the problem a bit closer to the
> source.
>

OK, I'll restore our previous configuration tomorrow and add INVARIANTS.
I'll let you know when a fresh crash dump is ready.
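
For illustration only, the failure mode Don describes can be sketched in
user space.  This is not the kernel's kern_malloc.c: every toy_* name, the
128-byte block size and the four-block arena are hypothetical, and the
range check in toy_malloc() merely stands in for the page fault the real
kernel takes.  It is a rough sketch, assuming a singly linked free list
whose link lives in the first bytes of each free block and assuming
0x5cdd8000 lies outside the toy arena.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_BLOCK_SIZE 128      /* illustrative block size */
#define TOY_NBLOCKS    4        /* illustrative arena size */

/* Toy stand-in for a kernel bucket: head and tail of the free list. */
struct toy_bucket {
    char *kb_next;              /* first free block, or NULL */
    char *kb_last;              /* last free block */
};

/* Backing store; the union only forces pointer alignment. */
static union {
    char bytes[TOY_NBLOCKS * TOY_BLOCK_SIZE];
    void *align;
} toy_arena;

/* The link to the next free block is stored at offset 0 of a free block. */
static char *toy_get_next(const char *blk)
{
    char *next;
    memcpy(&next, blk, sizeof next);
    return next;
}

static void toy_set_next(char *blk, char *next)
{
    memcpy(blk, &next, sizeof next);
}

/* True if p is the start of a block inside the toy arena. */
static int toy_in_arena(const char *p)
{
    uintptr_t a = (uintptr_t)p;
    uintptr_t lo = (uintptr_t)toy_arena.bytes;
    return a >= lo && a < lo + sizeof toy_arena.bytes &&
        (a - lo) % TOY_BLOCK_SIZE == 0;
}

/* Put a block back on the tail of the free list. */
static void toy_free(struct toy_bucket *kbp, char *addr)
{
    assert(toy_in_arena(addr));
    toy_set_next(addr, NULL);
    if (kbp->kb_next == NULL)
        kbp->kb_next = addr;
    else
        toy_set_next(kbp->kb_last, addr);
    kbp->kb_last = addr;
}

/*
 * Pop the head of the free list.  Sets *corrupt and returns NULL when the
 * head is not a valid block, which is where the real kernel would trap.
 */
static char *toy_malloc(struct toy_bucket *kbp, int *corrupt)
{
    char *va = kbp->kb_next;
    *corrupt = 0;
    if (va == NULL)
        return NULL;
    if (!toy_in_arena(va)) {
        *corrupt = 1;
        return NULL;
    }
    kbp->kb_next = toy_get_next(va);    /* garbage is copied in here */
    return va;
}

/* Returns the 1-based allocation at which corruption shows up, 0 if never. */
static int toy_demo(void)
{
    struct toy_bucket b = { NULL, NULL };
    char *a = toy_arena.bytes;
    char *c = toy_arena.bytes + TOY_BLOCK_SIZE;
    int corrupt;

    toy_free(&b, a);
    toy_free(&b, c);                    /* free list is now a -> c */

    /* Stale write through a dangling pointer to the already freed a. */
    toy_set_next(a, (char *)(uintptr_t)0x5cdd8000u);

    for (int i = 1; i <= TOY_NBLOCKS; i++) {
        (void)toy_malloc(&b, &corrupt);
        if (corrupt)
            return i;
    }
    return 0;
}
```

Calling toy_demo() from a small main() should show the first allocation
handing out the damaged block without complaint and the second one
tripping over the garbage left in kb_next, which is the one-allocation
delay described above.  That delay is also why a check made at free or
allocation time, which is what the thread hopes INVARIANTS will provide,
could report the problem closer to its source.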