Date:      Sun, 20 Apr 2003 03:18:05 -0700 (PDT)
From:      Don Lewis <truckman@FreeBSD.org>
To:        demon@FreeBSD.org
Cc:        hackers@FreeBSD.org
Subject:   Re: Repeated similar panics on -STABLE
Message-ID:  <200304201018.h3KAI5XB021318@gw.catspoiler.org>
In-Reply-To: <20030420093628.GA76333@fling-wing.demos.su>

On 20 Apr, Dmitry Sivachenko wrote:
> On Sun, Apr 20, 2003 at 01:16:16AM -0700, Don Lewis wrote:

>> If kbp is pointing to a non-existent page, why does Terry's patch seem
>> to fix the problem for you?
> 
> Well, there is probably a misunderstanding here.
> We did NOT apply Terry's patch.  Let me quote a bit from my e-mail to Terry:
> 
> TL> Did my patch fix your problem?
> TL>
> TL> Or did you tune your kernel, as I suggested, to fix your problem?
> TL>
> TL> Or is it still a problem?
> 
> DS>We changed maxusers from 512 to 0 and decreased the number of
> DS>NMBCLUSTERS.  Now everything is working fine, but since these panics occurred
> DS>about once a week I can't say for sure they are completely gone.
> DS>Let's wait at least one more week...
> 
> Thus I wanted to say that we only tuned maxusers and NMBCLUSTERS.  We
> run a virgin -STABLE kernel without any patches.  Perhaps my English leaves much
> to be desired ;-((

Your English seems just fine to me.

I just got the impression from Terry that the patch is what fixed the
problem for you.


>> I wonder if things are getting further munged after the trap occurs?
>> That would make it more difficult to track down the problem from the
>> core file.
>> 
>> Something else of interest to print is
>> 	bucket[7]
>> bucket[7].kb_next and bucket[7].kb_last might shed some light.
>> 
> 
> (kgdb) up 22
> #22 0xc015daff in malloc (size=72, type=0xc029fee0, flags=0)
>     at /mnt/se3/releng_4/src/sys/kern/kern_malloc.c:243
> 243             va = kbp->kb_next;
> (kgdb) p bucket[7]
> $1 = {kb_next = 0x5cdd8000 <Address 0x5cdd8000 out of bounds>,
>   kb_last = 0xc8fcb000 "", kb_calls = 2127276, kb_total = 4256,
>   kb_elmpercl = 32, kb_totalfree = 1264, kb_highwat = 160, kb_couldfree = 5497}
> (kgdb) p bucket[7].kb_next
> $2 = 0x5cdd8000 <Address 0x5cdd8000 out of bounds>
> (kgdb) p bucket[7].kb_last
> $3 = 0xc8fcb000 ""
> (kgdb)

That explains quite a bit.  The free list is somehow getting
corrupted, which is why the 0x5cdd8000 value shows up in both stack
frames.  The value of kb_last looks OK, though.  Because kb_next is not
NULL, we skip the "if" block that allocates more memory and proceed to
line 243.  Gdb is lying a bit, though: the trap isn't happening on line
243, where va merely picks up the garbage value.  The trap actually
happens on the next line, when we try to dereference this garbage
pointer:
	kbp->kb_next = ((struct freelist *)va)->next;

It sure would be nice to know the source of this weird value.  It's
obviously not a pointer, but it's not obvious to me what it might be.

It sure looks to me like something is writing to memory that has already
been put back on the free list and is stomping on the next pointer in
one of the memory blocks on the list.  When this block gets allocated
again, malloc() does:
	va = kbp->kb_next;
	kbp->kb_next = ((struct freelist *)va)->next;
and stores the garbage in kb_next, where we trip over it on the next
allocation.


>> One other question ... is your kernel compiled with INVARIANTS?  That
>> changes the definition of struct freelist.
> 
> Without.

This could potentially be difficult to track down.  Probably the best
bet is to go back to the previous configuration and compile with
INVARIANTS and hope that this will catch the problem a bit closer to the
source.


