From owner-freebsd-hackers  Sun Nov 19 14:56:56 2000
Date: Sun, 19 Nov 2000 14:56:47 -0800 (PST)
From: Richard Hodges <rh@matriplex.com>
To: Mike Smith <msmith@FreeBSD.ORG>
Cc: freebsd-hackers@FreeBSD.ORG
Subject: Re: page fault question 
In-Reply-To: <200011151129.eAFBToF02993@mass.osd.bsdi.com>
Message-ID: <Pine.BSF.4.10.10011191443280.52841-100000@mail.matriplex.com>

On Wed, 15 Nov 2000, Mike Smith wrote:

> > I have been having a great time :-) debugging a device driver,
> > and have run into a really fun way to panic.  With one type 
> > of traffic, [something] happens and the kernel drops into
> > DDB, just the way I want.

[snip panic info]

> This is pretty normal; ddb is a little fragile sometimes.  You want to go 
> back and look at the very first trap; it will probably be different and 
> will be the *real* trap.  All the rest are just ddb exploding.

Yep.  Unfortunately, the original trap led me on a wild goose chase,
trying to figure out why system memory was being overwritten by
received device data.  I really suspected something funny in the DMA...

It turns out that the network stack gets really unhappy when you
trim an mbuf chain and leave the last mbuf with a negative length :-(
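
To illustrate the failure mode, here is a minimal user-space sketch.  It
assumes a hypothetical, simplified "struct buf" (not the kernel's struct
mbuf) and hypothetical helper names; it is not the driver code discussed
in this thread.  The naive version subtracts the whole trim from the last
buffer only, while the safer version clamps each buffer so that no length
can go below zero.  In the kernel itself, m_adj() with a negative count is
the usual way to trim from the tail of a chain.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer chain; NOT the kernel's struct mbuf. */
struct buf {
    struct buf *next;   /* next buffer in the chain, or NULL */
    int len;            /* bytes of valid data in this buffer */
};

/* Sum of the lengths of every buffer in the chain. */
static int chain_len(const struct buf *b)
{
    int total = 0;

    for (; b != NULL; b = b->next)
        total += b->len;
    return total;
}

/*
 * The bug: take the whole trim out of the last buffer.  If trim is
 * larger than that buffer's length, the length goes negative.
 */
static void trim_tail_naive(struct buf *head, int trim)
{
    struct buf *b = head;

    while (b->next != NULL)
        b = b->next;
    b->len -= trim;
}

/*
 * Safer: work out how many bytes should remain, then walk from the
 * head and clamp each buffer to what is still allowed.
 */
static void trim_tail_safe(struct buf *head, int trim)
{
    int keep = chain_len(head) - trim;

    if (keep < 0)
        keep = 0;
    for (struct buf *b = head; b != NULL; b = b->next) {
        if (b->len > keep)
            b->len = keep;
        keep -= b->len;
        assert(b->len >= 0);    /* invariant: never a negative length */
    }
}
```

With a 100-byte buffer followed by a 4-byte one, trimming 10 bytes with
trim_tail_naive() leaves the last buffer at -6, while trim_tail_safe()
leaves lengths of 94 and 0.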

> > Now looking back at the panic message, it looks like the stack has
> > pushed into the "frame pointer".  Is this an actual problem, or
> > just some side effect of the page fault?
 
> The frame pointer is a pointer into the stack, so no, it's not a problem.

Of course (doh!)  I realized that shortly after posting.  

> Typically stack overruns lead to double faults (because there's no stack 
> on which to handle the fault) and a spontaneous reboot.  This just sounds 
> like there's something about your first trap that kills DDB (eg. an 
> invalid instruction pointer, etc.)

I did check the SP, and it looks like the kernel stack stays in the
"temporary" 8k stack set up in i386/i386/locore.s.  Does that sound right?

> Hope this helps; let us know if the first trap isn't any more 
> illuminating.  You might also try using remote gdb instead of ddb.

Thanks.  I also had to dig out a couple of bugs involving word alignment
when doing DMA transfers, and learned NOT to mess with the data inside
mbufs with external data ;-)  I guess I've left enough offerings at
the altar of stupidity, so maybe Loki will leave me alone now.

All the best,

-Richard

-------------------------------------------
   Richard Hodges   | Matriplex, inc.
      <title>       | 769 Basque Way
  rh@matriplex.com  | Carson City, NV 89706
    775-886-6477    | www.matriplex.com 


