From owner-freebsd-questions  Wed Jul 29 01:29:49 1998
Return-Path: <owner-freebsd-questions@FreeBSD.ORG>
Received: (from majordom@localhost)
          by hub.freebsd.org (8.8.8/8.8.8) id BAA10821
          for freebsd-questions-outgoing; Wed, 29 Jul 1998 01:29:49 -0700 (PDT)
          (envelope-from owner-freebsd-questions@FreeBSD.ORG)
Received: from allegro.lemis.com (allegro.lemis.com [192.109.197.134])
          by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id BAA10810
          for <freebsd-questions@FreeBSD.ORG>; Wed, 29 Jul 1998 01:29:45 -0700 (PDT)
          (envelope-from grog@freebie.lemis.com)
Received: from freebie.lemis.com (freebie.lemis.com [192.109.197.137])
	by allegro.lemis.com (8.9.1/8.9.0) with ESMTP id RAA22369;
	Wed, 29 Jul 1998 17:58:58 +0930 (CST)
Received: (from grog@localhost)
	by freebie.lemis.com (8.9.1/8.9.0) id RAA28899;
	Wed, 29 Jul 1998 17:58:57 +0930 (CST)
Message-ID: <19980729175856.Q716@freebie.lemis.com>
Date: Wed, 29 Jul 1998 17:58:56 +0930
From: Greg Lehey <grog@lemis.com>
To: Les LaCroix <Les.LaCroix@Carleton.edu>, freebsd-questions@FreeBSD.ORG
Subject: Re: (long) page fault in kernel mode: suggestions?
References: <4027246050.901675410@miranda.INFOZOO.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.91.1i
In-Reply-To: <4027246050.901675410@miranda.INFOZOO.com>; from Les LaCroix on Wed, Jul 29, 1998 at 01:23:30AM -0500
WWW-Home-Page: http://www.lemis.com/~grog
Organization: LEMIS, PO Box 460, Echunga SA 5153, Australia
Phone: +61-8-8388-8286
Fax: +61-8-8388-8725
Mobile: +61-41-739-7062
Sender: owner-freebsd-questions@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Wednesday, 29 July 1998 at  1:23:30 -0500, Les LaCroix wrote:
> I've been fighting a "fatal trap 12: page fault while in kernel mode"
> problem.  Clues are appreciated.  I'm running out of ideas.
>
> New machine (configuration below).  Crashes in a similar (if not exactly
> the same) way with GENERIC kernel and a custom kernel with virtually
> everything removed, in both 2.2.6 and 2.2.7.  I've not changed anything in
> the kernel source.
>
> I don't have the panic screen from other days, but tonight it crashed 3
> times in 5 hours like this:
>
> Fatal trap 12: page fault while in kernel mode
> ...

You don't need this information if you have a dump.

> Each crash was the same: same instruction, stack and frame pointers, same
> everything.  gdb -k on the dumps all look like:
>
> (kgdb) symbol-file /kernel
> Reading symbols from /kernel...done.
> (kgdb) exec-file /var/crash/kernel.2
> (kgdb) core-file /var/crash/vmcore.2
> IdlePTD 1c1000
> current pcb at 1a8bb0
> panic: page fault
> #0  boot (howto=256) at ../../kern/kern_shutdown.c:266
> 266                                     dumppcb.pcb_cr3 = rcr3();
> (kgdb) where
> #0  boot (howto=256) at ../../kern/kern_shutdown.c:266
> #1  0xf010eb12 in panic (fmt=0xf017693f "page fault")
>     at ../../kern/kern_shutdown.c:400
> #2  0xf017751e in trap_fatal (frame=0xf019cf64)
>     at ../../i386/i386/trap.c:772
> #3  0xf0176fe0 in trap_pfault (frame=0xf019cf64, usermode=0)
>     at ../../i386/i386/trap.c:681
> #4  0xf0176c77 in trap (frame={tf_es = 16, tf_ds = 16, tf_edi = -1073741824,
>       tf_esi = -535754628, tf_ebp = -266743880, tf_isp = -266743924,
>       tf_ebx = -260199936, tf_edx = -226815792, tf_ecx = 1073741823,
>       tf_eax = -2147483648, tf_trapno = 12, tf_err = 0, tf_eip = -535754628,
>       tf_cs = 8, tf_eflags = 66118, tf_esp = -267363380,
>       tf_ss = -260199936})
>     at ../../i386/i386/trap.c:324
> #5  0xe011087c in ?? ()
>
> I'm not familiar enough (yet) with gdb and kernel debugging to try to figure
> out what's going on.  My current hunch is that something is corrupting the
> stack, changing the return address, and causing the page fault when
> something does a return.

Yes, it looks like that.  Not an easy dump to crack. 
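
One thing that makes the trap frame easier to read: gdb prints the
saved registers as signed decimal numbers, so the addresses are hard
to recognize.  A minimal sketch of the conversion to unsigned 32-bit
hex follows.  The helper names (to_u32, as_hex) are made up for
illustration, and the values are copied from the frame quoted above.

```python
# Trap-frame values copied from the kgdb output quoted above.
# gdb shows them as signed 32-bit decimals.
TRAP_FRAME = {
    "tf_eip": -535754628,
    "tf_esi": -535754628,
    "tf_ebp": -266743880,
}


def to_u32(value):
    """Reinterpret a signed 32-bit integer as an unsigned one."""
    return value & 0xFFFFFFFF


def as_hex(value):
    """Format a signed 32-bit register value as 8-digit hex."""
    return "0x%08x" % to_u32(value)


if __name__ == "__main__":
    for name, value in TRAP_FRAME.items():
        print("%-8s %12d  %s" % (name, value, as_hex(value)))
```

Converted this way, tf_eip comes out as 0xe011087c, the same address
gdb shows for frame #5, while every other frame in the backtrace is
at 0xf01xxxxx.  That fits the idea that the processor jumped to a bad
address, though it doesn't say what put the address there.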

> The machine:
>
> Epox 100MHz 51MVP3E-M ATX board with 1MB cache:
>         bus clock       = 100 MHz
>         multiplier      = 3x
>         SDRAM clock     = CPU bus clock
> AMD K6 300 MMX CPU

Hmmm.  We haven't seen many of these yet.

> 128MB PC100 SDRAM/ECC 8ns 168-pin DIMM w/ EPROM, 100MHz Mbrds
> Seagate 6.4GB 7200 RPM IDE drive (ST36530A)
> Adaptec ISA 1520 SCSI-2 Controller (for an external ZIP, but nothing
> attached yet)

I would look carefully at this controller.  Not many people use these
boards, so they're more likely than most to cause problems.  Try
removing the board for a while and see if the crashes continue.

> Intel EtherExpress Pro/100B
> 8MB Millennium II PCI (but not running X or doing anything but dumb console
> work yet)
> Teac 24x, IDE (ATAPI)
>
> There's nothing interesting running, usually.  I killed sendmail and cron
> (although I left inetd, syslogd, portmap and a couple getty's running).

I'd be more likely to suspect the hardware configuration.  If the
crashes continue after removing the SCSI board, the Ethernet board is
the next thing I'd look at: can you swap it for some other model?
After that, if the crashes still continue, consider temporarily
replacing the CPU with some other model.

Greg
--
See complete headers for address and phone numbers
finger grog@lemis.com for PGP public key
