From owner-freebsd-stable@FreeBSD.ORG  Wed May 26 14:35:35 2004
Date: Wed, 26 May 2004 17:34:41 -0400 (EDT)
From: Robert Watson <rwatson@freebsd.org>
X-Sender: robert@fledge.watson.org
To: Vivek Khera <vivek@khera.org>
In-Reply-To: <6EF3382E-AF4E-11D8-A6A8-000A9578CFCC@khera.org>
Message-ID: <Pine.NEB.3.96L.1040526173125.20947I-100000@fledge.watson.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: stable@freebsd.org
Subject: Re: how to interpret crash?


On Wed, 26 May 2004, Vivek Khera wrote:

> I'm loading some data into a postgres database and I keep
> crashing/locking up my box.  I hooked up a serial console to try to
> figure it out.  Here is what the console displayed.  What does this
> mean?  Can it be hardware or is it most likely software? 

There's a fairly useful section in the Handbook on preparing bug reports,
attaching debuggers, performing core dumps, etc.  Depending on how moved
you are to hack kernels, varying degrees of that might or might not be
appealing :-).

> Fatal trap 12: page fault while in kernel mode
> fault virtual address   = 0x0
> fault code              = supervisor write, page not present
> instruction pointer     = 0x8:0xc0230fee
> stack pointer           = 0x10:0xc0276ed8
> frame pointer           = 0x10:0xc0276ed8
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                          = DPL 0, pres 1, def32 1, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = Idle
> interrupt mask          = none
> trap number             = 12
> panic: page fault

This is a NULL pointer dereference in some piece of kernel code: the fault
address is 0x0 and the access was a supervisor write.  The instruction
pointer is 0xc0230fee.  If you have a kernel with debugging symbols, you
can convert that address into a source file and line number (see the
Handbook for details).  If you don't have, and can't get, a kernel with
debugging symbols, you can run 'nm' on your kernel and look at the symbols
on either side of the address -- the symbol immediately before it is the
function the crash occurred in.  That information can be used to help
determine what has taken place.  If this is a reproducible problem, try
compiling DDB into the kernel and using a serial console to capture a
stack trace (instructions are in the Handbook), as that should be most
helpful.
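
To make the 'nm' lookup concrete, here is a rough sketch of the search in
Python.  It assumes the input looks like ordinary 'nm' output, one
"address type name" triple per line.  The symbol names and addresses in
SAMPLE_NM are made up for illustration and are not from any real kernel;
only the instruction pointer 0xc0230fee comes from the trap message above.

```python
import bisect

def parse_nm(text):
    """Parse nm-style lines of the form '<hex address> <type> <name>'."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # undefined symbols carry no address; skip them
        addr, _type, name = parts
        try:
            symbols.append((int(addr, 16), name))
        except ValueError:
            continue  # not a hex address, ignore the line
    symbols.sort()
    return symbols

def enclosing_symbol(symbols, address):
    """Return (name, offset) of the last symbol at or below address."""
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, address) - 1
    if i < 0:
        return None  # address lies below every known symbol
    base, name = symbols[i]
    return name, address - base

# Hypothetical symbol table, NOT taken from a real kernel.
SAMPLE_NM = """\
c0230e40 T hypothetical_func_a
c0230fa0 T hypothetical_func_b
c0231100 T hypothetical_func_c
         U undefined_symbol
"""

symbols = parse_nm(SAMPLE_NM)
result = enclosing_symbol(symbols, 0xc0230fee)
print(result)  # ('hypothetical_func_b', 78)
```

With a real kernel you would feed parse_nm() the text produced by running
'nm' on the kernel file instead of SAMPLE_NM.  The offset returned next to
the name says how far into the function the faulting instruction sits.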

Regarding kernel bug or hardware problem: it could be either.  NULL
pointer dereferences can result from code that isn't written to handle an
error case that occurs in the real world, or from a logic error, but they
can also occur if your hardware isn't operating to specification or is
failing in one way or another.  Knowing which function the fault takes
place in would go a long way toward resolving that, however.

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Senior Research Scientist, McAfee Research


> 
> syncing disks...
> 
> Fatal trap 12: page fault while in kernel mode
> fault virtual address   = 0x30
> fault code              = supervisor read, page not present
> instruction pointer     = 0x8:0xc01f4830
> stack pointer           = 0x10:0xc0276d04
> frame pointer           = 0x10:0xc0276d0c
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                          = DPL 0, pres 1, def32 1, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = Idle
> interrupt mask          = bio
> trap number             = 12
> panic: page fault
> Uptime: 2h15m33s
> Automatic reboot in 15 seconds - press a key on the console to abort
> Rebooting...
> 
> 
> 
> A few days ago I noticed a "write timeout resetting device" on the ad1 
> disk.  Both IDE drives are on the same controller.
> 
> This is running FreeBSD 4.10-PRERELEASE #6: Wed May 26 13:23:05 EDT 
> 2004, but was cvsup'd yesterday.  The CPU is an AMD Duron 850 with 
> 1.5GB RAM.
> 
> What should I look to fix on this system?
> 