From owner-freebsd-stable@FreeBSD.ORG  Thu Jun  1 03:10:19 2006
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
X-Original-To: stable@freebsd.org
Delivered-To: freebsd-stable@FreeBSD.ORG
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 978A216AF1F
	for <stable@freebsd.org>; Thu,  1 Jun 2006 03:10:19 +0000 (UTC)
	(envelope-from scottl@samsco.org)
Received: from pooker.samsco.org (pooker.samsco.org [168.103.85.57])
	by mx1.FreeBSD.org (Postfix) with ESMTP id EF33B43D6B
	for <stable@freebsd.org>; Thu,  1 Jun 2006 03:10:13 +0000 (GMT)
	(envelope-from scottl@samsco.org)
Received: from [192.168.254.14] (imini.samsco.home [192.168.254.14])
	(authenticated bits=0)
	by pooker.samsco.org (8.13.4/8.13.4) with ESMTP id k51393hU003191;
	Wed, 31 May 2006 21:09:08 -0600 (MDT)
	(envelope-from scottl@samsco.org)
Message-ID: <447E5A94.3030602@samsco.org>
Date: Wed, 31 May 2006 21:10:12 -0600
From: Scott Long <scottl@samsco.org>
User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US;
	rv:1.7.7) Gecko/20050416
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: David Wolfskill <david@catwhisker.org>
References: <20060601003101.GE1991@bunrab.catwhisker.org>
In-Reply-To: <20060601003101.GE1991@bunrab.catwhisker.org>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam-Status: No, score=-1.1 required=3.8 tests=ALL_TRUSTED,PLING_QUERY 
	autolearn=failed version=3.1.1
X-Spam-Checker-Version: SpamAssassin 3.1.1 (2006-03-10) on pooker.samsco.org
Cc: stable@freebsd.org
Subject: Re: 6.1-STABLE; Fatal trap 12: page fault while in kernel mode; kgdb
 isn't working??!?
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 01 Jun 2006 03:10:19 -0000

David Wolfskill wrote:

> In testing a vendor's product, I managed (as I had been warned might
> happen) to crash the machine on which the product was running.
> 
> It's a moderately-recent 6.1-STABLE:
> 
> mx-out05# uname -a
> FreeBSD mx-out05.lab.example.org 6.1-STABLE FreeBSD 6.1-STABLE #3: Sun May  7 10:06:44 PDT 2006     dhw@mx-out05.lab.example.org:/usr/obj/usr/src/sys/SMP_PAE  i386
> mx-out05# 
> 
> Hardware-wise, it's a dual 3 GHz Xeon box with 4 GB RAM.
> 
> In case it's relevant:
> 
> mx-out05# mount; df; swapinfo
> /dev/aacd0s2a on / (ufs, local, soft-updates)
> devfs on /dev (devfs, local)
> /dev/aacd0s2d on /usr (ufs, local, soft-updates)
> /dev/aacd0s3d on /home (ufs, local, soft-updates)
> /dev/aacd0s3e on /var (ufs, local, soft-updates)
> /dev/aacd1s1d on /var/spool (ufs, local, noatime)
> devfs on /var/named/dev (devfs, local)
> /dev/md0 on /tmp (ufs, local, soft-updates)
> Filesystem    1K-blocks    Used    Avail Capacity  Mounted on
> /dev/aacd0s2a    507630   37008   430012     8%    /
> devfs                 1       1        0   100%    /dev
> /dev/aacd0s2d   2280880 1676226   422184    80%    /usr
> /dev/aacd0s3d   5077038   50950  4619926     1%    /home
> /dev/aacd0s3e   7270492  949650  5739204    14%    /var
> /dev/aacd1s1d  34678048   14136 31889670     0%    /var/spool
> devfs                 1       1        0   100%    /var/named/dev
> /dev/md0        9159102      16  8426358     0%    /tmp
> Device          1K-blocks     Used    Avail Capacity
> /dev/aacd0s3b    16777216        0 16777216     0%
> mx-out05# 
> 
> Yes, swap is ridiculously huge (but note that /tmp is swap-backed).
> So are a few other allocations (huge, that is); in general, I prefer
> to avoid exhausting resources.  :-}
> 
> The crash appears to be quite reproducible by using
> ports/benchmarks/postal.  It's fairly likely that I need to configure
> some resource-consumption constraints so the application doesn't go
> completely berserk.  I note that running postal using the same
> parameters against a similar box running Postfix just chugs along, no
> problem at all.
> 
> Here's a typical complaint as extracted from /var/log/messages:
> 
> May 31 16:02:13 mx-out05 kernel: Fatal trap 12: page fault while in kernel mode
> May 31 16:02:13 mx-out05 kernel: cpuid = 0; apic id = 00
> May 31 16:02:13 mx-out05 kernel: fault virtual address   = 0x0
> May 31 16:02:13 mx-out05 kernel: fault code             = supervisor read, page not present
> May 31 16:02:13 mx-out05 kernel: instruction pointer    = 0x20:0x0
> May 31 16:02:13 mx-out05 kernel: stack pointer          = 0x28:0xf06f8b98
> May 31 16:02:13 mx-out05 kernel: frame pointer          = 0x28:0xf06f8bcc
> May 31 16:02:13 mx-out05 kernel: code segment           = base 0x0, limit 0xf
> May 31 16:02:13 mx-out05 kernel: f
> 
> 
> I did manage to set things up to get a kernel crash dump, and I'm about
> as certain as I can be that the kernel, userland, and crash dump are all
> in sync.
> 
> Still, when I
> 
> cd /usr/obj/usr/src/sys/SMP_PAE/ && kgdb kernel.debug /var/crash/vmcore.0
> 
> I get a repeating:
> kgdb: kvm_read: invalid address (0xc9ff5624)
> kgdb: kvm_read: invalid address (0xc9ff8600)
> kgdb: kvm_read: invalid address (0xc9ff5624)
> kgdb: kvm_read: invalid address (0xc9ff8600)
> 
> The pattern repeats until I interrupt it.
> 
> Now, this box is in a lab; it is for testing (at this time), so I have
> rather more flexibility than I might for a production system.  The
> product was built for FreeBSD 5.x; I have the ports/misc/compat-5x port
> installed, and the product does run -- at least, until I start
> stress-testing it.  :-}
> 
> I could bring the box up to a more recent -STABLE fairly easily; for that
> matter, I could probably bring it up to -CURRENT fairly easily, but I
> have no intent to be running a production service on -CURRENT.  (My
> laptop?  Sometimes.  A production box in a colo?  Uhh... maybe I'm just
> not sufficiently daring, but no thanks.  :-})
> 
> I'd appreciate suggestions (or pointers to same) as to how I might
> proceed to determine what I can do to get the product to run reliably
> in a FreeBSD environment.  (The vendor has suggested either Red Hat or
> Suse Linux as more stable platforms, and has complained about an
> inability to get debugging information from FreeBSD.  I have pointed out
> that there's been some progress of late on getting DTrace ported to
> FreeBSD, and they've seemed at least somewhat interested, but in the
> mean time....)
> 
> Anyway, I'll plan on summarizing off-list responses that are relevant.
> 
> Thanks!
> 
> Peace,
> david

kgdb seems to be more broken than not.  Could you enable KDB+DDB and at
least get a stack trace from the fault?
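
As a rough sketch of what that would involve, assuming the SMP_PAE kernel
is built from a custom config file (its exact location and contents are
not shown in this thread), the two options would be added along these
lines and the kernel rebuilt and reinstalled:

```
# Hypothetical additions to the SMP_PAE kernel config file.
# KDB is the kernel debugger framework; DDB is the interactive
# in-kernel debugger backend that runs on the console.
options 	KDB
options 	DDB
```

With both options compiled in, the next fault should normally leave the
console at a "db>" prompt rather than rebooting straight away; typing
"trace" there prints the stack trace of the faulting thread, which can
then be copied from the console (a serial console makes that easier).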

Scott