From owner-freebsd-stable@FreeBSD.ORG Thu Jun 1 03:10:19 2006
Message-ID: <447E5A94.3030602@samsco.org>
Date: Wed, 31 May 2006 21:10:12 -0600
From: Scott Long
To: David Wolfskill
Cc: stable@freebsd.org
Subject: Re: 6.1-STABLE; Fatal trap 12: page fault while in kernel mode; kgdb isn't working??!?
References: <20060601003101.GE1991@bunrab.catwhisker.org>
In-Reply-To: <20060601003101.GE1991@bunrab.catwhisker.org>

David Wolfskill wrote:
> In testing a vendor's product, I managed (as I had been warned might
> happen) to crash the machine on which the product was running.
>
> It's a moderately-recent 6.1-STABLE:
>
> mx-out05# uname -a
> FreeBSD mx-out05.lab.example.org 6.1-STABLE FreeBSD 6.1-STABLE #3: Sun May 7 10:06:44 PDT 2006 dhw@mx-out05.lab.example.org:/usr/obj/usr/src/sys/SMP_PAE i386
> mx-out05#
>
> Hardware-wise, it's a dual 3 GHz Xeon box with 4 GB RAM.
>
> In case it's relevant:
>
> mx-out05# mount; df; swapinfo
> /dev/aacd0s2a on / (ufs, local, soft-updates)
> devfs on /dev (devfs, local)
> /dev/aacd0s2d on /usr (ufs, local, soft-updates)
> /dev/aacd0s3d on /home (ufs, local, soft-updates)
> /dev/aacd0s3e on /var (ufs, local, soft-updates)
> /dev/aacd1s1d on /var/spool (ufs, local, noatime)
> devfs on /var/named/dev (devfs, local)
> /dev/md0 on /tmp (ufs, local, soft-updates)
> Filesystem    1K-blocks    Used    Avail Capacity  Mounted on
> /dev/aacd0s2a    507630   37008   430012     8%    /
> devfs                 1       1        0   100%    /dev
> /dev/aacd0s2d   2280880 1676226   422184    80%    /usr
> /dev/aacd0s3d   5077038   50950  4619926     1%    /home
> /dev/aacd0s3e   7270492  949650  5739204    14%    /var
> /dev/aacd1s1d  34678048   14136 31889670     0%    /var/spool
> devfs                 1       1        0   100%    /var/named/dev
> /dev/md0        9159102      16  8426358     0%    /tmp
> Device          1K-blocks     Used    Avail Capacity
> /dev/aacd0s3b    16777216        0 16777216     0%
> mx-out05#
>
> Yes, swap is ridiculously huge (but note that /tmp is swap-backed).
> So are a few other allocations (huge, that is); in general, I prefer
> to avoid exhausting resources. :-}
>
> The crash appears to be quite reproducible by using
> ports/benchmarks/postal.  It's fairly likely that I need to configure
> some resource-consumption constraints so the application doesn't go
> completely berserk.  I note that running postal using the same
> parameters against a similar box running Postfix just chugs along, no
> problem at all.
>
> Here's a typical complaint as extracted from /var/log/messages:
>
> May 31 16:02:13 mx-out05 kernel: Fatal trap 12: page fault while in kernel mode
> May 31 16:02:13 mx-out05 kernel: cpuid = 0; apic id = 00
> May 31 16:02:13 mx-out05 kernel: fault virtual address
> May 31 16:02:13 mx-out05 kernel: = 0x0
> May 31 16:02:13 mx-out05 kernel: fault code = supervisor read, page not present
> May 31 16:02:13 mx-out05 kernel: instruction pointer = 0x20:0x0
> May 31 16:02:13 mx-out05 kernel: stack pointer = 0x28:0xf06f8b98
> May 31 16:02:13 mx-out05 kernel: frame pointer = 0x28:0xf06f8bcc
> May 31 16:02:13 mx-out05 kernel: code segment = base 0x0, limit 0xf
> May 31 16:02:13 mx-out05 kernel: f
>
>
> I did manage to set things up to get a kernel crash dump, and I'm about
> as certain as I can be that the kernel, userland, and crash dump are all
> in sync.
>
> Still, when I
>
> cd /usr/obj/usr/src/sys/SMP_PAE/ && kgdb kernel.debug /var/crash/vmcore.0
>
> I get a repeating:
> kgdb: kvm_read: invalid address (0xc9ff5624)
> kgdb: kvm_read: invalid address (0xc9ff8600)
> kgdb: kvm_read: invalid address (0xc9ff5624)
> kgdb: kvm_read: invalid address (0xc9ff8600)
>
> The pattern repeats until I interrupt it.
>
> Now, this box is in a lab; it is for testing (at this time), so I have
> rather more flexibility than I might for a production system.  The
> product was built for FreeBSD 5.x; I have the ports/misc/compat-5x port
> installed, and the product does run -- at least, until I start
> stress-testing it. :-}
>
> I could bring the box up to a more recent -STABLE fairly easily; for that
> matter, I could probably bring it up to -CURRENT fairly easily, but I
> have no intent to be running a production service on -CURRENT.  (My
> laptop?  Sometimes.  A production box in a colo?  Uhh... maybe I'm just
> not sufficiently daring, but no thanks.
> :-})
>
> I'd appreciate suggestions (or pointers to same) as to how I might
> proceed to determine what I can do to get the product to run reliably
> in a FreeBSD environment.  (The vendor has suggested either Red Hat or
> Suse Linux as more stable platforms, and has complained about an
> inability to get debugging information from FreeBSD.  I have pointed out
> that there's been some progress of late on getting DTrace ported to
> FreeBSD, and they've seemed at least somewhat interested, but in the
> mean time....)
>
> Anyway, I'll plan on summarizing off-list responses that are relevant.
>
> Thanks!
>
> Peace,
> david

kgdb seems to be more broken than not.  Could you enable KDB+DDB and at
least get a stack trace from the fault?

Scott