Date: Wed, 31 May 2006 21:10:12 -0600
From: Scott Long <scottl@samsco.org>
To: David Wolfskill <david@catwhisker.org>
Cc: stable@freebsd.org
Subject: Re: 6.1-STABLE; Fatal trap 12: page fault while in kernel mode; kgdb isn't working??!?
Message-ID: <447E5A94.3030602@samsco.org>
In-Reply-To: <20060601003101.GE1991@bunrab.catwhisker.org>
References: <20060601003101.GE1991@bunrab.catwhisker.org>
David Wolfskill wrote:
> In testing a vendor's product, I managed (as I had been warned might
> happen) to crash the machine on which the product was running.
>
> It's a moderately-recent 6.1-STABLE:
>
> mx-out05# uname -a
> FreeBSD mx-out05.lab.example.org 6.1-STABLE FreeBSD 6.1-STABLE #3: Sun May 7 10:06:44 PDT 2006 dhw@mx-out05.lab.example.org:/usr/obj/usr/src/sys/SMP_PAE i386
> mx-out05#
>
> Hardware-wise, it's a dual 3 GHz Xeon box with 4 GB RAM.
>
> In case it's relevant:
>
> mx-out05# mount; df; swapinfo
> /dev/aacd0s2a on / (ufs, local, soft-updates)
> devfs on /dev (devfs, local)
> /dev/aacd0s2d on /usr (ufs, local, soft-updates)
> /dev/aacd0s3d on /home (ufs, local, soft-updates)
> /dev/aacd0s3e on /var (ufs, local, soft-updates)
> /dev/aacd1s1d on /var/spool (ufs, local, noatime)
> devfs on /var/named/dev (devfs, local)
> /dev/md0 on /tmp (ufs, local, soft-updates)
> Filesystem     1K-blocks    Used    Avail Capacity  Mounted on
> /dev/aacd0s2a     507630   37008   430012     8%    /
> devfs                  1       1        0   100%    /dev
> /dev/aacd0s2d    2280880 1676226   422184    80%    /usr
> /dev/aacd0s3d    5077038   50950  4619926     1%    /home
> /dev/aacd0s3e    7270492  949650  5739204    14%    /var
> /dev/aacd1s1d   34678048   14136 31889670     0%    /var/spool
> devfs                  1       1        0   100%    /var/named/dev
> /dev/md0         9159102      16  8426358     0%    /tmp
> Device          1K-blocks    Used    Avail Capacity
> /dev/aacd0s3b    16777216       0 16777216     0%
> mx-out05#
>
> Yes, swap is ridiculously huge (but note that /tmp is swap-backed).
> So are a few other allocations (huge, that is); in general, I prefer
> to avoid exhausting resources. :-}
>
> The crash appears to be quite reproducible by using
> ports/benchmarks/postal. It's fairly likely that I need to configure
> some resource-consumption constraints so the application doesn't go
> completely berserk. I note that running postal using the same
> parameters against a similar box running Postfix just chugs along, no
> problem at all.
>
> Here's a typical complaint as extracted from /var/log/messages:
>
> May 31 16:02:13 mx-out05 kernel: Fatal trap 12: page fault while in kernel mode
> May 31 16:02:13 mx-out05 kernel: cpuid = 0; apic id = 00
> May 31 16:02:13 mx-out05 kernel: fault virtual address
> May 31 16:02:13 mx-out05 kernel: = 0x0
> May 31 16:02:13 mx-out05 kernel: fault code = supervisor read, page not present
> May 31 16:02:13 mx-out05 kernel: instruction pointer = 0x20:0x0
> May 31 16:02:13 mx-out05 kernel: stack pointer = 0x28:0xf06f8b98
> May 31 16:02:13 mx-out05 kernel: frame pointer = 0x28:0xf06f8bcc
> May 31 16:02:13 mx-out05 kernel: code segment = base 0x0, limit 0xf
> May 31 16:02:13 mx-out05 kernel: f
>
>
> I did manage to set things up to get a kernel crash dump, and I'm about
> as certain as I can be that the kernel, userland, and crash dump are all
> in sync.
>
> Still, when I
>
> cd /usr/obj/usr/src/sys/SMP_PAE/ && kgdb kernel.debug /var/crash/vmcore.0
>
> I get a repeating:
> kgdb: kvm_read: invalid address (0xc9ff5624)
> kgdb: kvm_read: invalid address (0xc9ff8600)
> kgdb: kvm_read: invalid address (0xc9ff5624)
> kgdb: kvm_read: invalid address (0xc9ff8600)
>
> The pattern repeats until I interrupt it.
>
> Now, this box is in a lab; it is for testing (at this time), so I have
> rather more flexibility than I might for a production system. The
> product was built for FreeBSD 5.x; I have the ports/misc/compat-5x port
> installed, and the product does run -- at least, until I start
> stress-testing it. :-}
>
> I could bring the box up to a more recent -STABLE fairly easily; for that
> matter, I could probably bring it up to -CURRENT fairly easily, but I
> have no intent to be running a production service on -CURRENT. (My
> laptop? Sometimes. A production box in a colo? Uhh... maybe I'm just
> not sufficiently daring, but no thanks. :-})
>
> I'd appreciate suggestions (or pointers to same) as to how I might
> proceed to determine what I can do to get the product to run reliably
> in a FreeBSD environment. (The vendor has suggested either Red Hat or
> Suse Linux as more stable platforms, and has complained about an
> inability to get debugging information from FreeBSD. I have pointed out
> that there's been some progress of late on getting DTrace ported to
> FreeBSD, and they've seemed at least somewhat interested, but in the
> mean time....)
>
> Anyway, I'll plan on summarizing off-list responses that are relevant.
>
> Thanks!
>
> Peace,
> david

kgdb seems to be more broken than not. Could you enable KDB+DDB and at
least get a stack trace from the fault?

Scott
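
For reference, enabling KDB and DDB as suggested above usually comes down
to adding two options to the kernel configuration and rebuilding. The
fragment below is a minimal sketch, not something taken from the thread:
it assumes the kernel config is the SMP_PAE file named in the uname
output and that it lives in the usual /usr/src/sys/i386/conf directory.

```
# Additions to /usr/src/sys/i386/conf/SMP_PAE
# (file location is an assumption; adjust to wherever the config lives)
options 	KDB		# kernel debugger framework
options 	DDB		# interactive in-kernel debugger backend
```

After rebuilding and installing the kernel (typically "make buildkernel
KERNCONF=SMP_PAE" followed by "make installkernel KERNCONF=SMP_PAE" from
/usr/src) and rebooting, the next panic should stop at a "db>" prompt on
the console instead of rebooting. Typing "trace" at that prompt prints
the stack trace of the faulting thread, which is the information being
asked for here. Console access (local or serial) is needed to see it.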
