FreeBSD Mail Archives

Date:      Fri, 04 Nov 2005 11:24:33 +0100
From:      Koen Martens <fbsd@metro.cx>
To:        Robert Watson <rwatson@FreeBSD.org>
Cc:        Koen Martens <fbsd@metro.cx>, freebsd-hackers@FreeBSD.org, Dimitry Andric <dimitry@andric.com>, Vinod Kashyap <vkashyap@amcc.com>, jhb@FreeBSD.org
Subject:   Re: panic in propagate_priority w/ postgresql under heavy load
Message-ID:  <436B36E1.7010704@metro.cx>
In-Reply-To: <20051005090715.D84936@fledge.watson.org>
References:  <2B3B2AA816369A4E87D7BE63EC9D2F269B7B4D@SDCEXCHANGE01.ad.amcc.com> <432F1310.80007@metro.cx> <20050920153806.F34322@fledge.watson.org> <433FF87C.3090101@metro.cx> <20051005090715.D84936@fledge.watson.org>

Robert Watson wrote:

>
> On Sun, 2 Oct 2005, Koen Martens wrote:
>
>> kernel trap 12 with interrupts disabled
>>
>>
>> Fatal trap 12: page fault while in kernel mode
>> cpuid = 1; apic id = 06
>> fault virtual address   = 0x24
>> fault code              = supervisor read, page not present
>> instruction pointer     = 0x8:0xc051c253
>> stack pointer           = 0x10:0xe93efb3c
>> frame pointer           = 0x10:0xe93efb50
>> code segment            = base 0x0, limit 0xfffff, type 0x1b
>>                        = DPL 0, pres 1, def32 1, gran 1
>> processor eflags        = resume, IOPL = 0
>> current process         = 6092 (postgres)
>>
>> And that, that is all.. No ddb> no 'dumping xxxxMB', just that. So 
>> basically, i fear this is a non-debugable problem, since putting in 
>> witness and such slows the kernel to a point where the panic does not 
>> occur anymore (at least, not in the 4 weeks i've been running the box 
>> with witness & invariants). Clueless :)
>
>
> This looks like a NULL pointer dereference in kernel code.  Probably, 
> this is not a locking problem, so running without WITNESS to debug 
> this should be OK.  Are you using a serial console?  If not, you might 
> find that it increases the reliability of entering DDB.  If this box 
> is an SMP box, you may also want to add options KDB_STOP_NMI to your 
> kernel config.
>
> Using gdb, could you work out what function 0xc051c253 is, and where 
> in the function.  You should be able to run gdb on your kernel.debug 
> (or kernel on 7.x), and use "l *0xc051c253" to generate a pointer to 
> the line and snippet, which will give us a substantial hint about what 
> is happening.
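
For reference, a fault virtual address of 0x24 is what one would expect
when code reads a structure member at byte offset 0x24 through a NULL
pointer, which fits the NULL-dereference reading above. A minimal C
sketch, assuming a purely hypothetical structure layout (struct example
and member_address are illustrative names, not real FreeBSD kernel
structures or functions):

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout, NOT a real kernel structure: nine 4-byte fields
 * precede `target`, so `target` sits at byte offset 36 (0x24).
 */
struct example {
    uint32_t pad[9];
    uint32_t target;
};

/*
 * Address the CPU would touch when reading `target` through a pointer
 * whose value is `base`. With base == 0 (NULL) this is just the member
 * offset, which is why small fault addresses hint at a NULL pointer.
 */
static uintptr_t
member_address(uintptr_t base)
{
    return base + offsetof(struct example, target);
}
```

Here member_address(0) evaluates to 0x24, so matching the fault address
against member offsets of the structures used at the faulting
instruction can narrow down which pointer was NULL.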

Sorry for not getting back to you on this sooner; I have had rather a
busy period (lousy excuse, I know). Anyway, I have currently downgraded
the machine to a 5.3-RELEASE-p22 kernel, which seems to have solved the
problem. There have been no panics in the two weeks since the downgrade.
Makes me a bit wary about upgrading anything to 6.x :)

Anyway, I did get to the ddb prompt on one of the last panics, and put
some of the resources online:

http://www.sonologic.nl/fbsd/

As you can see, I was pretty clueless about what to do, and just traced
just about everything that was not swapped out.

I did not put the core dump online, as I don't feel like sharing that
with the world. It is available upon request, though, for those who want
to take a crack at this.

I don't have a copy of the kernel.debug lying around, for which I
apologise. I cannot, however, upgrade to 5.4 again: we've had enough
trouble with this machine, and the user load on it has increased to a
point where I cannot afford these random panics anymore. I don't have
spare identical hardware lying around at this point to copy the entire
setup for testing purposes.

What I will try when I find some time is doing incremental upgrades from
5.3-RELEASE-p22 to 5.4-RELEASE-p6, step by step, to see which patch
level introduces the problem.
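
If the intermediate builds can be put in a fixed order, bisecting
between the known-good and known-bad ends should need fewer test runs
than a strictly linear walk. A minimal sketch in C, assuming the builds
are numbered 0..N, that every build after the first bad one also
panics, and that a test per build is available (first_bad, panics_fn
and example_panics are hypothetical names for illustration only):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical test: returns nonzero if the build at index i panics. */
typedef int (*panics_fn)(size_t i);

/*
 * Stand-in predicate for illustration only: pretends that build 5 is
 * the first one that panics. A real test would mean running the build
 * under load for long enough to trust the result.
 */
static int
example_panics(size_t i)
{
    return i >= 5;
}

/*
 * Returns the lowest build index in (good, bad] that panics.
 * Preconditions: build `good` is known stable, build `bad` is known to
 * panic, and good < bad.
 */
static size_t
first_bad(size_t good, size_t bad, panics_fn panics)
{
    assert(good < bad);
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;
        if (panics(mid))
            bad = mid;   /* problem is at mid or earlier */
        else
            good = mid;  /* problem was introduced after mid */
    }
    return bad;
}
```

With the stand-in predicate, first_bad(0, 8, example_panics) tests
builds 4, 6 and 5 in that order and returns 5. Since the panic here
only shows up under load after some time, each "good" verdict is only
as reliable as the length of the test run behind it.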

Best,

Koen

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?436B36E1.7010704>
