Date:      Wed, 25 May 2005 02:25:38 +0200
From:      Palle Girgensohn <girgen@FreeBSD.org>
To:        kwsn@earthlink.net, freebsd-amd64@freebsd.org
Cc:        toby.murray@gmail.com
Subject:   Re: Panic while running jdk15
Message-ID:  <24CD85AD72E7F49E3A9AC091@rambutan.pingpong.net>
In-Reply-To: <1115965490.59966.18.camel@jonnyv.kwsn.lan>
References:  <1115839640.59966.12.camel@jonnyv.kwsn.lan> <1115965490.59966.18.camel@jonnyv.kwsn.lan>

--On Thursday, May 12, 2005 23:24:50 -0700 Jon Kuster <kwsn@earthlink.net> 
wrote:

> On Wed, 2005-05-11 at 12:27 -0700, Jon Kuster wrote:
>> After we managed to get jdk15 built and then shipped our box to the
>> colo, it has started panicking.  We haven't been able to reliably
>> reproduce this yet, but it always happens when our java program is doing
>> its thing.
>>
>> kernel trap 12 with interrupts disabled
>>
>> Fatal trap 12: page fault while in kernel mode
>> cpuid = 0; apic id=00
>> fault virtual address = 0x1c0
>> fault code = supervisor write, page not present
>> instruction pointer = 0x8 :0xffffffff80382348
>> stack pointer = 0x10 :0xffffffff7935aa0
>> frame pointer = 0x10 :0xffffffff7935ae0
>> code segment = base 0x0, limit 0xfffff, type 0x1b
>>              = DPL 0, pres 1, long 1, def32 0, gran 1
>> processor eflags = resume, IOPL = 0
>> current process = 6503 (sh)
>>
>> I haven't been able to get a dump yet, or even a trace in ddb - our
>> remote management card apparently emulates a usb keyboard which doesn't
>> seem to work when the box has panicked.
>>
>> nm -n /boot/kernel/kernel |grep ffffffff803823
>> ffffffff80382330 T cpu_throw
>> ffffffff80382380 T cpu_switch
>
> We've switched off Hyperthreading (we're running em64T xeons), and that
> seems to have worked around the problem.  It's a little too early to say
> for sure, but we were seeing panics twice a day, and we haven't had a
> panic in about a day and a half.

Hi!

This looks very similar to our problem. Dell 2850 (i.e. em64T xeon, two 
CPUs). Turning off HTT made it live longer (long enough for me to believe 
it actually solved the problem), but after a week or so it crashed twice a 
day again. We're *not* running java, though. Apache 1.3, php4, 
postgres8.0.3, amavis (i.e. perl), postfix. Apache, postgres and php are 
very loaded, and the machine has a load >= .8 most of the time (mostly due 
to sloppy code, but anyway).

5.4-RELEASE made it better for a few days, but then it started crashing 
again. Today, I've built a non-SMP kernel, so we're effectively running a 
single CPU. It has not crashed so far (but it is slow).

It is always "Fatal trap 12: page fault while in kernel mode".
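
For reference, the only locating detail in the quoted panic is the 
instruction pointer, which Jon matched against `nm -n /boot/kernel/kernel` 
by hand. Below is a minimal sketch of that lookup in Python, assuming the 
nm output is available as text. The helper names parse_nm and resolve are 
made up for illustration, and the two symbol lines are the ones quoted 
above, not a full kernel symbol table.

```python
import bisect

def parse_nm(text):
    """Parse `nm -n` style output into a sorted list of (address, name)."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        # Undefined symbols have no address column; skip anything that is
        # not exactly "address type name".
        if len(parts) != 3:
            continue
        addr, _type, name = parts
        symbols.append((int(addr, 16), name))
    symbols.sort()
    return symbols

def resolve(symbols, ip):
    """Return (name, offset) of the last symbol starting at or below ip."""
    addrs = [addr for addr, _ in symbols]
    i = bisect.bisect_right(addrs, ip) - 1
    if i < 0:
        return None  # ip lies below the first known symbol
    base, name = symbols[i]
    return name, ip - base

# The two lines from the quoted nm output.
NM_OUTPUT = """\
ffffffff80382330 T cpu_throw
ffffffff80382380 T cpu_switch
"""

symbols = parse_nm(NM_OUTPUT)
name, offset = resolve(symbols, 0xffffffff80382348)  # instruction pointer
print(f"{name}+{offset:#x}")  # prints cpu_throw+0x18
```

With only two symbols loaded this says nothing more than the grep did: the 
faulting instruction sits 0x18 bytes into cpu_throw. Fed the complete nm 
output, the same lookup works for any address from a trap message.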

It also hangs and does not reboot by itself. It seems to hang so hard that 
it never manages to save a core dump, and has to be restarted by hitting 
the big button.

Contacted Dell support, as I'm beginning to suspect the hardware. After a 
BIOS upgrade today, recommended by Dell, the machine hung at userland 
startup, when starting the various daemons. Five times in a row, at least. 
Then it decided to actually come up, and stayed up for eight hours. Then 
down again. Sigh...

If it works fine with one CPU, is it likely to be a hardware problem or a 
software problem?

Jon, your report is a few weeks old. What happened? Does it live happily 
w/o HTT?

/Palle



