Date: Wed, 25 May 2005 02:25:38 +0200
From: Palle Girgensohn
To: kwsn@earthlink.net, freebsd-amd64@freebsd.org
Cc: toby.murray@gmail.com
Subject: Re: Panic while running jdk15

On Thursday, May 12, 2005 23:24:50 -0700, Jon Kuster wrote:

> On Wed, 2005-05-11 at 12:27 -0700, Jon Kuster wrote:
>> After we managed to get jdk15 built and then shipped our box to the
>> colo, it has started panicking. We haven't been able to reliably
>> reproduce this yet, but it always happens when our Java program is
>> doing its thing.
>>
>> kernel trap 12 with interrupts disabled
>>
>> Fatal trap 12: page fault while in kernel mode
>> cpuid = 0; apic id=00
>> fault virtual address = 0x1c0
>> fault code = supervisor write, page not present
>> instruction pointer = 0x8:0xffffffff80382348
>> stack pointer = 0x10:0xffffffff7935aa0
>> frame pointer = 0x10:0xffffffff7935ae0
>> code segment = base 0x0, limit 0xfffff, type 0x1b
>> = DPL 0, pres 1, long 1, def32 0, gran 1
>> processor eflags = resume, IOPL = 0
>> current process = 6503 (sh)
>>
>> I haven't been able to get a dump yet, or even a trace in ddb: our
>> remote management card apparently emulates a USB keyboard, which
>> doesn't seem to work when the box has panicked.
>>
>> nm -n /boot/kernel/kernel | grep ffffffff803823
>> ffffffff80382330 T cpu_throw
>> ffffffff80382380 T cpu_switch
>
> We've switched off Hyperthreading (we're running EM64T Xeons), and that
> seems to have worked around the problem. It's a little too early to say
> for sure, but we were seeing panics twice a day, and we haven't had a
> panic in about a day and a half.

Hi!

This looks very similar to our problem. Dell 2850 (i.e. EM64T Xeon, two CPUs). Turning off HTT made it live longer (long enough for me to believe it had actually solved the problem), but after a week or so it started crashing twice a day again.

We're *not* running Java, though: Apache 1.3, PHP 4, PostgreSQL 8.0.3, amavis (i.e. Perl), Postfix. Apache, PostgreSQL and PHP are heavily loaded; the machine has a load >= 0.8 most of the time (mostly due to sloppy code, but anyway).

5.4-RELEASE made it better for a few days, but then it started crashing again.

Today, I've built a non-SMP kernel, so we're effectively running a single CPU.
It has not crashed so far (but it is slow).

It is always "Fatal trap 12: page fault while in kernel mode". The machine also hangs and does not reboot by itself; it seems to hang so hard that it never manages to save a core dump, and it has to be restarted by hitting the big button.

I've contacted Dell support, as I'm beginning to suspect the hardware. After a BIOS upgrade today, recommended by Dell, the machine hung at userland startup, while starting the various daemons, at least five times in a row. Then it decided to actually come up, and stayed up for eight hours. Then it went down again. Sigh...

If it works fine with one CPU, is it more likely to be a hardware problem or a software problem?

Jon, your report is a few weeks old; what happened? Does the machine live happily without HTT?

/Palle
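The "nm -n | grep" step in Jon's quoted report resolves the faulting instruction pointer by hand: 0xffffffff80382348 lies between cpu_throw (0xffffffff80382330) and cpu_switch (0xffffffff80382380), so the fault is presumably at cpu_throw+0x18. Below is a minimal Python sketch of the same lookup, assuming input in the "address type name" format that nm -n prints. The helper names parse_nm and resolve are hypothetical, and because this nm output carries no symbol sizes, an address past the last listed symbol is attributed to that symbol, which may be wrong.

```python
import bisect

# Sample lines in `nm -n` format (address, type, name), copied from the
# quoted report. A real lookup would feed in the full nm output instead.
NM_OUTPUT = """\
ffffffff80382330 T cpu_throw
ffffffff80382380 T cpu_switch
"""


def parse_nm(text):
    """Hypothetical helper: return (address, name) tuples sorted by address."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip undefined symbols ("U name") and malformed lines
        addr, _sym_type, name = parts
        symbols.append((int(addr, 16), name))
    symbols.sort()
    return symbols


def resolve(symbols, pc):
    """Hypothetical helper: map pc to 'name+0xoffset' using the nearest
    symbol at or below it, or None if pc is below every known symbol."""
    addrs = [addr for addr, _ in symbols]
    i = bisect.bisect_right(addrs, pc) - 1
    if i < 0:
        return None
    addr, name = symbols[i]
    return f"{name}+{pc - addr:#x}"


if __name__ == "__main__":
    pc = 0xffffffff80382348  # "instruction pointer" from the panic message
    print(resolve(parse_nm(NM_OUTPUT), pc))  # prints cpu_throw+0x18
```

The binary search is the same "nearest preceding symbol" rule you apply when eyeballing sorted nm output; on a live system, ddb or a debugger with the kernel's debug symbols would give a more reliable answer.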