Date: Sat, 30 May 2009 20:31:02 +0200
From: Martin
To: John Baldwin
Cc: freebsd-stable@freebsd.org
Subject: [Solved] Re: kernel trap 12 with interrupts disabled [bge0 on 7.2R]
Message-ID: <20090530203102.27f548c0@zelda.local>
In-Reply-To: <200905151205.47672.jhb@freebsd.org>

On Fri, 15 May 2009 12:05:47 -0400, John Baldwin wrote:

> On Friday 15 May 2009 11:38:00 am Martin wrote:
> > On Fri, 15 May 2009 11:09:20 -0400, John Baldwin wrote:
> >
> > > x/i please.  The /i decodes it as an instruction so I can see
> > > which registers it was attempting to dereference.
> >
> > Oh sorry...
> >
> > (kgdb) x/i 0xffffffff805bbc66
> > 0xffffffff805bbc66:  movzbl (%rdx),%edx
>
> Hmm, your %rdx is garbage. :(
>
> rdx 0xef3fdf377db53afa -1207000745686779142
>
> That should at least be
>
> 0xffffff..........
>
> Looks like r9 and r14 have the same odd value.  Normally I would see
> a more obvious breakage such as one of the 'f' nibbles being set to
> '0' or 'e', etc.  You could try looking for that odd pointer value in
> the route structure or as arguments to other functions in the stack
> trace to see if you can find a corrupted data structure.

Hi John,

I want to thank you once again. You were right that the hardware was
broken. I contacted hardware support, and after replacing parts such
as the memory and the mainboard (neither of which fixed the problem),
I finally found out that the CPU was broken. I think this is why we
did not see obvious memory failures, like single flipped bits and
broken patterns, but TOTALLY different memory contents.

What I have learned from this:

- FreeBSD 7.2R hasn't let me down :)

- memtest (sometimes called memtester) is a good utility when you want
  to test memory AND put a high load on the machine to heat up the
  components a bit. This is possible because it runs within the OS.
  memtest86+ is broken on FreeBSD/amd64 to begin with, and it did not
  find anything in my case (tested on Linux), because it does not
  generate enough load compared to 3 parallel memtest processes.

- One thing I noticed about memtest is that it cannot mlock (lock
  memory) a big chunk of memory more than once (it works in just one
  process). Also, mlock above 2 GB is a problem. Perhaps it would be
  interesting to look into this.

Thanks John, you have been a great help to me. Our web server is
running stable again.

-- 
Martin