From: "Worth Bishop"
To: freebsd-questions@freebsd.org
Date: Tue, 12 Jun 2007 11:33:41 -0400
Subject: FreeBSD 6.2 Repeating Crash - Sleeping thread; Fatal trap 12: page fault; warning: 'T2' might be used uninitialized

Please help if you can...

BACKGROUND

This crash is occurring on a dual-AMD 1.6 GHz CPU white-box system with 1 GB RAM and 250 GB storage, running the GENERIC kernel. The system has been in production use as a web server for nearly five years. About three to four months ago, the system was upgraded from an earlier FreeBSD version to 6.1.
At the same time, all supporting applications (Apache web server, Perl, PostgreSQL, PHP, and countless other applications and libraries) were upgraded to their current releases. The system was stable until a couple of weeks ago.

FIRST ERROR EVENT

The system crashed during normal usage. The console, which did not respond to keyboard input, displayed the following message:

    Sleeping thread (tid 100122, pid 11099) owns a non-sleepable lock
    panic: sleeping thread
    cpuid=1

The system was restarted, an fsck run was completed (answering "yes" to all the "Do you want to salvage" type questions), and the server ran fine for about a week. It then crashed again several times, at intervals varying from a few minutes of uptime to a few days.

SECOND ERROR EVENT

After some crashes, a message similar to the one above was displayed. At other times, however, a message similar to this was displayed:

    kernel trap 12 with interrupts disabled

    Fatal trap 12: page fault while in kernel mode
    cpuid = 0; apic id = 01
    fault virtual address   = 0x100
    fault code              = supervisor read, page not present
    instruction pointer     = 0x20:0xc066c731
    stack pointer           = 0x28:0xe432ebf0
    frame pointer           = 0x28:0xe432ebfc
    code segment            = base 0x0, limit 0xfffff, type 0x1b
                            = DPL 0, pres 1, def32 1, gran 1
    processor eflags        = resume, IOPL = 0
    current process         = 36 (syncer)
    trap number             = 12
    panic: page fault
    cpuid = 0
    uptime: 3d10h11m44s
    Dumping 1535 Mb (2 chunks)
    chunk 0: 1Mb (159 pages)

[NOTE: the system had 1.5 GB of memory at that time. Memory has since been removed, reseated, swapped, etc.; it now has 1 GB.]

CORRECTIONS ATTEMPTED

Somewhere during this ordeal, a Google search revealed a number of other people experiencing the "Sleeping thread" problem. One of these cases apparently occurred during a stress test of a FreeBSD 6.x development version. No definitive solution was identified in anything we saw, except a single reference to the problem being a kernel bug fixed in FreeBSD 6.2. Accordingly, we upgraded from 6.1 to 6.2, but we have still experienced the problem.
We reviewed the 'messages' file and found references to several things which led us to check the FreeBSD 6.2 ERRATA (http://www.freebsd.org/releases/6.2R/errata.html). This suggested that adding 'kern.ipc.nmbclusters="0"' to the /boot/loader.conf file might avoid a known issue. We tried this, but saw no relief.

We also found a reference in the manual suggesting that the issue might be a problem with the APIC in 6.x. It recommended adding 'hint.apic.0.disabled="1"' to loader.conf. Tried this; no help.

To get more information about the system dumps, we added dumpdev="AUTO" and dumpdir="/usr/crash" [to get more storage space than is available in /var/] and have generated several vmcore.# files of ~1 GB each (all identical in size). We attempted to use DDB to analyze the dumps (struggling now, unfamiliar with the kernel debugging process) with no success.

Research suggested we needed to create a debug version of the kernel (i.e., KERNEL.DEBUG) with debugging options enabled. We duly copied GENERIC and edited it, noting that "options DDB" was already enabled. We added 'makeoptions DEBUG=-g # Build kernel with gdb(1) debug symbols' as suggested and tried to "make buildkernel", which errored out stating that KDB must be enabled to use DDB. We edited KERNEL.DEBUG to add 'options KDB # Enable kernel debugger' and attempted to "make buildkernel" again. This time, the build stopped with the message shown below.

THIRD ERROR EVENT

    [snip] inline-unit-growth=100 --param large-function-growth=1000 -mno-align-long-strings -mpreferred-stack-boundary=2 -mno-mmx -mno-3dnow -mno-sse -mno-sse2 -ffreestanding -Werror /usr/src/sys/crypto/sha2/sha2.c
    /usr/src/sys/crypto/sha2/sha2.c: In function `SHA512_Transform':
    /usr/src/sys/crypto/sha2/sha2.c:753: warning: 'T2' might be used uninitialized in this function
    *** Error code 1

    Stop in /usr/obj/usr/src/sys/KERNEL.DEBUG.
    *** Error code 1

    Stop in /usr/src.
    *** Error code 1

    Stop in /usr/src.
    www:/usr/src#

With this, we are stumped. HELP PLEASE!
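For reference, the configuration changes described above amount to roughly the following. This is a sketch reconstructed from the description, not a copy of the actual files. In particular, the placement of dumpdev and dumpdir in /etc/rc.conf is an assumption, as is the upper-case spelling of DDB in the kernel configuration.

```conf
# /boot/loader.conf
kern.ipc.nmbclusters="0"      # workaround suggested by the 6.2 errata
hint.apic.0.disabled="1"      # disable the APIC

# /etc/rc.conf (assumed location for these two settings)
dumpdev="AUTO"                # enable crash dumps
dumpdir="/usr/crash"          # more space than /var

# Kernel configuration file KERNEL.DEBUG (a copy of GENERIC)
makeoptions DEBUG=-g          # Build kernel with gdb(1) debug symbols
options     KDB               # Enable kernel debugger
options     DDB               # already present in the copied GENERIC
```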
Can anyone:

- lead us to a solution based on these error messages?
- help us understand why the GENERIC kernel, with only the debugging options added, failed to build?
- help us understand what '/usr/src/sys/crypto/sha2/sha2.c' has to do with anything?
- help us understand what we need to do to extract useful information from the vmcore.# files?
- offer any other suggestions?

Thanks in advance!