From: Brian Dean
To: freebsd-hackers@freebsd.org
Date: Thu, 14 Dec 2006 14:05:10 -0500
Subject: Kernel hang on 6.x
Message-ID: <20061214190510.GA26590@neutrino.bsdhome.com>

Hi,

We're experiencing a kernel hang on a 6.x quad-processor Sun amd64-based system.
We are able to reproduce it fairly reliably, but the environment needed to do so is not easy to replicate, so I cannot provide a simple test case. However, I have been able to build a debug kernel, and when the system "hangs", I can break to the debugger prompt. But once there, I'm not sure what to do to isolate where the system is hung up. I have confirmed that the hang occurs in both SMP and uniprocessor mode.

Here are some system details:

uname -a:

    FreeBSD bb02f54 6.2-BETA2 FreeBSD 6.2-BETA2 #4: Wed Dec 13 11:43:38 EST 2006 root@bb02f54:/usr/src/sys/amd64/compile/BBKERN amd64

    FreeBSD 6.2-BETA2 #4: Wed Dec 13 11:43:38 EST 2006
        root@bb02f54:/usr/src/sys/amd64/compile/BBKERN
    WARNING: WITNESS option enabled, expect reduced performance.
    Timecounter "i8254" frequency 1193182 Hz quality 0
    CPU: Dual Core AMD Opteron(tm) Processor 275 (2193.76-MHz K8-class CPU)
      Origin = "AuthenticAMD"  Id = 0x20f12  Stepping = 2
      Features=0x178bfbff
      Features2=0x1
      AMD Features=0xe2500800
      AMD Features2=0x3
      Cores per package: 2
    real memory  = 17179869184 (16384 MB)
    avail memory = 16518569984 (15753 MB)
    ACPI APIC Table:

The hang appears to occur under heavy memory usage and usually seems to happen when the process size approaches the size of swap.

If anyone can offer a suggestion as to what information from the db> prompt might help home in on this problem, please let me know.
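
For anyone who wants to approximate the conditions described above (heavy memory usage, with the process size growing toward the size of swap), something like the following might serve as a starting point. This is only a minimal sketch: it is not the workload that triggers the hang here, and it has not been confirmed to reproduce it. The names grow_and_touch and total_held, the 64 MB chunk size, and the 4 KB page size are all assumptions made for illustration.

```python
"""Hypothetical memory-pressure sketch; NOT the workload that triggers the hang."""
import sys

PAGE_SIZE = 4096  # assumed base page size (4 KB)


def grow_and_touch(target_bytes, chunk_bytes=64 * 1024 * 1024, report=None):
    """Allocate memory in chunks until target_bytes is held, touching each page.

    Returns the list of chunks so the caller keeps the memory referenced.
    """
    chunks = []
    held = 0
    while held < target_bytes:
        size = min(chunk_bytes, target_bytes - held)
        buf = bytearray(size)  # zero-filled allocation
        # Write one byte per page so the pages are actually backed,
        # not just reserved.
        for off in range(0, size, PAGE_SIZE):
            buf[off] = 1
        chunks.append(buf)
        held += size
        if report is not None:
            report(held)
    return chunks


def total_held(chunks):
    """Total number of bytes currently held in the chunk list."""
    return sum(len(c) for c in chunks)


if __name__ == "__main__":
    # Target size in MB from the command line; the default is deliberately small.
    target_mb = int(sys.argv[1]) if len(sys.argv) > 1 else 16
    mb = 1024 * 1024
    held = grow_and_touch(
        target_mb * mb,
        report=lambda n: sys.stdout.write("holding %d MB\n" % (n // mb)),
    )
    print("done, holding %d MB" % (total_held(held) // mb))
```

Run with a target in MB as the only argument and raise it gradually. Growing in fixed-size chunks and reporting after each one means the last line printed gives a rough idea of the process size at the moment the machine stops responding.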

A simple backtrace wasn't terribly enlightening, at least to me:

db> bt
Tracing pid 18 tid 100011 td 0xffffff03e1563980
kdb_enter() at kdb_enter+0x2f
scgetc() at scgetc+0x43e
sckbdevent() at sckbdevent+0x83
kbdmux_intr() at kbdmux_intr+0x4d
kbdmux_kbd_intr() at kbdmux_kbd_intr+0x20
taskqueue_run() at taskqueue_run+0x135
ithread_loop() at ithread_loop+0x132
fork_exit() at fork_exit+0x87
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffffffbf50ad00, rbp = 0 ---
db>
db> show reg
cs          0x8
ss          0x10
rax         0x26
rcx         0x10457a
rdx         0x1
rbx         0
rsp         0xffffffffbf50aac0
rbp         0xffffffffbf50aad0
rsi         0xffffffff80c11000
rdi         0
r8          0xffe00
r9          0xa
r10         0xffffffffbf50a9e0
r11         0xa
r12         0xffffffff80957c20  main_softc
r13         0xffffffff809579c0  main_console
r14         0x2
r15         0
rip         0xffffffff803fd57f  kdb_enter+0x2f
rflags      0x286
dr0         0
dr1         0
dr2         0
dr3         0
dr4         0xffff0ff0
dr5         0x400
dr6         0xffff0ff0
dr7         0x400
kdb_enter+0x2f: nop
db>

Thanks!
-Brian