From: Hans Petter Selasky
To: freebsd-hackers@freebsd.org
Cc: Brian Dean
Date: Fri, 15 Dec 2006 09:45:18 +0100
Subject: Re: Kernel hang on 6.x
Message-Id: <200612150945.19310.hselasky@c2i.net>
In-Reply-To: <20061214190510.GA26590@neutrino.bsdhome.com>

On Thursday 14 December 2006 20:05, Brian Dean wrote:
> Hi,
>
> We're experiencing a kernel hang on a 6.x quad processor Sun amd64
> based system. We are able to reproduce it fairly reliably, but the
> environment to do so is not easily replicatable so I cannot provide a
> simple test case. However, I have been able to build a debug kernel
> and when the system "hangs", I can break to the debugger prompt. But
> once there, I'm not sure what to do to isolate where the system is
> hung up.
> I have confirmed that the hang occurs in both SMP and
> uniprocessor mode. Here are some system details:
>
> uname -a:
> FreeBSD bb02f54 6.2-BETA2 FreeBSD 6.2-BETA2 #4: Wed Dec 13 11:43:38 EST
> 2006 root@bb02f54:/usr/src/sys/amd64/compile/BBKERN amd64
>
> FreeBSD 6.2-BETA2 #4: Wed Dec 13 11:43:38 EST 2006
> root@bb02f54:/usr/src/sys/amd64/compile/BBKERN
> WARNING: WITNESS option enabled, expect reduced performance.
> Timecounter "i8254" frequency 1193182 Hz quality 0
> CPU: Dual Core AMD Opteron(tm) Processor 275 (2193.76-MHz K8-class CPU)
> Origin = "AuthenticAMD" Id = 0x20f12 Stepping = 2
>
> Features=0x178bfbffA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT> Features2=0x1
> AMD Features=0xe2500800
> AMD Features2=0x3
> Cores per package: 2
> real memory = 17179869184 (16384 MB)
> avail memory = 16518569984 (15753 MB)
> ACPI APIC Table:
>
> The hang appears to occur under heavy memory usage and usually seems
> to happen when the process size approaches the size of swap.
>
> If anyone can offer a suggestion as to what information from the db>
> prompt might help home in on this problem, please let me know. A
> simple backtrace wasn't terribly enlightening, at least to me:
>

Does this happen when you work on the console, or when you switch from X11
to the console?

I know that the keyboard driver is called from several places without Giant
held. In my new USB keyboard driver I have added several
"if (!mtx_owned(&Giant)) return XXX;" checks. Maybe you can try adding
similar checks to the Sun keyboard driver?
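To make the idea of that check concrete, here is a minimal userland sketch. It is not kernel code and not taken from ukbd.c: struct mock_mtx, mock_mtx_owned(), mock_kbd_read_char() and MOCK_NOKEY are all hypothetical stand-ins, assumed only for illustration. In a real driver the test would be mtx_owned(&Giant), and the value returned would depend on the function being guarded.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userland stand-in for a kernel mutex; NOT the real struct mtx. */
struct mock_mtx {
    const void *owner;          /* token of the holder, or NULL if unlocked */
};

/* Stand-in for Giant, initially not held by anyone. */
static struct mock_mtx mock_giant = { NULL };

/* Hypothetical "no key available" value, assumed for this sketch only. */
#define MOCK_NOKEY (-1)

/* Mimics the idea behind mtx_owned(): true only if `self` holds the lock. */
static bool
mock_mtx_owned(const struct mock_mtx *m, const void *self)
{
    return (m->owner != NULL && m->owner == self);
}

/*
 * Hypothetical keyboard entry point. If the caller does not hold the lock,
 * return early instead of touching driver state.
 */
static int
mock_kbd_read_char(const void *self, int pending_key)
{
    if (!mock_mtx_owned(&mock_giant, self))
        return (MOCK_NOKEY);    /* the "return XXX;" case */
    return (pending_key);
}
```

With the lock not held, mock_kbd_read_char() returns MOCK_NOKEY; once mock_giant.owner is set to the caller's token, it returns the pending key. The point is only that the entry point bails out early rather than running without the lock.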
Please see:

http://www.turbocat.net/~hselasky/isdn4bsd/sources/src/sys/dev/usb/ukbd.c

> db> bt
> Tracing pid 18 tid 100011 td 0xffffff03e1563980
> kdb_enter() at kdb_enter+0x2f
> scgetc() at scgetc+0x43e
> sckbdevent() at sckbdevent+0x83
> kbdmux_intr() at kbdmux_intr+0x4d
> kbdmux_kbd_intr() at kbdmux_kbd_intr+0x20
> taskqueue_run() at taskqueue_run+0x135
> ithread_loop() at ithread_loop+0x132
> fork_exit() at fork_exit+0x87
> fork_trampoline() at fork_trampoline+0xe
> --- trap 0, rip = 0, rsp = 0xffffffffbf50ad00, rbp = 0 ---
> db>
>
> db> show reg
> cs          0x8
> ss          0x10
> rax         0x26
> rcx         0x10457a
> rdx         0x1
> rbx         0
> rsp         0xffffffffbf50aac0
> rbp         0xffffffffbf50aad0
> rsi         0xffffffff80c11000
> rdi         0
> r8          0xffe00
> r9          0xa
> r10         0xffffffffbf50a9e0
> r11         0xa
> r12         0xffffffff80957c20  main_softc
> r13         0xffffffff809579c0  main_console
> r14         0x2
> r15         0
> rip         0xffffffff803fd57f  kdb_enter+0x2f
> rflags      0x286
> dr0         0
> dr1         0
> dr2         0
> dr3         0
> dr4         0xffff0ff0
> dr5         0x400
> dr6         0xffff0ff0
> dr7         0x400
> kdb_enter+0x2f: nop
> db>
>

--HPS