Date: Fri, 15 Dec 2006 09:45:18 +0100
From: Hans Petter Selasky <hselasky@c2i.net>
To: freebsd-hackers@freebsd.org
Cc: Brian Dean <brian@bsdhome.com>
Subject: Re: Kernel hang on 6.x
Message-ID: <200612150945.19310.hselasky@c2i.net>
In-Reply-To: <20061214190510.GA26590@neutrino.bsdhome.com>
References: <20061214190510.GA26590@neutrino.bsdhome.com>
On Thursday 14 December 2006 20:05, Brian Dean wrote:
> Hi,
>
> We're experiencing a kernel hang on a 6.x quad processor Sun amd64
> based system. We are able to reproduce it fairly reliably, but the
> environment to do so is not easily replicatable so I cannot provide a
> simple test case. However, I have been able to build a debug kernel
> and when the system "hangs", I can break to the debugger prompt. But
> once there, I'm not sure what to do to isolate where the system is
> hung up. I have confirmed that the hang occurs in both SMP and
> uniprocessor mode. Here are some system details:
>
> uname -a:
> FreeBSD bb02f54 6.2-BETA2 FreeBSD 6.2-BETA2 #4: Wed Dec 13 11:43:38 EST 2006 root@bb02f54:/usr/src/sys/amd64/compile/BBKERN amd64
>
> FreeBSD 6.2-BETA2 #4: Wed Dec 13 11:43:38 EST 2006
> root@bb02f54:/usr/src/sys/amd64/compile/BBKERN
> WARNING: WITNESS option enabled, expect reduced performance.
> Timecounter "i8254" frequency 1193182 Hz quality 0
> CPU: Dual Core AMD Opteron(tm) Processor 275 (2193.76-MHz K8-class CPU)
> Origin = "AuthenticAMD" Id = 0x20f12 Stepping = 2
> Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
> Features2=0x1<SSE3>
> AMD Features=0xe2500800<SYSCALL,NX,MMX+,FFXSR,LM,3DNow+,3DNow>
> AMD Features2=0x3<LAHF,CMP>
> Cores per package: 2
> real memory = 17179869184 (16384 MB)
> avail memory = 16518569984 (15753 MB)
> ACPI APIC Table: <SUN X4200 >
>
> The hang appears to occur under heavy memory usage and usually seems
> to happen when the process size approaches the size of swap.
>
> If anyone can offer a suggestion as to what information from the db>
> prompt might help home in on this problem, please let me know. A
> simple backtrace wasn't terribly enlightening, at least to me:
>

Does this happen when you work on the console, or when you switch from
X11 to the console? I know that the keyboard driver is called from
several places without Giant locked.

In my new USB keyboard driver I have added several
"if (!mtx_owned(&Giant)) return XXX;" checks. Maybe you can try adding
such checks to the SUN keyboard driver? Please see:

http://www.turbocat.net/~hselasky/isdn4bsd/sources/src/sys/dev/usb/ukbd.c

> db> bt
> Tracing pid 18 tid 100011 td 0xffffff03e1563980
> kdb_enter() at kdb_enter+0x2f
> scgetc() at scgetc+0x43e
> sckbdevent() at sckbdevent+0x83
> kbdmux_intr() at kbdmux_intr+0x4d
> kbdmux_kbd_intr() at kbdmux_kbd_intr+0x20
> taskqueue_run() at taskqueue_run+0x135
> ithread_loop() at ithread_loop+0x132
> fork_exit() at fork_exit+0x87
> fork_trampoline() at fork_trampoline+0xe
> --- trap 0, rip = 0, rsp = 0xffffffffbf50ad00, rbp = 0 ---
> db>
>
> db> show reg
> cs          0x8
> ss          0x10
> rax         0x26
> rcx         0x10457a
> rdx         0x1
> rbx         0
> rsp         0xffffffffbf50aac0
> rbp         0xffffffffbf50aad0
> rsi         0xffffffff80c11000
> rdi         0
> r8          0xffe00
> r9          0xa
> r10         0xffffffffbf50a9e0
> r11         0xa
> r12         0xffffffff80957c20  main_softc
> r13         0xffffffff809579c0  main_console
> r14         0x2
> r15         0
> rip         0xffffffff803fd57f  kdb_enter+0x2f
> rflags      0x286
> dr0         0
> dr1         0
> dr2         0
> dr3         0
> dr4         0xffff0ff0
> dr5         0x400
> dr6         0xffff0ff0
> dr7         0x400
> kdb_enter+0x2f: nop
> db>
>

--HPS
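
For illustration, the "if (!mtx_owned(&Giant)) return XXX;" guard mentioned above could look roughly like the sketch below. This is a userland mock, not code from ukbd.c or from the Sun keyboard driver: struct mock_mtx, mock_lock(), mock_unlock(), mock_mtx_owned(), KBD_NOKEY and kbd_read_char_sketch() are all hypothetical names made up for this example, and the mock only models a single thread. In a real driver the check would presumably use the kernel's mtx_owned() on Giant, and the value returned for "XXX" depends on the entry point being guarded.

```c
#include <assert.h>
#include <stdbool.h>

/* Userland stand-in for a kernel mutex (hypothetical, single-threaded). */
struct mock_mtx {
    bool held;              /* true while the (only) thread holds the lock */
};

/* Stand-in for the kernel's Giant lock. */
static struct mock_mtx Giant_mock = { false };

static void mock_lock(struct mock_mtx *m)
{
    assert(!m->held);       /* no recursion in this simple mock */
    m->held = true;
}

static void mock_unlock(struct mock_mtx *m)
{
    assert(m->held);        /* must be locked before unlocking */
    m->held = false;
}

/* Models the idea of mtx_owned(): nonzero if the caller holds the mutex. */
static int mock_mtx_owned(const struct mock_mtx *m)
{
    return m->held;
}

#define KBD_NOKEY (-1)      /* hypothetical "no key available" return value */

static int pending_key = 'a';   /* pretend one key press is queued */

/*
 * Hypothetical keyboard entry point. If the caller does not hold the
 * lock, bail out early instead of touching driver state.
 */
static int kbd_read_char_sketch(void)
{
    int c;

    if (!mock_mtx_owned(&Giant_mock))
        return KBD_NOKEY;   /* the "return XXX;" from the mail */

    c = pending_key;
    pending_key = KBD_NOKEY;
    return c;
}
```

Called without the lock held, kbd_read_char_sketch() returns KBD_NOKEY and leaves the queued key alone; called between mock_lock(&Giant_mock) and mock_unlock(&Giant_mock), it hands the key back. Whether bailing out like this is acceptable for a given entry point is a judgment call for the driver in question.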