From owner-freebsd-current@FreeBSD.ORG Mon Dec 11 13:50:44 2006 Return-Path: X-Original-To: freebsd-current@freebsd.org Delivered-To: freebsd-current@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 079FB16A40F; Mon, 11 Dec 2006 13:50:44 +0000 (UTC) (envelope-from avatar@mmlab.cse.yzu.edu.tw) Received: from www.mmlab.cse.yzu.edu.tw (www.mmlab.cse.yzu.edu.tw [140.138.150.166]) by mx1.FreeBSD.org (Postfix) with ESMTP id 6AC8C43D36; Mon, 11 Dec 2006 13:49:07 +0000 (GMT) (envelope-from avatar@mmlab.cse.yzu.edu.tw) Received: by www.mmlab.cse.yzu.edu.tw (qmail, from userid 1000) id 7FAE18C9D8F; Mon, 11 Dec 2006 21:50:14 +0800 (CST) Received: from localhost (localhost [127.0.0.1]) by www.mmlab.cse.yzu.edu.tw (qmail) with ESMTP id 6AB1A8C9D8B; Mon, 11 Dec 2006 21:50:14 +0800 (CST) Date: Mon, 11 Dec 2006 21:50:14 +0800 (CST) From: Tai-hwa Liang To: Robert Watson In-Reply-To: <20061210084254.X9926@fledge.watson.org> Message-ID: <06121121220014.48706@www.mmlab.cse.yzu.edu.tw> References: <52944.192.168.1.110.1165679313.squirrel@yal.hopto.org> <20061209195519.B60055@mp2.macomnet.net> <20061209204924.N9926@fledge.watson.org> <20061209214233.L2273@fledge.watson.org> <0612101036232.41529@www.mmlab.cse.yzu.edu.tw> <20061210084254.X9926@fledge.watson.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-current@freebsd.org Subject: Re: CURRENT freezes on Laitude D520 X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 11 Dec 2006 13:50:44 -0000 On Sun, 10 Dec 2006, Robert Watson wrote: > > On Sun, 10 Dec 2006, Tai-hwa Liang wrote: > >>> which get a bit more to the heart of most problems. debug.mpsafenet=1 >>> really exists for the purposes of supporting components which are not >>> sufficiently locked to allow the stack to run MPSAFE, rather than as a >>> means of disabling direct dispatch and preemption, which speak to >>> different types of problems. The main reason that I haven't removed the >>> administrator tunable to date is that I suspect it will be quite helpful >>> when KAME IPSEC locking happens, but since that appears not to have >>> happened yet, debug.mpsafenet as an option is likely causing more harm >>> than good by being available as a stand-in sysctl masking other problems, >>> causing people to not get to the point of properly identifying the actual >>> cause (device driver bugs, etc). >> >> Can the aforementioned tricks(1/2/3) being applied to RELENG_6 as well? > > WITNESS is available in RELENG_6, and should be used in combination with > INVARIANTS, DDB, KDB, and BREAK_TO_DEBUGGER to debug deadlocks. Would a kernel with WITNESS/[KD]DB/BREAK_TO_DEBUGGER enabled but w/o INVARIANTS compiled adequate to dump useful information through remote serial console? > In RELENG_6, net.isr.direct is not enabled by default, so unless you've > enabled it yourself (or are using IP fast forwarding, which is functionally > similar), that won't apply. > > In RELENG_6, PREEMPTION is in GENERIC and hence enabled by default, and it > can be disabled by removing it from your kernel configuration. I'd like it > if we could add a run-time sysctl to disable preemption even if PREEMPTION is > compiled in, as it would make it easier to explore its stability and > performance impact. However, this is also just a debugging step to see if > that quiesces the problem, and not a fix for the actual bug. > > Right now, we're discussing removing the manual debug.mpsafenet configuration > flag from 7.x, and not 6.x. I fully recognize the importance of having it in > place as a workaround for bugs in production, although it concerns me greatly > that we're not getting these problems debugged and fixed, and instead masking > them. Architectural changes are on the way that will require these bugs to > be fixed properly, not just masked. > >> We are using RELENG_6 as our production server(postfix, squid, pf >> firewall/NAT, FAST_IPSEC VPN, ...), which is a dual Athlon MP board with >> three NICs(two fxp cards and one onboard xl, connected to three different >> networks). >> >> I haven't try WITNESS, yet; however, I'm very sure that net.isr.direct=0 >> plus that there is no PREEMPTION in current kernel. The problem is that, >> with debug.mpsafenet=1, we'll always run into hard freeze w/o having any >> kdb> prompt on console. >> >> Whilst turning debug.mpsafenet off only masks the real problem, I'm still >> wondering about if there is any less damaging way to track such problem >> down in a _production_ environment. > > It sounds like you need to follow the instructions for kernel debugging. > Depending on your tolerance of performance loss, downtime, etc, a good > starting point is to configure the kernel with INVARIANTS and WITNESS. > WITNESS is particularly important, if you can tolerate the performance hit, > as it warns of potential deadlocks, not just actual deadlocks. Also, compile With WITNESS enabled(debug.mpsafenet=0), I got another three pf related warnings in the last 8 hours: # # this one looks like lor#046; however, I'm not sure if it is safe to drop # pf_task_mtx with PF_UNLOCK before invoking pf_test() in pf_check_out(). # lock order reversal: 1st 0xc06fca2c tcp (tcp) @ netinet/tcp_input.c:625 2nd 0xc69a1060 pf task mtx (pf task mtx) @ /usr/src/sys/modules/pf/../../contrib/pf/net/pf.c:6386 KDB: stack backtrace: kdb_backtrace(ffffffff,c06bc0d0,c06ba6b8,c06847a4,c06f8b00,...) at kdb_backtrace+0x29 witness_checkorder(c69a1060,9,c699e609,18f2) at witness_checkorder+0x4cd _mtx_lock_flags(c69a1060,0,c699e609,18f2,0,...) at _mtx_lock_flags+0x1e pf_test(2,c64c6800,e4f6cae0,0,0,...) at pf_test+0x6b pf_check_out(0,e4f6cae0,c64c6800,2,0,...) at pf_check_out+0x3d pfil_run_hooks(c06fc5e0,e4f6cb5c,c64c6800,2,0,...) at pfil_run_hooks+0xb3 ip_output(c64e0300,0,e4f6cb28,0,0,...) at ip_output+0x75e tcp_respond(0,c650a030,c650a044,c64e0300,eb232182,0,14) at tcp_respond+0x22a tcp_input(c64e0300,14,8,121245cb,1,...) at tcp_input+0x2559 ip_input(c64e0300) at ip_input+0x729 netisr_processqueue(c06fa298) at netisr_processqueue+0x6e swi_net(0) at swi_net+0x8c ithread_execute_handlers(c63ff430,c644a080) at ithread_execute_handlers+0xce ithread_loop(c63b6b10,e4f6cd38,c06aafc0,0,c06514db,...) at ithread_loop+0x4f fork_exit(c04dd70c,c63b6b10,e4f6cd38) at fork_exit+0x61 fork_trampoline() at fork_trampoline+0x8 --- trap 0x1, eip = 0, esp = 0xe4f6cd6c, ebp = 0 --- lock order reversal: 1st 0xc6b3984c inp (tcpinp) @ netinet/tcp_usrreq.c:368 2nd 0xc69a1060 pf task mtx (pf task mtx) @ /usr/src/sys/modules/pf/../../contrib/pf/net/pf.c:6386 KDB: stack backtrace: kdb_backtrace(ffffffff,c06bc0a8,c06ba6b8,c06847a4,c06f8b38,...) at kdb_backtrace+0x29 witness_checkorder(c69a1060,9,c699e609,18f2) at witness_checkorder+0x4cd _mtx_lock_flags(c69a1060,0,c699e609,18f2,0,...) at _mtx_lock_flags+0x1e pf_test(2,c6568800,e8ef7b40,0,c6b397bc,...) at pf_test+0x6b pf_check_out(0,e8ef7b40,c6568800,2,c6b397bc,...) at pf_check_out+0x3d pfil_run_hooks(c06fc5e0,e8ef7bbc,c6568800,2,c6b397bc,...) at pfil_run_hooks+0xb3 ip_output(c66d3400,0,e8ef7b88,0,0,c6b397bc) at ip_output+0x75e tcp_output(c6cf8740) at tcp_output+0xd51 tcp_usr_connect(c6ca242c,c6c25260,c6b48600) at tcp_usr_connect+0xe3 soconnect(c6ca242c,c6c25260,c6b48600) at soconnect+0x4e kern_connect(c6b48600,4,c6c25260,c6c25260,0,...) at kern_connect+0xbb connect(c6b48600,e8ef7d04) at connect+0x2f syscall(3b,3b,3b,804f380,80521a0,...) at syscall+0x22f Xint0x80_syscall() at Xint0x80_syscall+0x1f --- syscall (98, FreeBSD ELF32, connect), eip = 0x28126fbf, esp = 0xbfbfd26c, ebp = 0xbfbfd3d8 --- lock order reversal: 1st 0xc6d667d0 ipsec request (ipsec request) @ netipsec/xform_esp.c:897 2nd 0xc69a1060 pf task mtx (pf task mtx) @ /usr/src/sys/modules/pf/../../contrib/pf/net/pf.c:6386 KDB: stack backtrace: kdb_backtrace(ffffffff,c06ba4b0,c06ba6b8,c06847a4,c06f8ac8,...) at kdb_backtrace+0x29 witness_checkorder(c69a1060,9,c699e609,18f2) at witness_checkorder+0x4cd _mtx_lock_flags(c69a1060,0,c699e609,18f2,0,...) at _mtx_lock_flags+0x1e pf_test(2,c64c6800,e4f6c7cc,0,0,...) at pf_test+0x6b pf_check_out(0,e4f6c7cc,c64c6800,2,0,...) at pf_check_out+0x3d pfil_run_hooks(c06fc5e0,e4f6c848,c64c6800,2,0,...) at pfil_run_hooks+0xb3 ip_output(c64e4e00,0,e4f6c814,2,0,...) at ip_output+0x75e ipsec_process_done(c64e4e00,c6d66780,c6f59e88,c6f4a240,c0693460,...) at ipsec_process_done+0x120 esp_output_cb(c6f59e88) at esp_output_cb+0x1ed crypto_done(c6f59e88,2,c6452000,c6f59e88,c6f4a264,...) at crypto_done+0xa0 swcr_process(0,c6f59e88,0,c04e7b7d,c0693460,...) at swcr_process+0x126 crypto_invoke(c6452000,c6f59e88,0) at crypto_invoke+0x9e crypto_dispatch(c6f59e88,1a,c6f59e88,c6f5be38,c6f0d488,...) at crypto_dispatch+0x4a esp_output(c64e4e00,c6d66780,0,14,9,...) at esp_output+0x4ee ipsec4_process_packet(c64e4e00,c6d66780,0,0,c6786400,...) at ipsec4_process_packet+0x206 ip_output(c64e4e00,0,c6de9bb0,0,0,...) at ip_output+0x6df in_gif_output(c6539000,2,c64e4e00,c6de9b84,c6de9b80,...) at in_gif_output+0x237 gif_output(c6539000,c6b95300,c6d73790,c6ba04a4,c6e1de00,...) at gif_output+0x20f ip_output(c6b95300,0,e4f6caf8,0,0,...) at ip_output+0x8de syncache_respond(c6d5eaf0,c6b95300) at syncache_respond+0x2a1 syncache_add(e4f6cbec,e4f6cc3c,c694a86c,e4f6cbe8,c6912b00,e4f6cc3c,c694a880,8,1) at syncache_add+0x37f tcp_input(c6912b00,14,8,c065b728,0,...) at tcp_input+0x6dc ip_input(c6912b00) at ip_input+0x729 netisr_processqueue(c06fa298) at netisr_processqueue+0x6e swi_net(0) at swi_net+0x8c ithread_execute_handlers(c63ff430,c644a080) at ithread_execute_handlers+0xce ithread_loop(c63b6b10,e4f6cd38,c06aafc0,0,c06514db,...) at ithread_loop+0x4f fork_exit(c04dd70c,c63b6b10,e4f6cd38) at fork_exit+0x61 fork_trampoline() at fork_trampoline+0x8 --- trap 0x1, eip = 0, esp = 0xe4f6cd6c, ebp = 0 --- > the kernel with KDB, DDB, and BREAK_TO_DEBUGGER, and user a serial or > firewire console. If the hang occurs, see if you can get into the debugger, > in which case the logged output from DDB for the following commands would be > very useful: > > show pcpu > show allpcpu > trace > alltrace > ps > show locks > show alllocks > show lockedvnods > show uma > show malloc > > Please open a PR that describes your configuration, includes your kernel > config (since it seems quite customized), any loader.conf settings, a > detailed description of the problem, and the output. I'd be quite interested Okay, I'll file a PR once I can collect more information with the serial console(probably weekend). For now our system administrator is pretty nervous about my suggestion on turning debug.mpsafenet back to 1. ;) > in know, once the machine is in a hung state, whether the numlock light goes > on and off when you hit the numlock key on the keyboard. The numlock light doesn't respond to the key when the machine is hanging; hence Ctrl-Alt-Esc wouldn't break to debugger. -- Thanks, Tai-hwa Liang