From owner-freebsd-current@FreeBSD.ORG  Mon Dec 11 13:50:44 2006
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
X-Original-To: freebsd-current@freebsd.org
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 079FB16A40F;
	Mon, 11 Dec 2006 13:50:44 +0000 (UTC)
	(envelope-from avatar@mmlab.cse.yzu.edu.tw)
Received: from www.mmlab.cse.yzu.edu.tw (www.mmlab.cse.yzu.edu.tw
	[140.138.150.166])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 6AC8C43D36;
	Mon, 11 Dec 2006 13:49:07 +0000 (GMT)
	(envelope-from avatar@mmlab.cse.yzu.edu.tw)
Received: by www.mmlab.cse.yzu.edu.tw (qmail, from userid 1000)
	id 7FAE18C9D8F; Mon, 11 Dec 2006 21:50:14 +0800 (CST)
Received: from localhost (localhost [127.0.0.1])
	by www.mmlab.cse.yzu.edu.tw (qmail) with ESMTP id 6AB1A8C9D8B;
	Mon, 11 Dec 2006 21:50:14 +0800 (CST)
Date: Mon, 11 Dec 2006 21:50:14 +0800 (CST)
From: Tai-hwa Liang <avatar@mmlab.cse.yzu.edu.tw>
To: Robert Watson <rwatson@FreeBSD.org>
In-Reply-To: <20061210084254.X9926@fledge.watson.org>
Message-ID: <06121121220014.48706@www.mmlab.cse.yzu.edu.tw>
References: <52944.192.168.1.110.1165679313.squirrel@yal.hopto.org> 
	<20061209195519.B60055@mp2.macomnet.net>
	<20061209204924.N9926@fledge.watson.org>
	<cb5206420612091310r719f7b3en2d4fb35b23453ddf@mail.gmail.com>
	<20061209214233.L2273@fledge.watson.org>
	<0612101036232.41529@www.mmlab.cse.yzu.edu.tw>
	<20061210084254.X9926@fledge.watson.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-current@freebsd.org
Subject: Re: CURRENT freezes on Laitude D520
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>, 
	<mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 11 Dec 2006 13:50:44 -0000

On Sun, 10 Dec 2006, Robert Watson wrote:
>
> On Sun, 10 Dec 2006, Tai-hwa Liang wrote:
>
>>> which get a bit more to the heart of most problems.  debug.mpsafenet=1 
>>> really exists for the purposes of supporting components which are not 
>>> sufficiently locked to allow the stack to run MPSAFE, rather than as a 
>>> means of disabling direct dispatch and preemption, which speak to 
>>> different types of problems. The main reason that I haven't removed the 
>>> administrator tunable to date is that I suspect it will be quite helpful 
>>> when KAME IPSEC locking happens, but since that appears not to have 
>>> happened yet, debug.mpsafenet as an option is likely causing more harm 
>>> than good by being available as a stand-in sysctl masking other problems, 
>>> causing people to not get to the point of properly identifying the actual 
>>> cause (device driver bugs, etc).
>> 
>> Can the aforementioned tricks(1/2/3) being applied to RELENG_6 as well?
>
> WITNESS is available in RELENG_6, and should be used in combination with 
> INVARIANTS, DDB, KDB, and BREAK_TO_DEBUGGER to debug deadlocks.

   Would a kernel with WITNESS/[KD]DB/BREAK_TO_DEBUGGER enabled but w/o
INVARIANTS compiled adequate to dump useful information through remote
serial console?

> In RELENG_6, net.isr.direct is not enabled by default, so unless you've 
> enabled it yourself (or are using IP fast forwarding, which is functionally 
> similar), that won't apply.
>
> In RELENG_6, PREEMPTION is in GENERIC and hence enabled by default, and it 
> can be disabled by removing it from your kernel configuration.  I'd like it 
> if we could add a run-time sysctl to disable preemption even if PREEMPTION is 
> compiled in, as it would make it easier to explore its stability and 
> performance impact.  However, this is also just a debugging step to see if 
> that quiesces the problem, and not a fix for the actual bug.
>
> Right now, we're discussing removing the manual debug.mpsafenet configuration 
> flag from 7.x, and not 6.x.  I fully recognize the importance of having it in 
> place as a workaround for bugs in production, although it concerns me greatly 
> that we're not getting these problems debugged and fixed, and instead masking 
> them.  Architectural changes are on the way that will require these bugs to 
> be fixed properly, not just masked.
>
>> We are using RELENG_6 as our production server(postfix, squid, pf 
>> firewall/NAT, FAST_IPSEC VPN, ...), which is a dual Athlon MP board with 
>> three NICs(two fxp cards and one onboard xl, connected to three different 
>> networks).
>> 
>> I haven't try WITNESS, yet; however, I'm very sure that net.isr.direct=0 
>> plus that there is no PREEMPTION in current kernel.  The problem is that, 
>> with debug.mpsafenet=1, we'll always run into hard freeze w/o having any 
>> kdb> prompt on console.
>> 
>> Whilst turning debug.mpsafenet off only masks the real problem, I'm still 
>> wondering about if there is any less damaging way to track such problem 
>> down in a _production_ environment.
>
> It sounds like you need to follow the instructions for kernel debugging. 
> Depending on your tolerance of performance loss, downtime, etc, a good 
> starting point is to configure the kernel with INVARIANTS and WITNESS. 
> WITNESS is particularly important, if you can tolerate the performance hit, 
> as it warns of potential deadlocks, not just actual deadlocks.  Also, compile

   With WITNESS enabled(debug.mpsafenet=0), I got another three pf related
warnings in the last 8 hours:

#
# this one looks like lor#046; however, I'm not sure if it is safe to drop
# pf_task_mtx with PF_UNLOCK before invoking pf_test() in pf_check_out().
#
lock order reversal:
1st 0xc06fca2c tcp (tcp) @ netinet/tcp_input.c:625
2nd 0xc69a1060 pf task mtx (pf task mtx) @ /usr/src/sys/modules/pf/../../contrib/pf/net/pf.c:6386
KDB: stack backtrace:
kdb_backtrace(ffffffff,c06bc0d0,c06ba6b8,c06847a4,c06f8b00,...) at kdb_backtrace+0x29
witness_checkorder(c69a1060,9,c699e609,18f2) at witness_checkorder+0x4cd
_mtx_lock_flags(c69a1060,0,c699e609,18f2,0,...) at _mtx_lock_flags+0x1e
pf_test(2,c64c6800,e4f6cae0,0,0,...) at pf_test+0x6b
pf_check_out(0,e4f6cae0,c64c6800,2,0,...) at pf_check_out+0x3d
pfil_run_hooks(c06fc5e0,e4f6cb5c,c64c6800,2,0,...) at pfil_run_hooks+0xb3
ip_output(c64e0300,0,e4f6cb28,0,0,...) at ip_output+0x75e
tcp_respond(0,c650a030,c650a044,c64e0300,eb232182,0,14) at tcp_respond+0x22a
tcp_input(c64e0300,14,8,121245cb,1,...) at tcp_input+0x2559
ip_input(c64e0300) at ip_input+0x729
netisr_processqueue(c06fa298) at netisr_processqueue+0x6e
swi_net(0) at swi_net+0x8c
ithread_execute_handlers(c63ff430,c644a080) at ithread_execute_handlers+0xce
ithread_loop(c63b6b10,e4f6cd38,c06aafc0,0,c06514db,...) at ithread_loop+0x4f
fork_exit(c04dd70c,c63b6b10,e4f6cd38) at fork_exit+0x61
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xe4f6cd6c, ebp = 0 ---

lock order reversal:
1st 0xc6b3984c inp (tcpinp) @ netinet/tcp_usrreq.c:368
2nd 0xc69a1060 pf task mtx (pf task mtx) @ /usr/src/sys/modules/pf/../../contrib/pf/net/pf.c:6386
KDB: stack backtrace:
kdb_backtrace(ffffffff,c06bc0a8,c06ba6b8,c06847a4,c06f8b38,...) at kdb_backtrace+0x29
witness_checkorder(c69a1060,9,c699e609,18f2) at witness_checkorder+0x4cd
_mtx_lock_flags(c69a1060,0,c699e609,18f2,0,...) at _mtx_lock_flags+0x1e
pf_test(2,c6568800,e8ef7b40,0,c6b397bc,...) at pf_test+0x6b
pf_check_out(0,e8ef7b40,c6568800,2,c6b397bc,...) at pf_check_out+0x3d
pfil_run_hooks(c06fc5e0,e8ef7bbc,c6568800,2,c6b397bc,...) at 
pfil_run_hooks+0xb3
ip_output(c66d3400,0,e8ef7b88,0,0,c6b397bc) at ip_output+0x75e
tcp_output(c6cf8740) at tcp_output+0xd51
tcp_usr_connect(c6ca242c,c6c25260,c6b48600) at tcp_usr_connect+0xe3
soconnect(c6ca242c,c6c25260,c6b48600) at soconnect+0x4e
kern_connect(c6b48600,4,c6c25260,c6c25260,0,...) at kern_connect+0xbb
connect(c6b48600,e8ef7d04) at connect+0x2f
syscall(3b,3b,3b,804f380,80521a0,...) at syscall+0x22f
Xint0x80_syscall() at Xint0x80_syscall+0x1f
--- syscall (98, FreeBSD ELF32, connect), eip = 0x28126fbf, esp = 0xbfbfd26c, ebp = 0xbfbfd3d8 ---

lock order reversal:
1st 0xc6d667d0 ipsec request (ipsec request) @ netipsec/xform_esp.c:897
2nd 0xc69a1060 pf task mtx (pf task mtx) @ /usr/src/sys/modules/pf/../../contrib/pf/net/pf.c:6386
KDB: stack backtrace:
kdb_backtrace(ffffffff,c06ba4b0,c06ba6b8,c06847a4,c06f8ac8,...) at kdb_backtrace+0x29
witness_checkorder(c69a1060,9,c699e609,18f2) at witness_checkorder+0x4cd
_mtx_lock_flags(c69a1060,0,c699e609,18f2,0,...) at _mtx_lock_flags+0x1e
pf_test(2,c64c6800,e4f6c7cc,0,0,...) at pf_test+0x6b
pf_check_out(0,e4f6c7cc,c64c6800,2,0,...) at pf_check_out+0x3d
pfil_run_hooks(c06fc5e0,e4f6c848,c64c6800,2,0,...) at pfil_run_hooks+0xb3
ip_output(c64e4e00,0,e4f6c814,2,0,...) at ip_output+0x75e
ipsec_process_done(c64e4e00,c6d66780,c6f59e88,c6f4a240,c0693460,...) at ipsec_process_done+0x120
esp_output_cb(c6f59e88) at esp_output_cb+0x1ed
crypto_done(c6f59e88,2,c6452000,c6f59e88,c6f4a264,...) at crypto_done+0xa0
swcr_process(0,c6f59e88,0,c04e7b7d,c0693460,...) at swcr_process+0x126
crypto_invoke(c6452000,c6f59e88,0) at crypto_invoke+0x9e
crypto_dispatch(c6f59e88,1a,c6f59e88,c6f5be38,c6f0d488,...) at crypto_dispatch+0x4a
esp_output(c64e4e00,c6d66780,0,14,9,...) at esp_output+0x4ee
ipsec4_process_packet(c64e4e00,c6d66780,0,0,c6786400,...) at ipsec4_process_packet+0x206
ip_output(c64e4e00,0,c6de9bb0,0,0,...) at ip_output+0x6df
in_gif_output(c6539000,2,c64e4e00,c6de9b84,c6de9b80,...) at in_gif_output+0x237
gif_output(c6539000,c6b95300,c6d73790,c6ba04a4,c6e1de00,...) at gif_output+0x20f
ip_output(c6b95300,0,e4f6caf8,0,0,...) at ip_output+0x8de
syncache_respond(c6d5eaf0,c6b95300) at syncache_respond+0x2a1
syncache_add(e4f6cbec,e4f6cc3c,c694a86c,e4f6cbe8,c6912b00,e4f6cc3c,c694a880,8,1) at syncache_add+0x37f
tcp_input(c6912b00,14,8,c065b728,0,...) at tcp_input+0x6dc
ip_input(c6912b00) at ip_input+0x729
netisr_processqueue(c06fa298) at netisr_processqueue+0x6e
swi_net(0) at swi_net+0x8c
ithread_execute_handlers(c63ff430,c644a080) at ithread_execute_handlers+0xce
ithread_loop(c63b6b10,e4f6cd38,c06aafc0,0,c06514db,...) at ithread_loop+0x4f
fork_exit(c04dd70c,c63b6b10,e4f6cd38) at fork_exit+0x61
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xe4f6cd6c, ebp = 0 ---


> the kernel with KDB, DDB, and BREAK_TO_DEBUGGER, and user a serial or 
> firewire console.  If the hang occurs, see if you can get into the debugger, 
> in which case the logged output from DDB for the following commands would be 
> very useful:
>
> show pcpu
> show allpcpu
> trace
> alltrace
> ps
> show locks
> show alllocks
> show lockedvnods
> show uma
> show malloc
>
> Please open a PR that describes your configuration, includes your kernel 
> config (since it seems quite customized), any loader.conf settings, a 
> detailed description of the problem, and the output.  I'd be quite interested

   Okay, I'll file a PR once I can collect more information with the serial
console(probably weekend).  For now our system administrator is pretty
nervous about my suggestion on turning debug.mpsafenet back to 1. ;)

> in know, once the machine is in a hung state, whether the numlock light goes 
> on and off when you hit the numlock key on the keyboard.

   The numlock light doesn't respond to the key when the machine is hanging;
hence Ctrl-Alt-Esc wouldn't break to debugger.

-- 
Thanks,

Tai-hwa Liang