From: Volker
Date: Fri, 25 May 2007 17:17:02 +0200
To: freebsd-stable@FreeBSD.ORG
Subject: LORs (was Re: ghosthunting: machine freeze 6.2R)
Message-ID: <4656FDEE.7020002@vwsoft.com>
In-Reply-To: <4656CC57.7010705@vwsoft.com>

On 05/25/07 13:45, Volker wrote:
> Using a debug kernel, the machine came up quickly with this LOR after
> the reboot:
>
> lock order reversal:
>  1st 0xc077078c tcp (tcp) @ /usr/src/sys/netinet/tcp_input.c:625
>  2nd 0xc4f18180 pf task mtx (pf task mtx) @ /usr/src/sys/modules/pf/../../contrib/pf/net/pf.c:6386
> KDB: stack backtrace:
> kdb_backtrace(0,ffffffff,c072fcd0,c072e1c8,c06f6124,...) at kdb_backtrace+0x29
> witness_checkorder(c4f18180,9,c4f1536e,18f2) at witness_checkorder+0x578
> _mtx_lock_flags(c4f18180,0,c4f1536e,18f2,c4f18180,...) at _mtx_lock_flags+0x78
> pf_test(2,c4bdec00,e35c5ac4,0,0,...) at pf_test+0x81
> pf_check_out(0,e35c5ac4,c4bdec00,2,0) at pf_check_out+0x3d
> pfil_run_hooks(c0770340,e35c5b40,c4bdec00,2,0,...) at pfil_run_hooks+0xc9
> ip_output(c50c8200,0,e35c5b0c,0,0,0) at ip_output+0x83a
> tcp_respond(0,c4f85810,c4f85824,c50c8200,0,7a481ad6,4) at tcp_respond+0x3e1
> tcp_input(c50c8200,14,1,93d306d9,0,...) at tcp_input+0x3124
> ip_input(c50c8200) at ip_input+0x785
> netisr_processqueue(c076dfd8) at netisr_processqueue+0x6e
> swi_net(0) at swi_net+0xc2
> ithread_execute_handlers(c4afca78,c4b4b180) at ithread_execute_handlers+0xe6
> ithread_loop(c4adb990,e35c5d38,c4adb990,c0505918,0,...) at ithread_loop+0x67
> fork_exit(c0505918,c4adb990,e35c5d38) at fork_exit+0xa0
> fork_trampoline() at fork_trampoline+0x8
> --- trap 0x1, eip = 0, esp = 0xe35c5d6c, ebp = 0 ---
> Expensive timeout(9) function: 0xc0528fb4(0) 0.002565972 s
> This first one appeared at 13:22 (shortly after bootup).
OK, here are the next two LORs (similar to the first). At 13:28 this one showed up in the logs:

lock order reversal:
 1st 0xc077078c tcp (tcp) @ /usr/src/sys/netinet/tcp_input.c:625
 2nd 0xc4f18180 pf task mtx (pf task mtx) @ /usr/src/sys/modules/pf/../../contrib/pf/net/pf.c:6386
KDB: stack backtrace:
kdb_backtrace(0,ffffffff,c072fcd0,c072e1c8,c06f6124,...) at kdb_backtrace+0x29
witness_checkorder(c4f18180,9,c4f1536e,18f2) at witness_checkorder+0x578
_mtx_lock_flags(c4f18180,0,c4f1536e,18f2,c4f18180,...) at _mtx_lock_flags+0x78
pf_test(2,c4bdec00,e35c5ac4,0,0,...) at pf_test+0x81
pf_check_out(0,e35c5ac4,c4bdec00,2,0) at pf_check_out+0x3d
pfil_run_hooks(c0770340,e35c5b40,c4bdec00,2,0,...) at pfil_run_hooks+0xc9
ip_output(c50c8200,0,e35c5b0c,0,0,0) at ip_output+0x83a
tcp_respond(0,c4f85810,c4f85824,c50c8200,0,7a481ad6,4) at tcp_respond+0x3e1
tcp_input(c50c8200,14,1,93d306d9,0,...) at tcp_input+0x3124
ip_input(c50c8200) at ip_input+0x785
netisr_processqueue(c076dfd8) at netisr_processqueue+0x6e
swi_net(0) at swi_net+0xc2
ithread_execute_handlers(c4afca78,c4b4b180) at ithread_execute_handlers+0xe6
ithread_loop(c4adb990,e35c5d38,c4adb990,c0505918,0,...) at ithread_loop+0x67
fork_exit(c0505918,c4adb990,e35c5d38) at fork_exit+0xa0
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xe35c5d6c, ebp = 0 ---
Expensive timeout(9) function: 0xc0528fb4(0) 0.002565972 s

At 16:55 I caught this message:

kernel: acpi: suspend request ignored (not ready yet)

A minute (or maybe only seconds) later the machine died, and nothing from around that time made it into the logs. What's the reason for this ACPI message?

After bootup (an operator pressed the reset key), the machine showed this LOR:

lock order reversal:
 1st 0xc4f68180 pf task mtx (pf task mtx) @ /usr/src/sys/modules/pf/../../contrib/pf/net/pf.c:6386
 2nd 0xc077078c tcp (tcp) @ /usr/src/sys/modules/pf/../../contrib/pf/net/pf.c:2744
KDB: stack backtrace:
kdb_backtrace(0,ffffffff,c072e1c8,c072fcd0,c06f6124,...) at kdb_backtrace+0x29
witness_checkorder(c077078c,9,c4f6536e,ab8) at witness_checkorder+0x578
_mtx_lock_flags(c077078c,0,c4f6536e,ab8,c077078c,...) at _mtx_lock_flags+0x78
pf_socket_lookup(e35c5b00,e35c5b04,1,e35c5bc0,0,...) at pf_socket_lookup+0x1d3
pf_test_tcp(e35c5b70,e35c5b68,1,c4ee0e00,c4d6f400,...) at pf_test_tcp+0x11e6
pf_test(1,c4c11c00,e35c5c5c,0,0,...) at pf_test+0xb8b
pf_check_in(0,e35c5c5c,c4c11c00,1,0) at pf_check_in+0x37
pfil_run_hooks(c0770340,e35c5cb4,c4c11c00,1,0) at pfil_run_hooks+0xc9
ip_input(c4d6f400) at ip_input+0x272
netisr_processqueue(c076dfd8) at netisr_processqueue+0x6e
swi_net(0) at swi_net+0xc2
ithread_execute_handlers(c4afca78,c4b4b180) at ithread_execute_handlers+0xe6
ithread_loop(c4adb990,e35c5d38,c4adb990,c0505918,0,...) at ithread_loop+0x67
fork_exit(c0505918,c4adb990,e35c5d38) at fork_exit+0xa0
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xe35c5d6c, ebp = 0 ---

My assumption: the LORs are pf-related but are not related to the lockup of the system. Am I correct?

What might be the reason for that ACPI message, and could ACPI be a cause of the lockup?

Why might WITNESS and INVARIANTS be unable to catch whatever causes the freeze?

Thx
Volker

PS: Sorry for flooding this list; should I direct postings to hackers@ instead?
PPS: Can anybody provide me with patches for these LORs?
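To make the reversal concrete: the 13:22 and 13:28 reports show the tcp lock held while pf task mtx is taken (outbound path tcp_input -> tcp_respond -> ip_output -> pf_test), while the report after the reset shows the opposite order (inbound path pf_test -> pf_test_tcp -> pf_socket_lookup). The sketch below is a toy Python model written for illustration only, not FreeBSD's actual WITNESS code: it assumes a single simulated thread and checks only direct lock pairs, and the class and helper names (LockOrderChecker, inbound_path, outbound_path) are hypothetical. It records "lock B was taken while lock A was held" and reports a reversal when the opposite order later appears.

```python
from collections import defaultdict


class LockOrderChecker:
    """Toy lock-order tracker (hypothetical; NOT the real WITNESS implementation).

    Records an edge first -> second whenever `second` is acquired while `first`
    is held, and reports a reversal if the opposite edge was recorded earlier.
    Only direct pairs are checked; one simulated thread is assumed.
    """

    def __init__(self):
        self.order = defaultdict(set)  # held lock -> locks acquired while it was held
        self.held = []                 # locks currently held by the simulated thread
        self.reversals = []            # (first, second) pairs reported as reversals

    def acquire(self, lock):
        for first in self.held:
            if first in self.order[lock]:
                # Earlier, `first` was acquired while `lock` was held: opposite order.
                self.reversals.append((first, lock))
                print(f"lock order reversal:\n 1st {first}\n 2nd {lock}")
            self.order[first].add(lock)
        self.held.append(lock)

    def release(self, lock):
        self.held.remove(lock)


# Lock names as they appear in the reports above.
TCP = "tcp"
PF_TASK = "pf task mtx"


def inbound_path(checker):
    # Hypothetical model of: ip_input -> pf_test -> pf_test_tcp -> pf_socket_lookup
    checker.acquire(PF_TASK)
    checker.acquire(TCP)
    checker.release(TCP)
    checker.release(PF_TASK)


def outbound_path(checker):
    # Hypothetical model of: tcp_input -> tcp_respond -> ip_output -> pf_test
    checker.acquire(TCP)
    checker.acquire(PF_TASK)
    checker.release(PF_TASK)
    checker.release(TCP)


if __name__ == "__main__":
    checker = LockOrderChecker()
    inbound_path(checker)   # first order seen: pf task mtx, then tcp
    outbound_path(checker)  # opposite order: reported as 1st tcp, 2nd pf task mtx
```

Whichever order the model sees first is treated as the reference, which is why the same lock pair can be reported in either direction depending on which path runs first. In general, two code paths that take the same two locks in opposite orders can deadlock if they run concurrently on different CPUs; the reports alone do not show whether that is what happened here.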