From owner-freebsd-current@FreeBSD.ORG Wed Sep 21 17:44:54 2005 Return-Path: X-Original-To: freebsd-current@freebsd.org Delivered-To: freebsd-current@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id B7A7716A41F; Wed, 21 Sep 2005 17:44:54 +0000 (GMT) (envelope-from mike@sentex.net) Received: from smarthost1.sentex.ca (smarthost1.sentex.ca [64.7.153.18]) by mx1.FreeBSD.org (Postfix) with ESMTP id 4085343D49; Wed, 21 Sep 2005 17:44:54 +0000 (GMT) (envelope-from mike@sentex.net) Received: from pumice6.sentex.ca (pumice6.sentex.ca [64.7.153.21]) by smarthost1.sentex.ca (8.13.3/8.13.3) with ESMTP id j8LHirK9068803; Wed, 21 Sep 2005 13:44:53 -0400 (EDT) (envelope-from mike@sentex.net) Received: from lava.sentex.ca (pyroxene.sentex.ca [199.212.134.18]) by pumice6.sentex.ca (8.13.3/8.13.3) with ESMTP id j8LHirGA034757; Wed, 21 Sep 2005 13:44:53 -0400 (EDT) (envelope-from mike@sentex.net) Received: from simian.sentex.net (simeon.sentex.ca [192.168.43.27]) by lava.sentex.ca (8.13.3/8.13.3) with ESMTP id j8LHipOV010536 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 21 Sep 2005 13:44:52 -0400 (EDT) (envelope-from mike@sentex.net) Message-Id: <6.2.3.4.0.20050921133312.06993240@64.7.153.2> X-Mailer: QUALCOMM Windows Eudora Version 6.2.3.4 Date: Wed, 21 Sep 2005 13:45:17 -0400 To: freebsd-current@freebsd.org From: Mike Tancsa In-Reply-To: <6.2.3.4.0.20050921131152.035040a0@64.7.153.2> References: <6.2.3.4.0.20050921131152.035040a0@64.7.153.2> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed X-Virus-Scanned: by amavisd-new X-Scanned-By: MIMEDefang 2.51 on 64.7.153.18 X-Scanned-By: MIMEDefang 2.51 on 64.7.153.21 Subject: Re: LOR on RELENG_6, beta 5 X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 21 Sep 2005 17:44:54 -0000 At 01:15 PM 21/09/2005, Mike Tancsa wrote: >lock order reversal > 1st 0xc295a090 inp (divinp) @ /usr/src/sys/netinet/ip_divert.c:327 > 2nd 0xc28d9150 ipsec request (ipsec request) @ > /usr/src/sys/netipsec/ipsec_output.c:354 >KDB: stack backtrace: >kdb_backtrace(0,ffffffff,c0763e30,c0763e58,c072b204) at kdb_backtrace+0x29 >witness_checkorder(c28d9150,9,c06fc8ea,162) at witness_checkorder+0x564 >_mtx_lock_flags(c28d9150,0,c06fc8ea,162,0) at _mtx_lock_flags+0x5b >ipsec4_process_packet(c28f9800,c28d9100,22,0,c26c8e00) at >ipsec4_process_packet+0x45 >ip_output(c28f9800,0,e740fb28,22,0) at ip_output+0x74f >div_output(c294b858,c28f9800,c23bd440,0,e740fc08) at div_output+0x185 >div_send(c294b858,0,c28f9800,c23bd440,0) at div_send+0x3f >sosend(c294b858,c23bd440,e740fc3c,c28f9800,0) at sosend+0x5e3 >kern_sendit(c2711d80,3,e740fcbc,0,0) at kern_sendit+0x104 >sendit(c2711d80,3,e740fcbc,0,bfbdec04) at sendit+0x163 >sendto(c2711d80,e740fd04,6,0,296) at sendto+0x4d >syscall(3b,3b,3b,2,64) at syscall+0x22f >Xint0x80_syscall() at Xint0x80_syscall+0x1f >--- syscall (133, FreeBSD ELF32, sendto), eip = 0x280c5d97, esp = >0xbfbdeb0c, ebp = 0xbfbeebb8 --- > >I am able to replicate this on the box without too much >effort. Unfortunately, it doesnt want to dump on my twe device for >some reason despite doing a >dumpon -v /dev/twed0s1b OK, I figured out why no dumps. Its not actually core dumping, its getting into some sort of deadlock. The reboot was courtesy of the watchdog kicking in. I recompiled the kernel so I can break to the serial console and disabled the watchdog so the box does not reboot. On the serial console, I see the LOR lock order reversal 1st 0xc292a090 inp (divinp) @ /usr/src/sys/netinet/ip_divert.c:327 2nd 0xc28bc950 ipsec request (ipsec request) @ /usr/src/sys/netipsec/ipsec_output.c:354 KDB: stack backtrace: kdb_backtrace(0,ffffffff,c075ed70,c075ed98,c07261c4) at kdb_backtrace+0x29 witness_checkorder(c28bc950,9,c06f7aa6,162) at witness_checkorder+0x564 _mtx_lock_flags(c28bc950,0,c06f7aa6,162,0) at _mtx_lock_flags+0x5b ipsec4_process_packet(c2a4a400,c28bc900,22,0,c28b7600) at ipsec4_process_packet+0x45 ip_output(c2a4a400,0,e741eb28,22,0) at ip_output+0x74f div_output(c2926b20,c2a4a400,c255a280,0,e741ec08) at div_output+0x185 div_send(c2926b20,0,c2a4a400,c255a280,0) at div_send+0x3f sosend(c2926b20,c255a280,e741ec3c,c2a4a400,0) at sosend+0x5e3 kern_sendit(c2731600,3,e741ecbc,0,0) at kern_sendit+0x104 sendit(c2731600,3,e741ecbc,0,bfbdec1c) at sendit+0x163 sendto(c2731600,e741ed04,6,2,296) at sendto+0x4d syscall(3b,3b,3b,2,7c) at syscall+0x22f Xint0x80_syscall() at Xint0x80_syscall+0x1f --- syscall (133, FreeBSD ELF32, sendto), eip = 0x280c5d97, esp = 0xbfbdeb0c, ebp = 0xbfbeebb8 --- And here the box is frozen up. telnet> send break KDB: enter: Line break on console [thread pid 12 tid 100004 ] Stopped at kdb_enter+0x2b: nop db> db> show pcpu cpuid = 0 curthread = 0xc22a9900: pid 12 "idle: cpu0" curpcb = 0xe3481d90 fpcurthread = none idlethread = 0xc22a9900: pid 12 "idle: cpu0" APIC ID = 0 currentldt = 0x50 spin locks held: db> show alllocks Process 682 (ssh) thread 0xc2aa6000 (100137) exclusive sleep mutex crypto (crypto op queues) r = 0 (0xc07aa420) locked @ /usr/src/sys/opencrypto/crypto.c:669 exclusive sleep mutex ipsec request r = 1 (0xc28bc950) locked @ /usr/src/sys/netipsec/xform_esp.c:872 exclusive sleep mutex inp (tcpinp) r = 0 (0xc28c6a68) locked @ /usr/src/sys/netinet/tcp_usrreq.c:651 Process 586 (natd) thread 0xc2731600 (100080) exclusive sleep mutex inp (divinp) r = 0 (0xc292a090) locked @ /usr/src/sys/netinet/ip_divert.c:327 exclusive sleep mutex div r = 0 (0xc07a0b6c) locked @ /usr/src/sys/netinet/ip_divert.c:325 Process 37 (swi4: clock sio) thread 0xc2317d80 (100036) exclusive sleep mutex tcp r = 0 (0xc07a1ecc) locked @ /usr/src/sys/netinet/tcp_timer.c:457 db> ps pid proc uid ppid pgrp flag stat wmesg wchan cmd 682 c2aa4830 0 681 681 0004002 [LOCK div c29f57c0] ssh 681 c2aa4a3c 0 679 681 0004002 [SLPQ piperd 0xc272f000][SLP] scp 679 c2894830 0 678 679 0004002 [SLPQ pause 0xc2894864][SLP] csh 678 c2894624 1001 675 678 0004102 [SLPQ wait 0xc2894624][SLP] su 675 c26efa3c 1001 674 675 0004002 [SLPQ pause 0xc26efa70][SLP] csh 674 c2894a3c 1001 672 672 0000100 [SLPQ select 0xc079d3c4][SLP] sshd 672 c2734830 0 523 672 0004100 [SLPQ sbwait 0xc2a2920c][SLP] sshd 663 c29f320c 0 657 663 0004102 [SLPQ select 0xc079d3c4][SLP] openvpn 657 c2894c48 0 656 657 0004002 [SLPQ pause 0xc2894c7c][SLP] csh 656 c29f3000 1001 653 656 0004102 [SLPQ wait 0xc29f3000][SLP] su 653 c29f3624 1001 652 653 0004002 [SLPQ pause 0xc29f3658][SLP] csh 652 c26ec624 1001 650 650 0000100 [SLPQ select 0xc079d3c4][SLP] sshd 650 c29f3a3c 0 523 650 0004100 [SLPQ sbwait 0xc2a2a20c][SLP] sshd 649 c29f3c48 0 1 649 0004002 [SLPQ ttyin 0xc254d410][SLP] getty 648 c29f7000 0 1 648 0004002 [SLPQ ttyin 0xc254f010][SLP] getty 647 c29f720c 0 1 647 0004002 [SLPQ ttyin 0xc254c410][SLP] getty 646 c29f7418 0 1 646 0004002 [SLPQ ttyin 0xc254fc10][SLP] getty 645 c29f7624 0 1 645 0004002 [SLPQ ttyin 0xc2555010][SLP] getty 644 c29f7830 0 1 644 0004002 [SLPQ ttyin 0xc254f810][SLP] getty 643 c2734000 0 1 643 0004002 [SLPQ ttyin 0xc2555810][SLP] getty 642 c2733000 0 1 642 0004002 [SLPQ ttyin 0xc2557010][SLP] getty 641 c26efc48 0 1 641 0004002 [SLPQ ttyin 0xc2556010][SLP] getty 595 c2730a3c 0 1 64 0000002 [SLPQ select 0xc079d3c4][SLP] 3dm2 586 c2730624 0 1 586 0000000 [LOCK ipsec request c26bacc0] natd 570 c2734624 0 1 570 0000101 [SLPQ select 0xc079d3c4][SLP] bgpd 568 c289020c 0 1 568 0000101 [SLPQ select 0xc079d3c4][SLP] zebra 545 c2734c48 0 1 545 0000000 [SLPQ nanslp 0xc07500ac][SLP] cron 532 c2733830 25 1 532 0000100 [SLPQ pause 0xc2733864][SLP] sendmail 528 c2730418 0 1 528 0000100 [SLPQ select 0xc079d3c4][SLP] sendmail 523 c2733c48 0 1 523 0000100 [SLPQ select 0xc079d3c4][SLP] sshd 521 c2890c48 0 505 505 0000000 [SLPQ pause 0xc2890c7c][SLP] ntpd 505 c26eca3c 0 1 505 0000000 [SLPQ select 0xc079d3c4][SLP] ntpd 407 c2890624 53 1 407 0000100 [SLPQ select 0xc079d3c4][SLP] named 346 c2890000 0 1 346 0000000 [SLPQ select 0xc079d3c4][SLP] syslogd 313 c26ec000 0 1 313 0000000 [SLPQ select 0xc079d3c4][SLP] devd 191 c26ef000 0 1 191 0000000 [SLPQ pause 0xc26ef034][SLP] adjkerntz 63 c238d20c 0 0 0 0000204 [SLPQ - 0xe4f4fd04][SLP] schedcpu 62 c238d418 0 0 0 0000204 [SLPQ - 0xc07a502c][SLP] nfsiod 3 61 c238d624 0 0 0 0000204 [SLPQ - 0xc07a5028][SLP] nfsiod 2 60 c238d830 0 0 0 0000204 [SLPQ - 0xc07a5024][SLP] nfsiod 1 59 c238da3c 0 0 0 0000204 [SLPQ - 0xc07a5020][SLP] nfsiod 0 58 c238dc48 0 0 0 0000204 [SLPQ vlruwt 0xc238dc48][SLP] vnlru 57 c23f4000 0 0 0 0000204 [SLPQ syncer 0xc074fe1c][SLP] syncer 56 c23f420c 0 0 0 0000204 [SLPQ psleep 0xc079d90c][SLP] bufdaemon 55 c23f4418 0 0 0 000020c [SLPQ pgzero 0xc07ab684][SLP] pagezero 54 c23f4624 0 0 0 0000204 [SLPQ psleep 0xc07ab1d4][SLP] vmdaemon 53 c23f4830 0 0 0 0000204 [SLPQ psleep 0xc07ab190][SLP] pagedaemon 52 c23f4a3c 0 0 0 0000204 [IWAIT] swi0: sio 51 c23f4c48 0 0 0 0000204 [SLPQ usbevt 0xc238b210][SLP] usb4 50 c2316624 0 0 0 0000204 [SLPQ usbevt 0xc23d4210][SLP] usb3 49 c2316830 0 0 0 0000204 [SLPQ usbevt 0xc23c0210][SLP] usb2 48 c2316a3c 0 0 0 0000204 [SLPQ usbevt 0xc2394210][SLP] usb1 47 c2316c48 0 0 0 0000204 [SLPQ usbtsk 0xc074ad64][SLP] usbtask 46 c238c000 0 0 0 0000204 [SLPQ usbevt 0xc2398210][SLP] usb0 45 c238c20c 0 0 0 0000204 [IWAIT] swi6: task queue 44 c238c418 0 0 0 0000204 [SLPQ - 0xc233a980][SLP] acpi_task2 43 c238c624 0 0 0 0000204 [SLPQ - 0xc233a980][SLP] acpi_task1 9 c238c830 0 0 0 0000204 [SLPQ - 0xc233a980][SLP] acpi_task0 8 c238ca3c 0 0 0 0000204 [SLPQ - 0xc233aa00][SLP] kqueue taskq 42 c238cc48 0 0 0 0000204 [IWAIT] swi2: cambio 41 c238d000 0 0 0 0000204 [IWAIT] swi5:+ 7 c2309c48 0 0 0 0000204 [SLPQ - 0xc233ac80][SLP] thread taskq 40 c2315000 0 0 0 0000204 [IWAIT] swi6:+ 39 c231520c 0 0 0 0000204 [SLPQ - 0xc074a5a0][SLP] yarrow 6 c2315418 0 0 0 0000204 [SLPQ - 0xc074d5a8][SLP] g_down 5 c2315624 0 0 0 0000204 [SLPQ - 0xc074d5a4][SLP] g_up 4 c2315830 0 0 0 0000204 [SLPQ - 0xc074d59c][SLP] g_event 3 c2315a3c 0 0 0 0000204 [SLPQ crypto_ret_wait 0xc07aa444][SLP] crypto returns 2 c2315c48 0 0 0 0000204 [SLPQ crypto_wait 0xc07aa404][SLP] crypto 38 c2316000 0 0 0 0000204 [IWAIT] swi3: vm 37 c231620c 0 0 0 000020c [LOCK inp c22f8dc0] swi4: clock sio 36 c2316418 0 0 0 0000204 [LOCK div c29f57c0] swi1: net 35 c2300624 0 0 0 0000204 [IWAIT] irq23: uhci0 ehci0 34 c2300830 0 0 0 0000204 [IWAIT] irq22: em0 33 c2300a3c 0 0 0 0000204 [IWAIT] irq21: twe0 32 c2300c48 0 0 0 0000204 [IWAIT] irq20: fxp0 31 c2309000 0 0 0 0000204 [IWAIT] irq19: uhci1++ 30 c230920c 0 0 0 0000204 [IWAIT] irq18: uhci2 29 c2309418 0 0 0 0000204 [IWAIT] irq17: fwohci0 28 c2309624 0 0 0 0000204 [IWAIT] irq16: uhci3 27 c2309830 0 0 0 0000204 [IWAIT] irq15: ata1 26 c2309a3c 0 0 0 0000204 [IWAIT] irq14: ata0 25 c22ad20c 0 0 0 0000204 [IWAIT] irq13: 24 c22ad418 0 0 0 0000204 [IWAIT] irq12: 23 c22ad624 0 0 0 0000204 [IWAIT] irq11: 22 c22ad830 0 0 0 0000204 [IWAIT] irq10: 21 c22ada3c 0 0 0 0000204 [IWAIT] irq9: acpi0 20 c22adc48 0 0 0 0000204 [IWAIT] irq8: 19 c2300000 0 0 0 0000204 [IWAIT] irq7: ppc0 18 c230020c 0 0 0 0000204 [IWAIT] irq6: 17 c2300418 0 0 0 0000204 [IWAIT] irq5: 16 c22a8000 0 0 0 0000204 [IWAIT] irq4: sio0 15 c22a820c 0 0 0 0000204 [IWAIT] irq3: 14 c22a8418 0 0 0 0000204 [IWAIT] irq0: 13 c22a8624 0 0 0 0000204 [IWAIT] irq1: atkbd0 12 c22a8830 0 0 0 000020c [CPU 0] idle: cpu0 11 c22a8a3c 0 0 0 000020c [CPU 1] idle: cpu1 1 c22a8c48 0 0 1 0004200 [SLPQ wait 0xc22a8c48][SLP] init 10 c22ad000 0 0 0 0000204 [SLPQ ktrace 0xc074dff8][SLP] ktrace 0 c074d6a0 0 0 0 0000200 [IWAIT] swapper db> show lockedvnods Locked vnodes db> show lockedbufs db> trace Tracing pid 12 tid 100004 td 0xc22a9900 kdb_enter(c07056df) at kdb_enter+0x2b siointr1(c254b000,c07ad5c0,0,c07054eb,56e) at siointr1+0xce siointr(c254b000) at siointr+0x21 intr_execute_handlers(c229f490,e3481c94,4,e3481cd8,c068c0e3) at intr_execute_handlers+0xa5 lapic_handle_intr(34) at lapic_handle_intr+0x2e Xapic_isr1() at Xapic_isr1+0x33 --- interrupt, eip = 0xc08a18fd, esp = 0xe3481cd8, ebp = 0xe3481cd8 --- acpi_cpu_c1(c074f780,1,e3481cf8,1,c22a8830) at acpi_cpu_c1+0x5 acpi_cpu_idle(e3481d0c,c0516a51,c05169f4,e3481d24,c0516834) at acpi_cpu_idle+0x13e cpu_idle(c05169f4,e3481d24,c0516834,0,e3481d38) at cpu_idle+0x28 idle_proc(0,e3481d38,0,c05169f4,0) at idle_proc+0x5d fork_exit(c05169f4,0,e3481d38) at fork_exit+0xa0 fork_trampoline() at fork_trampoline+0x8 --- trap 0x1, eip = 0, esp = 0xe3481d6c, ebp = 0 --- db> show lockedvnods Locked vnodes db> ># dumpon -v /dev/twed0s1b >kernel dumps on /dev/twed0s1b ># pstat -T >110/12328 files >0M/2048M swap space > > ---Mike > >-------------------------------------------------------------------- >Mike Tancsa, tel +1 519 651 3400 >Sentex Communications, mike@sentex.net >Providing Internet since 1994 www.sentex.net >Cambridge, Ontario Canada www.sentex.net/mike