From: Norbert Papke
Organization: Archaeological Filing
To: freebsd-stable@freebsd.org
Cc: Gavin Atkinson
Date: Mon, 15 Sep 2008 18:13:58 -0700
Subject: Re: Possible UDP related deadlock in 7.1-PRERELEASE
Message-Id: <200809151813.58749.fbsd-ml@scrapper.ca>
In-Reply-To: <1221471431.49328.5.camel@buffy.york.ac.uk>
References: <200809141219.24943.fbsd-ml@scrapper.ca> <1221471431.49328.5.camel@buffy.york.ac.uk>

On September 15, 2008, Gavin Atkinson wrote:
> On Sun, 2008-09-14 at 12:19 -0700, Norbert Papke wrote:
> > Symptoms:
> >
> > * I can trigger this lockup reliably by starting ktorrent. After a short
> >   while (one to two minutes), it locks up. Other commands, e.g., netstat,
> >   also lock up.
> > * The console generates "nfe0: watchdog timeout" error messages.
> > * The system becomes unusable and must be rebooted.
> >
> > Attempted Diagnosis:
> >
> > If I break into DDB, the 'ps' output shows a number of processes that
> > seem to be locked related to udp.
> >
> > [irq18:dc0]  L  *udp
> > ktorrent     L  *udpinp
> > hald         L  *udp
> > ntpd         L  *udp
> >
> > Unfortunately, I am rapidly getting out of my depth here. I have no idea
> > how to go about further analyzing this problem and would appreciate help.
>
> Can you add:
> options WITNESS
> options WITNESS_SKIPSPIN
>
> to your kernel, recompile and wait for the problem to happen again?
> When it does, from the debugger issue "sh alllocks" and make a note of
> the output?

With WITNESS enabled, I now experience panics and could not follow your
instructions. There is no core dump. The following gets logged to
/var/log/messages:

shared lock of (rw) udpinp @ /usr/src/sys/netinet/udp_usrreq.c:864 while exclusively locked from /usr/src/sys/netinet6/udp6_usrreq.c:940
panic: share->excl
KDB: stack backtrace:
db_trace_self_wrapper(c06fda7c,f6b96978,c052046a,c06fbb5d,c07695c0,...) at db_trace_self_wrapper+0x26
kdb_backtrace(c06fbb5d,c07695c0,c06febd1,f6b96984,f6b96984,...) at kdb_backtrace+0x29
panic(c06febd1,c070c409,3ac,c0709eee,360,...) at panic+0xaa
witness_checkorder(ccd5209c,1,c0709eee,360,8,...)
at witness_checkorder+0x17c
_rw_rlock(ccd5209c,c0709eee,360,c07780e0,cd4652c8,...) at _rw_rlock+0x2a
udp_send(d3942000,0,c580f400,c68faa00,0,...) at udp_send+0x197
udp6_send(d3942000,0,c580f400,c68faa00,0,...) at udp6_send+0x140
sosend_generic(d3942000,c68faa00,f6b96be8,0,0,...) at sosend_generic+0x50d
sosend(d3942000,c68faa00,f6b96be8,0,0,...) at sosend+0x3f
kern_sendit(cd465230,f,f6b96c64,0,0,...) at kern_sendit+0x106
sendit(0,871b9fe,0,c68faa00,1c,...) at sendit+0x182
sendto(cd465230,f6b96cfc,18,cd465230,c072bab8,...) at sendto+0x4f
syscall(f6b96d38) at syscall+0x293

Note that I do not use IPv6; none of my network interfaces is configured
for it.

Also, since I enabled WITNESS, I get the following logged during system
startup:

Enabling pf.
lock order reversal:
 1st 0xc09af92c pf task mtx (pf task mtx) @ /usr/src/sys/modules/pf/../../contrib/pf/net/pf_ioctl.c:1394
 2nd 0xc07b4d68 ifnet (ifnet) @ /usr/src/sys/net/if.c:1558
KDB: stack backtrace:
db_trace_self_wrapper(c06fda7c,f4914a60,c0552c75,c06fed11,c07b4d68,...) at db_trace_self_wrapper+0x26
kdb_backtrace(c06fed11,c07b4d68,c0703ca2,c0703ca2,c0703c73,...) at kdb_backtrace+0x29
witness_checkorder(c07b4d68,9,c0703c73,616,572,...) at witness_checkorder+0x5e5
_mtx_lock_flags(c07b4d68,0,c0703c73,616,c0104414,...) at _mtx_lock_flags+0x34
ifunit(c6ef5c20,0,c09adfb5,572,c0703a71,...) at ifunit+0x2f
pfioctl(c566ce00,c0104414,c6ef5c20,3,c60c38c0,...) at pfioctl+0x2b43
devfs_ioctl_f(c588bb94,c0104414,c6ef5c20,c54bb900,c60c38c0,...) at devfs_ioctl_f+0xe6
kern_ioctl(c60c38c0,3,c0104414,c6ef5c20,1000000,...) at kern_ioctl+0x243
ioctl(c60c38c0,f4914cfc,c,c0718d59,c072b350,...) at ioctl+0x134
syscall(f4914d38) at syscall+0x293
Xint0x80_syscall() at Xint0x80_syscall+0x20
--- syscall (54, FreeBSD ELF32, ioctl), eip = 0x281ab6f3, esp = 0xbfbfde3c, ebp = 0xbfbfde68 ---
pf enabled

I tried to unload 'pf' to see if it was the culprit. However, even without
pf loaded, I experience the panic.
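
For anyone reading the first panic message: it reports that one thread asked for a shared (read) lock on udpinp at udp_usrreq.c:864 while already holding the same lock exclusively from udp6_usrreq.c:940, and the backtrace shows udp6_send calling udp_send. Below is a minimal user-space sketch of that pattern in Python, not kernel code. The class CheckedRWLock and the functions udp_send_sketch and udp6_send_sketch are hypothetical names made up for illustration; the only things taken from the log are the lock name, the two file:line locations, and the call order. It models just the bookkeeping check, assuming the exclusive lock is still held when the inner function runs, and says nothing about why the kernel takes this path.

```python
import threading


class LockOrderError(RuntimeError):
    """Raised at the point where a WITNESS kernel would panic (illustrative)."""


class CheckedRWLock:
    """Toy lock that only records how each thread holds it; it never blocks."""

    def __init__(self, name):
        self.name = name
        self._held = {}  # thread id -> (mode, location of acquisition)

    def _acquire(self, mode, where):
        tid = threading.get_ident()
        prev = self._held.get(tid)
        if prev is not None:
            prev_mode, prev_where = prev
            if prev_mode == "excl" and mode == "shared":
                # Same thread, same lock, exclusive first and shared second.
                raise LockOrderError(
                    f"shared lock of (rw) {self.name} @ {where} "
                    f"while exclusively locked from {prev_where}: share->excl"
                )
            raise LockOrderError(f"recursed on {self.name} @ {where}")
        self._held[tid] = (mode, where)

    def wlock(self, where):
        self._acquire("excl", where)

    def rlock(self, where):
        self._acquire("shared", where)

    def unlock(self):
        self._held.pop(threading.get_ident(), None)


def udp_send_sketch(inp):
    # Stand-in for the IPv4 send path: takes the lock shared.
    inp.rlock("udp_usrreq.c:864")
    try:
        return "sent"
    finally:
        inp.unlock()


def udp6_send_sketch(inp):
    # Stand-in for the IPv6 send path: takes the lock exclusively,
    # then hands off to the IPv4 path while still holding it.
    inp.wlock("udp6_usrreq.c:940")
    try:
        return udp_send_sketch(inp)
    finally:
        inp.unlock()


def demo():
    inp = CheckedRWLock("udpinp")
    try:
        udp6_send_sketch(inp)
    except LockOrderError as err:
        return str(err)
    return "no error"
```

Calling demo() returns the error text produced by the nested acquisition, while calling udp_send_sketch on a fresh lock succeeds. Without the check, a real rwlock in this situation would have the thread waiting on a lock it already owns, which would be consistent with the processes stuck on *udpinp in the original 'ps' output.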

Is there anything else I can try to provide better insight into what might
be going on?

Cheers,

-- Norbert.