From: Ivan Voras
To: freebsd-net@freebsd.org
Date: Wed, 26 Sep 2007 12:45:37 +0200
Subject: Panic in rt_check

Hi,

I have a machine that panics almost daily in route.c, in rt_check(). This panic
has been reported by several users, including Marcel Moolenaar on a freebsd.org machine. The problem is present in both 6-STABLE and 7-CURRENT, and it apparently manifests on SMP machines, both i386 and AMD64.

The panic backtrace looks like this:

panic: mtx_lock() of destroyed mutex @ /usr/src/sys/net/route.c:1305
cpuid = 1
KDB: stack backtrace:
db_trace_self_wrapper(c091bcf0,e38b690c,c0659fc1,c093f3cf,1,...) at db_trace_self_wrapper+0x26
kdb_backtrace(c093f3cf,1,c0917de2,e38b6918,1,...) at kdb_backtrace+0x29
panic(c0917de2,c0925d40,519,0,0,...) at panic+0x111
_mtx_lock_flags(c5d333a8,0,c0925d40,519,0,...) at _mtx_lock_flags+0x59
rt_check(e38b6970,e38b698c,c55b7d10,0,0,...) at rt_check+0x11e
arpresolve(c4e27000,c5d33d98,c50dbe00,c55b7d10,e38b69a6,...) at arpresolve+0xaf
ether_output(c4e27000,c50dbe00,c55b7d10,c5d33d98,ccf8b348,...) at ether_output+0x7e
ip_output(c50dbe00,0,e38b6a1c,0,0,...) at ip_output+0xa09
tcp_output(ccefbac8,0,c0929785,91d,0,...) at tcp_output+0x1463
tcp_do_segment(ccefbac8,28,0,1dd,901f,...) at tcp_do_segment+0x1c97
tcp_input(c6095100,14,c4ea3c00,1,0,...) at tcp_input+0xd5e
ip_input(c6095100,0,c09258bd,8c,c09efc38,...) at ip_input+0x662
netisr_processqueue(e38b6cc4,c064df85,c09eb940,1,c4d03480,...) at netisr_processqueue+0x98
swi_net(0,0,c0915aee,471,c4d0bd64,...) at swi_net+0xdb
ithread_loop(c4d0c270,e38b6d38,c0915862,315,c4d56558,...) at ithread_loop+0x1c5
fork_exit(c063e2d0,c4d0c270,e38b6d38) at fork_exit+0xc5
fork_trampoline() at fork_trampoline+0x8
...

#0  doadump () at pcpu.h:195
195     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) bt
#0  doadump () at pcpu.h:195
#1  0xc0659d2c in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#2  0xc0659ff0 in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:563
#3  0xc064e699 in _mtx_lock_flags (m=0x0, opts=0, file=0xc0925d40 "/usr/src/sys/net/route.c", line=1305)
    at /usr/src/sys/kern/kern_mutex.c:178
#4  0xc06fe28e in rt_check (lrt=0xe38b6970, lrt0=0xe38b698c, dst=0xc55b7d10)
    at /usr/src/sys/net/route.c:1305
#5  0xc070282f in arpresolve (ifp=0xc4e27000, rt0=0xc5d33d98, m=0xc50dbe00, dst=0xc55b7d10, desten=0xe38b69a6 "")
    at /usr/src/sys/netinet/if_ether.c:373
#6  0xc06f019e in ether_output (ifp=0xc4e27000, m=0xc50dbe00, dst=0xc55b7d10, rt0=0xc5d33d98)
    at /usr/src/sys/net/if_ethersubr.c:175
#7  0xc07127a9 in ip_output (m=0xc50dbe00, opt=0x0, ro=0xe38b6a1c, flags=Variable "flags" is not available.
) at /usr/src/sys/netinet/ip_output.c:547
#8  0xc076d6e3 in tcp_output (tp=0xccefbac8) at /usr/src/sys/netinet/tcp_output.c:1125
#9  0xc076ab87 in tcp_do_segment (m=0xc6095100, th=0xc6095158, so=0xccdb67bc, tp=0xccefbac8, drop_hdrlen=40, tlen=0)
    at /usr/src/sys/netinet/tcp_input.c:2345
#10 0xc076bb0e in tcp_input (m=0xc6095100, off0=20) at /usr/src/sys/netinet/tcp_input.c:843
#11 0xc0710c42 in ip_input (m=0xc6095100) at /usr/src/sys/netinet/ip_input.c:663
#12 0xc06f9148 in netisr_processqueue (ni=0xc09efc38) at /usr/src/sys/net/netisr.c:143
#13 0xc06f925b in swi_net (dummy=0x0) at /usr/src/sys/net/netisr.c:256
#14 0xc063e495 in ithread_loop (arg=0xc4d0c270) at /usr/src/sys/kern/kern_intr.c:1036
#15 0xc063b845 in fork_exit (callout=0xc063e2d0 , arg=0xc4d0c270, frame=0xe38b6d38)
    at /usr/src/sys/kern/kern_fork.c:797
#16 0xc0896f80 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:205

I've been trying to solve this with Craig Rodrigues, and I've tried several patches, without success.
The panic above occurs in the following code from net/route.c:

1299    /* XXX BSD/OS checks dst->sa_family != AF_NS */
1300    if (rt->rt_flags & RTF_GATEWAY) {
1301        struct rtentry *temp_rt_gwroute = rt->rt_gwroute;
1302        if (temp_rt_gwroute == NULL)
1303            goto lookup;
1304        rt = rt->rt_gwroute;
1305        RT_LOCK(rt);            /* NB: gwroute */
1306        if(rt0->rt_flags & 0x80000000U){
1307            /*This rt is under process...*/
1308            RT_UNLOCK(rt);
1309            RT_UNLOCK(rt0);
1310            goto try_again;
1311        }
1312        if ((rt->rt_flags & RTF_UP) == 0) {
1313            rt0->rt_flags |= 0x80000000U;
1314            RTFREE_LOCKED(rt);  /* unlock gwroute */
1315            rt = rt0;
1316    lookup:
1317            RT_UNLOCK(rt0);
1318            rt = rtalloc1(rt->rt_gateway, 1, 0UL);
1319            if (rt == rt0) {
1320                rt0->rt_gwroute = NULL;
1321                RT_REMREF(rt0);
1322                RT_UNLOCK(rt0);
1323                return (ENETUNREACH);
1324            }
1325            RT_LOCK(rt0);
1326            rt0->rt_gwroute = rt;
1327            rt0->rt_flags &= (~0x80000000U);
1328            if (rt == NULL) {
1329                RT_UNLOCK(rt0);
1330                return (EHOSTUNREACH);
1331            }
1332        }
1333        RT_UNLOCK(rt0);
1334    }

This code already includes several workaround patches we tried, none of which helped. The panic is always on the RT_LOCK(rt) line (1305): sometimes it is a NULL pointer dereference, sometimes an operation on a destroyed mutex.

This is a critical problem for me, and I believe it is critical for other users as well. Does anyone have more ideas on how to solve it?
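The symptom described above, RT_LOCK(rt) hitting either a NULL or an already-destroyed mutex, is consistent with the gateway entry being freed by another CPU between the moment its pointer is read and the moment it is locked. Purely as an illustration of the general technique, and not a tested fix for rt_check(), the sketch below models it in userspace C: the child pointer is read, and a counted reference to it is taken, while the parent's lock is held, so the child cannot be destroyed while someone is about to lock it. struct entry, entry_new(), entry_release(), gw_acquire() and gw_replace() are hypothetical names, and pthread mutexes stand in for the kernel's route locks; none of this is the FreeBSD rtentry API.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a route entry, with only the fields needed
 * to show the pattern. This is NOT the layout of struct rtentry. */
struct entry {
	pthread_mutex_t lock;   /* plays the role of RT_LOCK()/RT_UNLOCK() */
	int refcnt;             /* protected by lock */
	struct entry *gw;       /* protected by lock; holds one reference */
};

/* Create an entry with one reference owned by the caller. */
struct entry *entry_new(void)
{
	struct entry *e = calloc(1, sizeof(*e));

	if (e == NULL)
		return NULL;
	pthread_mutex_init(&e->lock, NULL);
	e->refcnt = 1;
	return e;
}

/* Drop one reference; destroy the mutex and free the entry when the
 * last reference goes away. The caller must not hold e->lock. */
void entry_release(struct entry *e)
{
	int last;

	pthread_mutex_lock(&e->lock);
	assert(e->refcnt > 0);
	last = (--e->refcnt == 0);
	pthread_mutex_unlock(&e->lock);
	if (last) {
		pthread_mutex_destroy(&e->lock);
		free(e);
	}
}

/* Return parent->gw with an extra reference, or NULL if there is none.
 * The pointer is read once, and its refcount raised, while the parent
 * lock is held. The parent's own reference is dropped only after
 * parent->gw has been cleared under that same lock, so the gateway
 * cannot be freed between the read and the increment.
 * Assumes an entry is never its own gateway (the quoted rt_check()
 * code treats that case separately with its rt == rt0 check). */
struct entry *gw_acquire(struct entry *parent)
{
	struct entry *gw;

	pthread_mutex_lock(&parent->lock);
	gw = parent->gw;
	if (gw != NULL) {
		pthread_mutex_lock(&gw->lock);  /* lock order: parent, then gw */
		gw->refcnt++;
		pthread_mutex_unlock(&gw->lock);
	}
	pthread_mutex_unlock(&parent->lock);
	return gw;
}

/* Install new_gw (may be NULL) as parent's gateway. The parent takes
 * over the caller's reference to new_gw; the reference the parent held
 * on the old gateway is dropped after the parent lock is released. */
void gw_replace(struct entry *parent, struct entry *new_gw)
{
	struct entry *old;

	pthread_mutex_lock(&parent->lock);
	old = parent->gw;
	parent->gw = new_gw;
	pthread_mutex_unlock(&parent->lock);
	if (old != NULL)
		entry_release(old);
}
```

In this model a caller would call gw_acquire(), lock and use the returned entry, and then call entry_release(); the entry stays valid for that whole window even if another thread calls gw_replace() concurrently. The sketch reads the gateway pointer exactly once, whereas lines 1301 and 1304 of the quoted code read rt->rt_gwroute twice. Whether this maps cleanly onto the kernel's own reference counting (the quoted code already uses RT_REMREF and RTFREE_LOCKED) is left open here.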