From: Kris Kennaway
To: Kris Kennaway
Cc: gnn@freebsd.org, Hajimu UMEMOTO, net@FreeBSD.org
Date: Fri, 10 Mar 2006 16:31:50 -0500
Subject: Re: ipv6 panic in 6.0 ([kris@FreeBSD.org: kern/85780: 'panic: bogus refcnt 0' in routing/ipv6])
Message-ID: <20060310213149.GA33672@xor.obsecurity.org>
In-Reply-To: <20060306231556.GA54600@xor.obsecurity.org>
List-Id: Networking and TCP/IP with FreeBSD

On Mon, Mar 06, 2006 at 06:15:56PM -0500, Kris
Kennaway wrote:
> I've been adding KTR debugging to try and track down the cause of this
> recurring problem (FYI: debug.mpsafenet=0 is no longer working around
> it).  To refresh your memory, here is the panic:
>
> db> wh
> Tracing pid 24 tid 100012 td 0xfffff802be9fa560
> panic() at panic+0x164
> rtfree() at rtfree+0xb4
> nd6_na_output() at nd6_na_output+0x540
> nd6_ns_input() at nd6_ns_input+0x738
> icmp6_input() at icmp6_input+0xc38
> ip6_input() at ip6_input+0x1038
> netisr_processqueue() at netisr_processqueue+0x7c
> swi_net() at swi_net+0xdc
> ithread_execute_handlers() at ithread_execute_handlers+0x144
> ithread_loop() at ithread_loop+0xa4
> fork_exit() at fork_exit+0x94
> fork_trampoline() at fork_trampoline+0x8
> db>
>
> It's always in nd6_na_output() although the trace beyond this point
> varies.  However that doesn't tell us what leaked the reference count
> prior to this stack trace.
>
> So far I have narrowed it down to:

Here is a better trace (in chronological order):

4431 (0xfffff803fe9f1ae0:cpu0) 16217304555013 netinet6/nd6_nbr.c.461: in6_selectsrc 0xe2e0b380

nd6_ns_output():

        src = in6_selectsrc(&dst_sa, NULL, NULL, &ro, NULL, NULL, &error);

4432 (0xfffff803fe9f1ae0:cpu0) 16217304555999 netinet6/in6_src.c.241: in6_selectif 0xe2e0b380

in6_selectsrc():

        /*
         * If the address is not specified, choose the best one based on
         * the outgoing interface and the destination address.
         */
        /* get the outgoing interface */
        if ((*errorp = in6_selectif(dstsock, opts, mopts, ro, &ifp)) != 0)
                return (NULL);

in6_selectif() calls selectroute():

        if ((error = selectroute(dstsock, opts, mopts, ro, retifp,
            &rt, 0, 1)) != 0) {

4433 (0xfffff803fe9f1ae0:cpu0) 16217304558555 net/route.c.198: Adding ref 0 0xfffff8032240dd10
4434 (0xfffff803fe9f1ae0:cpu0) 16217304559191 netinet6/in6_src.c.579: rtalloc1 0xfffff8032240dd10

This rtalloc1() was called from selectroute():

        if (ro->ro_rt == (struct rtentry *)NULL) {
                struct sockaddr_in6 *sa6;

                /* No route yet, so try to acquire one */
                bzero(&ro->ro_dst, sizeof(struct sockaddr_in6));
                sa6 = (struct sockaddr_in6 *)&ro->ro_dst;
                *sa6 = *dstsock;
                sa6->sin6_scope_id = 0;

                if (clone) {
                        rtalloc((struct route *)ro);
                } else {
                        ro->ro_rt = rtalloc1(&((struct route *)ro)
                                             ->ro_dst, 0, 0UL);

4435 (0xfffff803fe9f1ae0:cpu0) 16217304560255 netinet6/in6_src.c.706: rtfree 0xfffff8032240dd10
4436 (0xfffff803fe9f1ae0:cpu0) 16217304560951 net/route.c.247: Removing ref 1 0xfffff8032240dd10

We are now back at the end of in6_selectif():

        if (rt && rt == sro.ro_rt)
                RTFREE(rt);
        return (0);

4437 (0xfffff803fe9f1ae0:cpu0) 16217304590486 netinet6/nd6_nbr.c.534: 1 Freeing route 0xfffff8032240dd10 with ref 0

We are now back in nd6_ns_output():

        if (ro.ro_rt) {         /* we don't cache this route. */
                RTFREE(ro.ro_rt);
        }
        return;

4438 (0xfffff803fe9f1ae0:cpu0) 16217417726681 net/route.c.247: Removing ref 0 0xfffff8032240dd10

and explode because we've freed the same route twice in a row when it
only had a refcount of 1 to begin with.  I suspect the control flow in
nd6_ns_output() is broken.

Kris
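
For readers following the KTR events, here is a minimal userland sketch of
the sequence the trace shows: one reference taken, then two releases of the
same route. It is not the kernel code. Every toy_* name is hypothetical, the
route lookup is replaced by a pointer handed in by the caller, and the sketch
simply assumes the release inside the callee fires (as event 4435 shows)
without modelling why the rt == sro.ro_rt test was true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct rtentry: only the refcount is modelled. */
struct toy_rtentry {
    long refcnt;
    bool underflow;     /* set where the real rtfree() would panic */
};

/* Hypothetical stand-in for struct route_in6: just the cached pointer. */
struct toy_route {
    struct toy_rtentry *ro_rt;
};

/* Models the "Adding ref" event (4433). */
static void toy_ref(struct toy_rtentry *rt)
{
    assert(rt != NULL);
    rt->refcnt++;
}

/* Models rtfree(): releasing a reference that is not held is the
 * "bogus refcnt" case, recorded here instead of panicking. */
static void toy_rtfree(struct toy_rtentry *rt)
{
    assert(rt != NULL);
    if (rt->refcnt <= 0) {
        rt->underflow = true;
        return;
    }
    rt->refcnt--;
}

/* Models events 4433-4436: a route is acquired into ro->ro_rt and then
 * released before returning. ASSUMPTION: the release always fires and
 * ro->ro_rt is left pointing at the route. */
static void toy_selectif(struct toy_route *ro, struct toy_rtentry *found)
{
    if (ro->ro_rt == NULL) {
        toy_ref(found);             /* refcnt 0 -> 1 */
        ro->ro_rt = found;
    }
    toy_rtfree(ro->ro_rt);          /* refcnt 1 -> 0 */
}

/* Models events 4437-4438: the caller still sees ro.ro_rt set and
 * releases it again. */
static void toy_ns_output(struct toy_rtentry *found)
{
    struct toy_route ro = { NULL };

    toy_selectif(&ro, found);
    if (ro.ro_rt != NULL)
        toy_rtfree(ro.ro_rt);       /* second release, refcnt already 0 */
}
```

Calling toy_ns_output() on an entry whose refcnt starts at 0 leaves refcnt
at 0 with underflow set, which corresponds to the final "Removing ref 0"
event. In this toy model either release alone would balance the single
reference; which of the two is the wrong one in the real code is exactly
the open question above.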