From owner-freebsd-net@FreeBSD.ORG Thu Jul 14 19:25:08 2011 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 736E4106566B for ; Thu, 14 Jul 2011 19:25:08 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id BC3208FC0A for ; Thu, 14 Jul 2011 19:25:07 +0000 (UTC) Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id p6EIp0ux068669 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 14 Jul 2011 21:51:00 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4) with ESMTP id p6EIp0Qi012397; Thu, 14 Jul 2011 21:51:00 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4/Submit) id p6EIp0Yg012396; Thu, 14 Jul 2011 21:51:00 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Thu, 14 Jul 2011 21:51:00 +0300 From: Kostik Belousov To: Gleb Smirnoff Message-ID: <20110714185100.GC43872@deviant.kiev.zoral.com.ua> References: <20110714154457.GI70776@FreeBSD.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="dmVtbNsdrXqYZXhl" Content-Disposition: inline In-Reply-To: <20110714154457.GI70776@FreeBSD.org> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-3.3 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00, DNS_FROM_OPENWHOIS autolearn=no version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: gnn@freebsd.org, bz@freebsd.org, rwatson@freebsd.org, net@freebsd.org Subject: Re: m_pkthdr.rcvif dangling pointer problem X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 14 Jul 2011 19:25:08 -0000 --dmVtbNsdrXqYZXhl Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Jul 14, 2011 at 07:44:57PM +0400, Gleb Smirnoff wrote: > Hi! >=20 > This problem is definitely known and is as old as network stack > is parallel. Those, who know the problem may skip to next paragraph. > Short description is the following: every mbuf that is allocated in > an interface receive procedure has a field where a pointer to the > corresponding struct ifnet is stored. Everything is okay, until you > are running a box, where configuration of interfaces is changing > rapidly, for example a PPP access concentrator. Utilizing any facility > that can delay packet processing for example netgraph(4) or dummynet(4) > would make probability of crash much bigger. Running INVARIANTS kernel > would crash a box with dummynet and disappearing interfaces in a few > minutes. >=20 > I see three approaches to this problem: >=20 > 1) Every m_pkthdr.rcvif should if_ref() the interface. Releasing > references can be done in the mbuf allocator: mb_dtor_mbuf(), > mb_dtor_pack(). I'm afraid this approach would degrate performance, > since adding at least a couple of atomic ops on every mbuf for its > lifetime. That is +2 atomic ops per packet. >=20 > 2) kib@ suggested to allocate ifnets from a UMA_ZONE_NOFREE zone. > I've made a compilable & working patch: >=20 > http://people.freebsd.org/~glebius/patches/ifnet.no_free >=20 > But on second though I find this a bad idea, this is just fooling > of INVARIANTS. Yes, we avoid thrashing of freed memory and rewriting > it by some other kernel allocation. But still out pointer point to > invalid ifnet. Even, if we make a check for IFF_DYING flag, we still > can not guarantee that an interface had been re-allocated for a new > instance. This would be not a panic condition, but subtle bugs in > firewalls. This is not quite what I suggested (not looking at the patch). I also noted that some measures to prevent the dereference of outdated pointers should be taken. We discussed the generation counter for the ifnets, and I pointed out that m_hdr has at least two MI scratch bytes. Also, on amd64, there seems to be one spare word in pkthdr dur to alignment. >=20 > 3) As we now have a straight if_index table that can grow, what about > storing the if_index in the m_pkthdr? Lookup of interface by index > is fast enough if done lockless. Doing it lockless isn't perfect, but > better than current pointer dereferncing. Optionally it could be > done with locking and with putting a reference. To avoid situation > with with getting to a re-allocated interface with the same index, > we can use a unique cookie, that is incremented in if_alloc(). > Smth like: >=20 > struct ifnet *=20 > mbuf_rcvif(struct mbuf *m) > { > struct ifnet *ifp; > =20 > M_ASSERTPKTHDR(m); >=20 > if (m->m_pkthdr.rcvif_idx =3D=3D 0) > return (NULL); >=20 > ifp =3D ifnet_byindex_locked(m->m_pkthdr.rcvif_idx); > if (ifp =3D=3D NULL) > return (NULL); > if (ifp->if_unique !=3D m->m_pkthdr.rcvif_unique) > return (NULL); >=20 > return (ifp); =20 > } >=20 > struct ifnet * > mbuf_rcvif_ref(struct mbuf *m) > { > struct ifnet *ifp; >=20 > M_ASSERTPKTHDR(m); >=20 > if (m->m_pkthdr.rcvif_idx =3D=3D 0) > return (NULL); >=20 > ifp =3D ifnet_byindex_ref(m->m_pkthdr.rcvif_idx); > if (ifp =3D=3D NULL) > return (NULL); > if (ifp->if_unique !=3D m->m_pkthdr.rcvif_unique) { > if_rele(ifp); > return (NULL); > } >=20 > return (ifp); > } >=20 > The size of struct mbuf isn't going to change on amd64: >=20 > @@ -111,7 +111,8 @@ > * Record/packet header in first mbuf of chain; valid only if M_PKTHDR i= s set. > */ > struct pkthdr { > - struct ifnet *rcvif; /* rcv interface */ > + uint32_t rcvif_idx; /* rcv interface index */ > + uint32_t rcvif_unique; /* rcv interface unique id */ >=20 > A proof-of-concept patch is available here: >=20 > http://people.freebsd.org/~glebius/patches/pkthdr.rcvif_idx >=20 > It doesn't cover entire kernel, LINT won't compile. It covers kern, netin= et, > netinet6, several interface drivers and netgraph. >=20 > One of the problems is ongoing namespace pollution: not all code that inc= lude > mbuf.h includes if_var.h and vice versa. I've cut ifnet from m_devget(), = but > my new function do declare it in parameter list :(. To deal with that I h= ad > to declare functions in mbuf.h but implement them in if.c. >=20 > Other problem is net80211 code that abuses the rcvif pointer in mbuf pack= et > header for its own purposes casting it to private type. This can be fixed > utilizing mbuf_tags(9), I think. I haven't made this yet, that's why LINT > doesn't compile. >=20 > Comments are requested! >=20 > Any alternative ideas on solving the problem are welcome! >=20 > --=20 > Totus tuus, Glebius. > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" --dmVtbNsdrXqYZXhl Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (FreeBSD) iEYEARECAAYFAk4fOpQACgkQC3+MBN1Mb4hp9ACgslS4Th8L+s8f8gloH/F2sxG0 Y8MAn0JDKZ6d64Sh0HuBHltdMBS2kcza =a3yG -----END PGP SIGNATURE----- --dmVtbNsdrXqYZXhl--