From owner-freebsd-net@freebsd.org Mon Apr 17 13:58:55 2017 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 371B8D4243A for ; Mon, 17 Apr 2017 13:58:55 +0000 (UTC) (envelope-from torek@torek.net) Received: from elf.torek.net (mail.torek.net [96.90.199.121]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "elf.torek.net", Issuer "elf.torek.net" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 1072ED4B for ; Mon, 17 Apr 2017 13:58:54 +0000 (UTC) (envelope-from torek@torek.net) Received: from elf.torek.net (localhost [127.0.0.1]) by elf.torek.net (8.15.2/8.15.2) with ESMTP id v3HDwmF3071624 for ; Mon, 17 Apr 2017 06:58:48 -0700 (PDT) (envelope-from torek@torek.net) Message-Id: <201704171358.v3HDwmF3071624@elf.torek.net> From: Chris Torek To: freebsd-net@freebsd.org Subject: three multicast fixes MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-ID: <71622.1492437528.1@elf.torek.net> Content-Transfer-Encoding: quoted-printable Date: Mon, 17 Apr 2017 06:58:48 -0700 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.6.2 (elf.torek.net [127.0.0.1]); Mon, 17 Apr 2017 06:58:48 -0700 (PDT) X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Apr 2017 13:58:55 -0000 The first is mostly cosmetic, it's just something I observed when I turned on multicast debug to find the bug that the second patch is for. The second patch is a kludge and if anyone has a better fix, have at it. :-) Note that it's also hard to provoke the bug. The only way I know to observe it somewhat reliably is to create and destroy bridge interfaces, using a debug kernel with memory debug enabled, while adding and deleting hardware interfaces to the bridge so that they repeatedly join and leave multicast groups. Eventually you hit the race and crash. The third is simple enough to observe. It may affect only the lagg driver, which implements SIOCDELMULTI by removing and re-adding all the multicast addresses other than the one actually deleted (which is of course really-removed by then). If running lagg on a VIMAGE kernel, leaving a multicast group causes this occasional crash: Fatal trap 12: page fault while in kernel mode cpuid =3D 0; apic id =3D 00 fault virtual address =3D 0x28 fault code =3D supervisor read data, page not present instruction pointer =3D 0x20:0xffffffff80adf6ea stack pointer =3D 0x28:0xfffffe009117e810 frame pointer =3D 0x28:0xfffffe009117e8b0 code segment =3D base 0x0, limit 0xfffff, type 0x1b =3D DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags =3D interrupt enabled, resume, IOPL =3D 0 current process =3D 0 (thread taskq) rt_newmaddrmsg() at rt_newmaddrmsg+0x2a/frame 0xfffffe009117e8b0 if_addmulti() at if_addmulti+0x24c/frame 0xfffffe009117e940 lagg_ether_cmdmulti() at lagg_ether_cmdmulti+0x168/frame 0xfffffe009117e= 980 lagg_ioctl() at lagg_ioctl+0xad/frame 0xfffffe009117ea60 in_leavegroup() at in_leavegroup+0xb9/frame 0xfffffe009117eab0 inp_gcmoptions() at inp_gcmoptions+0x1a8/frame 0xfffffe009117eb20 taskqueue_run_locked() at taskqueue_run_locked+0x117/frame 0xfffffe00911= 7eb80 taskqueue_thread_loop() at taskqueue_thread_loop+0xc8/frame 0xfffffe0091= 17ebb0 fork_exit() at fork_exit+0x85/frame 0xfffffe009117ebf0 fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe009117ebf0 --- trap 0, rip =3D 0, rsp =3D 0, rbp =3D 0 --- Here, "curvnet" is NULL in the lagg_ioctl(), so that when rt_newmaddrmsg() does: if (V_route_cb.any_count =3D=3D 0) we crash with the offset of the per-vnet route_cb data. (The patches are in Git format, I applied them to the read-only freebsd master tree on GitHub.) Chris =46rom 65145b74d3dae68feddcf8a2aad235c3f0a981e9 Mon Sep 17 00:00:00 2001 From: Chris Torek Date: Mon, 8 Feb 2016 18:55:30 -0800 Subject: [PATCH 1/3] ip multicast debug: fix strings vs defines Turning on multicast debug made multicast failure worse because the strings and #define values no longer matched up. Fix them, and make sure they stay matched-up. --- sys/netinet/in_mcast.c | 31 +++++++++++++++++++++---------- 1 file changed, 21 insertions(+), 10 deletions(-) diff --git a/sys/netinet/in_mcast.c b/sys/netinet/in_mcast.c index c7fff9463128..eef560e7aca7 100644 --- a/sys/netinet/in_mcast.c +++ b/sys/netinet/in_mcast.c @@ -2920,7 +2920,14 @@ sysctl_ip_mcast_filters(SYSCTL_HANDLER_ARGS) = #if defined(KTR) && (KTR_COMPILE & KTR_IGMPV3) = -static const char *inm_modestrs[] =3D { "un", "in", "ex" }; +static const char *inm_modestrs[] =3D { + [MCAST_UNDEFINED] =3D "un", + [MCAST_INCLUDE] =3D "in", + [MCAST_EXCLUDE] =3D "ex", +}; +_Static_assert(MCAST_UNDEFINED =3D=3D 0 && + MCAST_EXCLUDE + 1 =3D=3D nitems(inm_modestrs), + "inm_modestrs: no longer matches #defines"); = static const char * inm_mode_str(const int mode) @@ -2932,16 +2939,20 @@ inm_mode_str(const int mode) } = static const char *inm_statestrs[] =3D { - "not-member", - "silent", - "idle", - "lazy", - "sleeping", - "awakening", - "query-pending", - "sg-query-pending", - "leaving" + [IGMP_NOT_MEMBER] =3D "not-member", + [IGMP_SILENT_MEMBER] =3D "silent", + [IGMP_REPORTING_MEMBER] =3D "reporting", + [IGMP_IDLE_MEMBER] =3D "idle", + [IGMP_LAZY_MEMBER] =3D "lazy", + [IGMP_SLEEPING_MEMBER] =3D "sleeping", + [IGMP_AWAKENING_MEMBER] =3D "awakening", + [IGMP_G_QUERY_PENDING_MEMBER] =3D "query-pending", + [IGMP_SG_QUERY_PENDING_MEMBER] =3D "sg-query-pending", + [IGMP_LEAVING_MEMBER] =3D "leaving", }; +_Static_assert(IGMP_NOT_MEMBER =3D=3D 0 && + IGMP_LEAVING_MEMBER + 1 =3D=3D nitems(inm_statestrs), + "inm_statetrs: no longer matches #defines"); = static const char * inm_state_str(const int state) -- = 2.12.1 =46rom 099e1b2c059fe962c1dd91af0a6735f02f611ac3 Mon Sep 17 00:00:00 2001 From: Chris Torek Date: Wed, 10 Feb 2016 18:41:39 -0800 Subject: [PATCH 2/3] ip multicast: fix use-after-free During if_detach(), we get a race where a closing socket is releasing multicast data (via inp_freemoptions()) at the same time as igmp_ifdetach() is releasing all multicast data for the interface, resulting in a potential double teardown and double free. This bug has been present since late 2011: Author: jhb Defer the work of freeing IPv4 multicast options from a socket to an asychronous task. ... It is very hard to trip over. You must create and delete interfaces (bridges that join real interfaces are good candidates) repeatedly, and even then, if M_IPMADDR (in_multi data structure) memory is not reused for something else during the race, the reference count in inm->inm_refcount is an unsigned int, so it decrements from the left-over 0 to 4294967295, avoiding a second free. Turning on the memory debug options (scribbling values over freed memory) catches the problem more quickly, though you still must create and destroy interfaces that partcipate in multicast to see it. The fix here is a kludge, but should serve until the entire network locking code and up/downcall system is reworked. --- sys/netinet/in.c | 4 ++++ sys/netinet/in_mcast.c | 11 +++++++++++ sys/netinet/ip_var.h | 1 + 3 files changed, 16 insertions(+) diff --git a/sys/netinet/in.c b/sys/netinet/in.c index e1611d50b4f0..50cfb188c442 100644 --- a/sys/netinet/in.c +++ b/sys/netinet/in.c @@ -1013,6 +1013,8 @@ in_purgemaddrs(struct ifnet *ifp) struct in_multi *inm, *tinm; struct ifmultiaddr *ifma; = + inp_freemopt_wait(); /* XXX kludge */ + LIST_INIT(&purgeinms); IN_MULTI_LOCK(); = @@ -1043,6 +1045,8 @@ in_purgemaddrs(struct ifnet *ifp) igmp_ifdetach(ifp); = IN_MULTI_UNLOCK(); + + inp_freemopt_wait(); /* XXX kludge */ } = struct in_llentry { diff --git a/sys/netinet/in_mcast.c b/sys/netinet/in_mcast.c index eef560e7aca7..1d1258701fa2 100644 --- a/sys/netinet/in_mcast.c +++ b/sys/netinet/in_mcast.c @@ -1585,6 +1585,17 @@ inp_freemoptions(struct ip_moptions *imo) taskqueue_enqueue(taskqueue_thread, &imo_gc_task); } = +/* + * Wait for inp_freemoptions() to complete any current work. + * Used because inp_freemoptions is no longer synchronous. + */ +void +inp_freemopt_wait(void) +{ + + taskqueue_drain(taskqueue_thread, &imo_gc_task); +} + static void inp_freemoptions_internal(struct ip_moptions *imo) { diff --git a/sys/netinet/ip_var.h b/sys/netinet/ip_var.h index f7e58d18635a..e51bcffc4fc2 100644 --- a/sys/netinet/ip_var.h +++ b/sys/netinet/ip_var.h @@ -200,6 +200,7 @@ extern struct pr_usrreqs rip_usrreqs; #define V_drop_redirect VNET(drop_redirect) = void inp_freemoptions(struct ip_moptions *); +void inp_freemopt_wait(void); int inp_getmoptions(struct inpcb *, struct sockopt *); int inp_setmoptions(struct inpcb *, struct sockopt *); = -- = 2.12.1 =46rom e27ff26bd086c46f2b1cce8eb95f4625cc4333fc Mon Sep 17 00:00:00 2001 From: Chris Torek Date: Sat, 5 Mar 2016 15:02:33 -0800 Subject: [PATCH 3/3] ip_multicast: defer vnet restore in in_leavegroup Note: in_leavegroup() does all its real work in in_leavegroup_locked(). For discussion here the two] functions might as well be equivalent. In in_leavegroup_locked(), when we're shedding a multicast group, we may (or may not) delete it from an interface via the igmp_change_state() call. This is where we currently set the multicast's vnet, and then restore the old vnet on return. However, a few lines later we use inm_release_locked() to release the inet multicast data structure, and that in turn may -- not necessarily will, only if the inm really is being freed -- call if_delmulti_ifma(), which may -- not necessarily will, again -- call the interface's SIOCDELMULTI ioctl (if and only if there is an interface and this was the last ref to this multicast address). For (at least) the lagg interface, we still need the current vnet to be valid during the SIOCDELMULTI. So, don't restore the old vnet until we've not only finished the IGMP code but also inm_release_locked(). --- sys/netinet/in_mcast.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sys/netinet/in_mcast.c b/sys/netinet/in_mcast.c index 1d1258701fa2..4549c952e058 100644 --- a/sys/netinet/in_mcast.c +++ b/sys/netinet/in_mcast.c @@ -1275,12 +1275,12 @@ in_leavegroup_locked(struct in_multi *inm, /*const= */ struct in_mfilter *imf) CTR1(KTR_IGMPV3, "%s: doing igmp downcall", __func__); CURVNET_SET(inm->inm_ifp->if_vnet); error =3D igmp_change_state(inm); - CURVNET_RESTORE(); if (error) CTR1(KTR_IGMPV3, "%s: failed igmp downcall", __func__); = CTR2(KTR_IGMPV3, "%s: dropping ref on %p", __func__, inm); inm_release_locked(inm); + CURVNET_RESTORE(); = return (error); } -- = 2.12.1