From owner-freebsd-net@freebsd.org  Mon Apr 17 13:58:55 2017
Return-Path: <owner-freebsd-net@freebsd.org>
Delivered-To: freebsd-net@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 371B8D4243A
 for <freebsd-net@mailman.ysv.freebsd.org>;
 Mon, 17 Apr 2017 13:58:55 +0000 (UTC) (envelope-from torek@torek.net)
Received: from elf.torek.net (mail.torek.net [96.90.199.121])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client CN "elf.torek.net", Issuer "elf.torek.net" (not verified))
 by mx1.freebsd.org (Postfix) with ESMTPS id 1072ED4B
 for <freebsd-net@freebsd.org>; Mon, 17 Apr 2017 13:58:54 +0000 (UTC)
 (envelope-from torek@torek.net)
Received: from elf.torek.net (localhost [127.0.0.1])
 by elf.torek.net (8.15.2/8.15.2) with ESMTP id v3HDwmF3071624
 for <freebsd-net@freebsd.org>; Mon, 17 Apr 2017 06:58:48 -0700 (PDT)
 (envelope-from torek@torek.net)
Message-Id: <201704171358.v3HDwmF3071624@elf.torek.net>
From: Chris Torek <torek@torek.net>
To: freebsd-net@freebsd.org
Subject: three multicast fixes
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-ID: <71622.1492437528.1@elf.torek.net>
Content-Transfer-Encoding: quoted-printable
Date: Mon, 17 Apr 2017 06:58:48 -0700
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.6.2
 (elf.torek.net [127.0.0.1]); Mon, 17 Apr 2017 06:58:48 -0700 (PDT)
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net/>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 17 Apr 2017 13:58:55 -0000

The first is mostly cosmetic, it's just something I observed when
I turned on multicast debug to find the bug that the second patch
is for.

The second patch is a kludge and if anyone has a better fix, have
at it.  :-) Note that it's also hard to provoke the bug.  The only
way I know to observe it somewhat reliably is to create and
destroy bridge interfaces, using a debug kernel with memory debug
enabled, while adding and deleting hardware interfaces to the
bridge so that they repeatedly join and leave multicast groups.
Eventually you hit the race and crash.

The third is simple enough to observe.  It may affect only the lagg
driver, which implements SIOCDELMULTI by removing and re-adding all
the multicast addresses other than the one actually deleted (which
is of course really-removed by then).  If running lagg on a VIMAGE
kernel, leaving a multicast group causes this occasional crash:

  Fatal trap 12: page fault while in kernel mode
  cpuid =3D 0; apic id =3D 00
  fault virtual address   =3D 0x28
  fault code              =3D supervisor read data, page not present
  instruction pointer     =3D 0x20:0xffffffff80adf6ea
  stack pointer           =3D 0x28:0xfffffe009117e810
  frame pointer           =3D 0x28:0xfffffe009117e8b0
  code segment            =3D base 0x0, limit 0xfffff, type 0x1b
			  =3D DPL 0, pres 1, long 1, def32 0, gran 1
  processor eflags        =3D interrupt enabled, resume, IOPL =3D 0
  current process         =3D 0 (thread taskq)

  rt_newmaddrmsg() at rt_newmaddrmsg+0x2a/frame 0xfffffe009117e8b0
  if_addmulti() at if_addmulti+0x24c/frame 0xfffffe009117e940
  lagg_ether_cmdmulti() at lagg_ether_cmdmulti+0x168/frame 0xfffffe009117e=
980
  lagg_ioctl() at lagg_ioctl+0xad/frame 0xfffffe009117ea60
  in_leavegroup() at in_leavegroup+0xb9/frame 0xfffffe009117eab0
  inp_gcmoptions() at inp_gcmoptions+0x1a8/frame 0xfffffe009117eb20
  taskqueue_run_locked() at taskqueue_run_locked+0x117/frame 0xfffffe00911=
7eb80
  taskqueue_thread_loop() at taskqueue_thread_loop+0xc8/frame 0xfffffe0091=
17ebb0
  fork_exit() at fork_exit+0x85/frame 0xfffffe009117ebf0
  fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe009117ebf0
  --- trap 0, rip =3D 0, rsp =3D 0, rbp =3D 0 ---

Here, "curvnet" is NULL in the lagg_ioctl(), so that when
rt_newmaddrmsg() does:

	if (V_route_cb.any_count =3D=3D 0)

we crash with the offset of the per-vnet route_cb data.

(The patches are in Git format, I applied them to the read-only
freebsd master tree on GitHub.)

Chris

=46rom 65145b74d3dae68feddcf8a2aad235c3f0a981e9 Mon Sep 17 00:00:00 2001
From: Chris Torek <chris.torek@gmail.com>
Date: Mon, 8 Feb 2016 18:55:30 -0800
Subject: [PATCH 1/3] ip multicast debug: fix strings vs defines

Turning on multicast debug made multicast failure worse
because the strings and #define values no longer matched
up.  Fix them, and make sure they stay matched-up.
---
 sys/netinet/in_mcast.c | 31 +++++++++++++++++++++----------
 1 file changed, 21 insertions(+), 10 deletions(-)

diff --git a/sys/netinet/in_mcast.c b/sys/netinet/in_mcast.c
index c7fff9463128..eef560e7aca7 100644
--- a/sys/netinet/in_mcast.c
+++ b/sys/netinet/in_mcast.c
@@ -2920,7 +2920,14 @@ sysctl_ip_mcast_filters(SYSCTL_HANDLER_ARGS)
 =

 #if defined(KTR) && (KTR_COMPILE & KTR_IGMPV3)
 =

-static const char *inm_modestrs[] =3D { "un", "in", "ex" };
+static const char *inm_modestrs[] =3D {
+	[MCAST_UNDEFINED] =3D "un",
+	[MCAST_INCLUDE] =3D "in",
+	[MCAST_EXCLUDE] =3D "ex",
+};
+_Static_assert(MCAST_UNDEFINED =3D=3D 0 &&
+	       MCAST_EXCLUDE + 1 =3D=3D nitems(inm_modestrs),
+	       "inm_modestrs: no longer matches #defines");
 =

 static const char *
 inm_mode_str(const int mode)
@@ -2932,16 +2939,20 @@ inm_mode_str(const int mode)
 }
 =

 static const char *inm_statestrs[] =3D {
-	"not-member",
-	"silent",
-	"idle",
-	"lazy",
-	"sleeping",
-	"awakening",
-	"query-pending",
-	"sg-query-pending",
-	"leaving"
+	[IGMP_NOT_MEMBER] =3D "not-member",
+	[IGMP_SILENT_MEMBER] =3D "silent",
+	[IGMP_REPORTING_MEMBER] =3D "reporting",
+	[IGMP_IDLE_MEMBER] =3D "idle",
+	[IGMP_LAZY_MEMBER] =3D "lazy",
+	[IGMP_SLEEPING_MEMBER] =3D "sleeping",
+	[IGMP_AWAKENING_MEMBER] =3D "awakening",
+	[IGMP_G_QUERY_PENDING_MEMBER] =3D "query-pending",
+	[IGMP_SG_QUERY_PENDING_MEMBER] =3D "sg-query-pending",
+	[IGMP_LEAVING_MEMBER] =3D "leaving",
 };
+_Static_assert(IGMP_NOT_MEMBER =3D=3D 0 &&
+	       IGMP_LEAVING_MEMBER + 1 =3D=3D nitems(inm_statestrs),
+	       "inm_statetrs: no longer matches #defines");
 =

 static const char *
 inm_state_str(const int state)
-- =

2.12.1


=46rom 099e1b2c059fe962c1dd91af0a6735f02f611ac3 Mon Sep 17 00:00:00 2001
From: Chris Torek <torek@ixsystems.com>
Date: Wed, 10 Feb 2016 18:41:39 -0800
Subject: [PATCH 2/3] ip multicast: fix use-after-free

During if_detach(), we get a race where a closing socket is
releasing multicast data (via inp_freemoptions()) at the same
time as igmp_ifdetach() is releasing all multicast data for
the interface, resulting in a potential double teardown and
double free.

This bug has been present since late 2011:

    Author: jhb <jhb@FreeBSD.org>

    Defer the work of freeing IPv4 multicast options from a
    socket to an asychronous task. ...

It is very hard to trip over.  You must create and delete
interfaces (bridges that join real interfaces are good
candidates) repeatedly, and even then, if M_IPMADDR (in_multi
data structure) memory is not reused for something else during
the race, the reference count in inm->inm_refcount is an
unsigned int, so it decrements from the left-over 0 to
4294967295, avoiding a second free.

Turning on the memory debug options (scribbling values over
freed memory) catches the problem more quickly, though you
still must create and destroy interfaces that partcipate in
multicast to see it.

The fix here is a kludge, but should serve until the entire
network locking code and up/downcall system is reworked.
---
 sys/netinet/in.c       |  4 ++++
 sys/netinet/in_mcast.c | 11 +++++++++++
 sys/netinet/ip_var.h   |  1 +
 3 files changed, 16 insertions(+)

diff --git a/sys/netinet/in.c b/sys/netinet/in.c
index e1611d50b4f0..50cfb188c442 100644
--- a/sys/netinet/in.c
+++ b/sys/netinet/in.c
@@ -1013,6 +1013,8 @@ in_purgemaddrs(struct ifnet *ifp)
 	struct in_multi		*inm, *tinm;
 	struct ifmultiaddr	*ifma;
 =

+	inp_freemopt_wait();	/* XXX kludge */
+
 	LIST_INIT(&purgeinms);
 	IN_MULTI_LOCK();
 =

@@ -1043,6 +1045,8 @@ in_purgemaddrs(struct ifnet *ifp)
 	igmp_ifdetach(ifp);
 =

 	IN_MULTI_UNLOCK();
+
+	inp_freemopt_wait();	/* XXX kludge */
 }
 =

 struct in_llentry {
diff --git a/sys/netinet/in_mcast.c b/sys/netinet/in_mcast.c
index eef560e7aca7..1d1258701fa2 100644
--- a/sys/netinet/in_mcast.c
+++ b/sys/netinet/in_mcast.c
@@ -1585,6 +1585,17 @@ inp_freemoptions(struct ip_moptions *imo)
 	taskqueue_enqueue(taskqueue_thread, &imo_gc_task);
 }
 =

+/*
+ * Wait for inp_freemoptions() to complete any current work.
+ * Used because inp_freemoptions is no longer synchronous.
+ */
+void
+inp_freemopt_wait(void)
+{
+
+	taskqueue_drain(taskqueue_thread, &imo_gc_task);
+}
+
 static void
 inp_freemoptions_internal(struct ip_moptions *imo)
 {
diff --git a/sys/netinet/ip_var.h b/sys/netinet/ip_var.h
index f7e58d18635a..e51bcffc4fc2 100644
--- a/sys/netinet/ip_var.h
+++ b/sys/netinet/ip_var.h
@@ -200,6 +200,7 @@ extern struct	pr_usrreqs rip_usrreqs;
 #define	V_drop_redirect		VNET(drop_redirect)
 =

 void	inp_freemoptions(struct ip_moptions *);
+void	inp_freemopt_wait(void);
 int	inp_getmoptions(struct inpcb *, struct sockopt *);
 int	inp_setmoptions(struct inpcb *, struct sockopt *);
 =

-- =

2.12.1


=46rom e27ff26bd086c46f2b1cce8eb95f4625cc4333fc Mon Sep 17 00:00:00 2001
From: Chris Torek <torek@ixsystems.com>
Date: Sat, 5 Mar 2016 15:02:33 -0800
Subject: [PATCH 3/3] ip_multicast: defer vnet restore in in_leavegroup

Note: in_leavegroup() does all its real work in
in_leavegroup_locked().  For discussion here the two]
functions might as well be equivalent.

In in_leavegroup_locked(), when we're shedding a multicast
group, we may (or may not) delete it from an interface via
the igmp_change_state() call.  This is where we currently
set the multicast's vnet, and then restore the old vnet on
return.

However, a few lines later we use inm_release_locked() to
release the inet multicast data structure, and that in turn
may -- not necessarily will, only if the inm really is being
freed -- call if_delmulti_ifma(), which may -- not necessarily
will, again -- call the interface's SIOCDELMULTI ioctl
(if and only if there is an interface and this was the last
ref to this multicast address).

For (at least) the lagg interface, we still need the current
vnet to be valid during the SIOCDELMULTI.  So, don't restore
the old vnet until we've not only finished the IGMP code but
also inm_release_locked().
---
 sys/netinet/in_mcast.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sys/netinet/in_mcast.c b/sys/netinet/in_mcast.c
index 1d1258701fa2..4549c952e058 100644
--- a/sys/netinet/in_mcast.c
+++ b/sys/netinet/in_mcast.c
@@ -1275,12 +1275,12 @@ in_leavegroup_locked(struct in_multi *inm, /*const=
*/ struct in_mfilter *imf)
 	CTR1(KTR_IGMPV3, "%s: doing igmp downcall", __func__);
 	CURVNET_SET(inm->inm_ifp->if_vnet);
 	error =3D igmp_change_state(inm);
-	CURVNET_RESTORE();
 	if (error)
 		CTR1(KTR_IGMPV3, "%s: failed igmp downcall", __func__);
 =

 	CTR2(KTR_IGMPV3, "%s: dropping ref on %p", __func__, inm);
 	inm_release_locked(inm);
+	CURVNET_RESTORE();
 =

 	return (error);
 }
-- =

2.12.1