Date:      Wed, 10 May 2017 09:34:52 +0100
From:      Roger Pau Monné <royger@FreeBSD.org>
To:        Colin Percival <cperciva@tarsnap.com>
Cc:        src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject:   Re: svn commit: r301198 - head/sys/dev/xen/netfront
Message-ID:  <20170510083452.tmhumxatf356miv6@dhcp-3-128.uk.xensource.com>
In-Reply-To: <0100015beeed2f38-6246d7db-7f23-4b8a-ba50-b97ec0953457-000000@email.amazonses.com>
References:  <201606021116.u52BGajD047287@repo.freebsd.org> <0100015bccba6abc-4c3b1487-25e3-4640-8221-885341ece829-000000@email.amazonses.com> <20170509100912.h3ylwugahvfi5cc2@dhcp-3-128.uk.xensource.com> <0100015beeed2f38-6246d7db-7f23-4b8a-ba50-b97ec0953457-000000@email.amazonses.com>

On Tue, May 09, 2017 at 08:36:13PM +0000, Colin Percival wrote:
> On 05/09/17 03:09, Roger Pau Monné wrote:
> > On Wed, May 03, 2017 at 05:13:40AM +0000, Colin Percival wrote:
> >> On 06/02/16 04:16, Roger Pau Monné wrote:
> >>> Author: royger
> >>> Date: Thu Jun  2 11:16:35 2016
> >>> New Revision: 301198
> >>> URL: https://svnweb.freebsd.org/changeset/base/301198
> >>
> >> I think this commit is responsible for panics I'm seeing in EC2 on T2 family
> >> instances.  [...]
> >> but under high traffic volumes I think a separate thread can already be
> >> running in xn_rxeof, having dropped the RX lock while it passes a packet
> >> up the stack.  This would result in two different threads trying to process
> >> the same set of responses from the ring, with (unsurprisingly) bad results.
> > 
> > Hm, right, xn_rxeof drops the lock while pushing the packet up the stack.
> > There's an "XXX" comment on top of that; could you try removing the lock
> > dropping and see what happens?
> > 
> > I'm not sure there's any reason to drop the lock here; I very much doubt
> > if_input is going to sleep.
> 
> Judging by
> $ grep -R -B 1 -A 1 if_input /usr/src/sys/dev
> I'm pretty sure that we do indeed need to drop the lock.  If it's possible
> to enter if_input while holding locks, there are a *lot* of network interface
> drivers which are dropping locks unnecessarily...
> 
> >> 3. Why xn_ifinit_locked is consuming ring responses.
> > 
> > There might be pending RX packets on the ring, so netfront consumes them and
> > signals netback. In the unlikely event that the RX ring was full when
> > xn_ifinit_locked is called, not doing this would mean the RX queue would get
> > stuck forever, since there's no guarantee netfront will receive event channel
> > notifications.
> 
> In that case, I'm guessing it would be safe to skip this if another thread is
> already running xn_rxeof and chewing through the packets on the ring?  It
> would be easy to set a flag in xn_rxeof before we drop locks.

Hm, I think it would be better to fix xn_rxeof so that netfront doesn't drop
the lock while poking at the ring. What about the following patch?

It should also prevent needlessly dropping and acquiring the lock for each
packet netfront pushes up the stack.

NB: I've only compile-tested this.

---8<---
diff --git a/sys/dev/xen/netfront/netfront.c b/sys/dev/xen/netfront/netfront.c
index 99ccdaaf5000..482b9e948fde 100644
--- a/sys/dev/xen/netfront/netfront.c
+++ b/sys/dev/xen/netfront/netfront.c
@@ -1161,17 +1161,18 @@ xn_rxeof(struct netfront_rxq *rxq)
 	struct mbufq mbufq_rxq, mbufq_errq;
 	int err, work_to_do;
 
-	do {
-		XN_RX_LOCK_ASSERT(rxq);
-		if (!netfront_carrier_ok(np))
-			return;
+	XN_RX_LOCK_ASSERT(rxq);
+
+	if (!netfront_carrier_ok(np))
+		return;
 
-		/* XXX: there should be some sane limit. */
-		mbufq_init(&mbufq_errq, INT_MAX);
-		mbufq_init(&mbufq_rxq, INT_MAX);
+	/* XXX: there should be some sane limit. */
+	mbufq_init(&mbufq_errq, INT_MAX);
+	mbufq_init(&mbufq_rxq, INT_MAX);
 
-		ifp = np->xn_ifp;
+	ifp = np->xn_ifp;
 
+	do {
 		rp = rxq->ring.sring->rsp_prod;
 		rmb();	/* Ensure we see queued responses up to 'rp'. */
 
@@ -1191,7 +1192,7 @@ xn_rxeof(struct netfront_rxq *rxq)
 			}
 
 			m->m_pkthdr.rcvif = ifp;
-			if ( rx->flags & NETRXF_data_validated ) {
+			if (rx->flags & NETRXF_data_validated) {
 				/*
 				 * According to mbuf(9) the correct way to tell
 				 * the stack that the checksum of an inbound
@@ -1214,50 +1215,45 @@ xn_rxeof(struct netfront_rxq *rxq)
 			}
 
 			(void )mbufq_enqueue(&mbufq_rxq, m);
-			rxq->ring.rsp_cons = i;
 		}
 
-		mbufq_drain(&mbufq_errq);
+		rxq->ring.rsp_cons = i;
 
-		/*
-		 * Process all the mbufs after the remapping is complete.
-		 * Break the mbuf chain first though.
-		 */
-		while ((m = mbufq_dequeue(&mbufq_rxq)) != NULL) {
-			if_inc_counter(ifp, IFCOUNTER_IPACKETS, 1);
+		xn_alloc_rx_buffers(rxq);
 
-			/* XXX: Do we really need to drop the rx lock? */
-			XN_RX_UNLOCK(rxq);
+		RING_FINAL_CHECK_FOR_RESPONSES(&rxq->ring, work_to_do);
+	} while (work_to_do);
+
+	XN_RX_UNLOCK(rxq);
+	mbufq_drain(&mbufq_errq);
+	/*
+	 * Process all the mbufs after the remapping is complete.
+	 * Break the mbuf chain first though.
+	 */
+	while ((m = mbufq_dequeue(&mbufq_rxq)) != NULL) {
+		if_inc_counter(ifp, IFCOUNTER_IPACKETS, 1);
 #if (defined(INET) || defined(INET6))
-			/* Use LRO if possible */
-			if ((ifp->if_capenable & IFCAP_LRO) == 0 ||
-			    lro->lro_cnt == 0 || tcp_lro_rx(lro, m, 0)) {
-				/*
-				 * If LRO fails, pass up to the stack
-				 * directly.
-				 */
-				(*ifp->if_input)(ifp, m);
-			}
-#else
+		/* Use LRO if possible */
+		if ((ifp->if_capenable & IFCAP_LRO) == 0 ||
+		    lro->lro_cnt == 0 || tcp_lro_rx(lro, m, 0)) {
+			/*
+			 * If LRO fails, pass up to the stack
+			 * directly.
+			 */
 			(*ifp->if_input)(ifp, m);
-#endif
-
-			XN_RX_LOCK(rxq);
 		}
-
-		rxq->ring.rsp_cons = i;
+#else
+		(*ifp->if_input)(ifp, m);
+#endif
+	}
 
 #if (defined(INET) || defined(INET6))
-		/*
-		 * Flush any outstanding LRO work
-		 */
-		tcp_lro_flush_all(lro);
+	/*
+	 * Flush any outstanding LRO work
+	 */
+	tcp_lro_flush_all(lro);
 #endif
-
-		xn_alloc_rx_buffers(rxq);
-
-		RING_FINAL_CHECK_FOR_RESPONSES(&rxq->ring, work_to_do);
-	} while (work_to_do);
+	XN_RX_LOCK(rxq);
 }
 
 static void



