From owner-svn-src-all@freebsd.org Wed May 10 08:35:00 2017
Date: Wed, 10 May 2017 09:34:52 +0100
From: Roger Pau Monné <royger@gmail.com>
To: Colin Percival
Cc: src-committers@freebsd.org, svn-src-all@freebsd.org,
	svn-src-head@freebsd.org
Subject: Re: svn commit: r301198 - head/sys/dev/xen/netfront
Message-ID: <20170510083452.tmhumxatf356miv6@dhcp-3-128.uk.xensource.com>
In-Reply-To: <0100015beeed2f38-6246d7db-7f23-4b8a-ba50-b97ec0953457-000000@email.amazonses.com>

On Tue, May 09, 2017 at 08:36:13PM +0000, Colin Percival wrote:
> On 05/09/17 03:09, Roger Pau Monné wrote:
> > On Wed, May 03, 2017 at 05:13:40AM +0000, Colin Percival wrote:
> >> On 06/02/16 04:16, Roger Pau Monné wrote:
> >>> Author: royger
> >>> Date: Thu Jun 2 11:16:35 2016
> >>> New Revision: 301198
> >>> URL: https://svnweb.freebsd.org/changeset/base/301198
> >>
> >> I think this commit is responsible for panics I'm seeing in EC2 on T2
> >> family instances. [...] but under high traffic volumes I think a
> >> separate thread can already be running in xn_rxeof, having dropped the
> >> RX lock while it passes a packet up the stack. This would result in two
> >> different threads trying to process the same set of responses from the
> >> ring, with (unsurprisingly) bad results.
> >
> > Hm, right, xn_rxeof drops the lock while pushing the packet up the stack.
> > There's a "XXX" comment on top of that, could you try to remove the lock
> > dropping and see what happens?
> >
> > I'm not sure there's any reason to drop the lock here, I very much doubt
> > if_input is going to sleep.
>
> Judging by
> $ grep -R -B 1 -A 1 if_input /usr/src/sys/dev
> I'm pretty sure that we do indeed need to drop the lock. If it's possible
> to enter if_input while holding locks, there are a *lot* of network
> interface drivers which are dropping locks unnecessarily...
>
> >> 3. Why xn_ifinit_locked is consuming ring responses.
> >
> > There might be pending RX packets on the ring, so netfront consumes them
> > and signals netback. In the unlikely event that the RX ring was full
> > when xn_ifinit_locked is called, not doing this would mean the RX queue
> > would get stuck forever, since there's no guarantee netfront will
> > receive event channel notifications.
>
> In that case, I'm guessing it would be safe to skip this if another thread
> is already running xn_rxeof and chewing through the packets on the ring?
> It would be easy to set a flag in xn_rxeof before we drop locks.

Hm, I think it would be better to fix xn_rxeof so that netfront doesn't
drop the lock while poking at the ring, what about the following patch?
It should also prevent needlessly dropping and acquiring the lock for each
packet netfront pushes up the stack.

NB: I've only compile-tested this.

---8<---
diff --git a/sys/dev/xen/netfront/netfront.c b/sys/dev/xen/netfront/netfront.c
index 99ccdaaf5000..482b9e948fde 100644
--- a/sys/dev/xen/netfront/netfront.c
+++ b/sys/dev/xen/netfront/netfront.c
@@ -1161,17 +1161,18 @@ xn_rxeof(struct netfront_rxq *rxq)
 	struct mbufq mbufq_rxq, mbufq_errq;
 	int err, work_to_do;
 
-	do {
-		XN_RX_LOCK_ASSERT(rxq);
-		if (!netfront_carrier_ok(np))
-			return;
+	XN_RX_LOCK_ASSERT(rxq);
+
+	if (!netfront_carrier_ok(np))
+		return;
 
-		/* XXX: there should be some sane limit. */
-		mbufq_init(&mbufq_errq, INT_MAX);
-		mbufq_init(&mbufq_rxq, INT_MAX);
+	/* XXX: there should be some sane limit. */
+	mbufq_init(&mbufq_errq, INT_MAX);
+	mbufq_init(&mbufq_rxq, INT_MAX);
 
-		ifp = np->xn_ifp;
+	ifp = np->xn_ifp;
 
+	do {
 		rp = rxq->ring.sring->rsp_prod;
 		rmb();	/* Ensure we see queued responses up to 'rp'. */
 
@@ -1191,7 +1192,7 @@ xn_rxeof(struct netfront_rxq *rxq)
 			}
 			m->m_pkthdr.rcvif = ifp;
 
-			if ( rx->flags & NETRXF_data_validated ) {
+			if (rx->flags & NETRXF_data_validated) {
 				/*
 				 * According to mbuf(9) the correct way to tell
 				 * the stack that the checksum of an inbound
@@ -1214,50 +1215,45 @@
 			}
 			(void )mbufq_enqueue(&mbufq_rxq, m);
-			rxq->ring.rsp_cons = i;
 		}
 
-		mbufq_drain(&mbufq_errq);
+		rxq->ring.rsp_cons = i;
 
-		/*
-		 * Process all the mbufs after the remapping is complete.
-		 * Break the mbuf chain first though.
-		 */
-		while ((m = mbufq_dequeue(&mbufq_rxq)) != NULL) {
-			if_inc_counter(ifp, IFCOUNTER_IPACKETS, 1);
+		xn_alloc_rx_buffers(rxq);
 
-			/* XXX: Do we really need to drop the rx lock? */
-			XN_RX_UNLOCK(rxq);
+		RING_FINAL_CHECK_FOR_RESPONSES(&rxq->ring, work_to_do);
+	} while (work_to_do);
+
+	XN_RX_UNLOCK(rxq);
+	mbufq_drain(&mbufq_errq);
+	/*
+	 * Process all the mbufs after the remapping is complete.
+	 * Break the mbuf chain first though.
+	 */
+	while ((m = mbufq_dequeue(&mbufq_rxq)) != NULL) {
+		if_inc_counter(ifp, IFCOUNTER_IPACKETS, 1);
 #if (defined(INET) || defined(INET6))
-		/* Use LRO if possible */
-		if ((ifp->if_capenable & IFCAP_LRO) == 0 ||
-		    lro->lro_cnt == 0 || tcp_lro_rx(lro, m, 0)) {
-			/*
-			 * If LRO fails, pass up to the stack
-			 * directly.
-			 */
-			(*ifp->if_input)(ifp, m);
-		}
-#else
+		/* Use LRO if possible */
+		if ((ifp->if_capenable & IFCAP_LRO) == 0 ||
+		    lro->lro_cnt == 0 || tcp_lro_rx(lro, m, 0)) {
+			/*
+			 * If LRO fails, pass up to the stack
+			 * directly.
+			 */
 			(*ifp->if_input)(ifp, m);
-#endif
-
-		XN_RX_LOCK(rxq);
 		}
-
-		rxq->ring.rsp_cons = i;
+#else
+		(*ifp->if_input)(ifp, m);
+#endif
+	}
 
 #if (defined(INET) || defined(INET6))
-		/*
-		 * Flush any outstanding LRO work
-		 */
-		tcp_lro_flush_all(lro);
+	/*
+	 * Flush any outstanding LRO work
+	 */
+	tcp_lro_flush_all(lro);
 #endif
-
-		xn_alloc_rx_buffers(rxq);
-
-		RING_FINAL_CHECK_FOR_RESPONSES(&rxq->ring, work_to_do);
-	} while (work_to_do);
+	XN_RX_LOCK(rxq);
 }
 
 static void