Date:      Wed, 3 May 2017 05:13:39 +0000
From:      Colin Percival <cperciva@tarsnap.com>
To:        Roger Pau Monné <royger@FreeBSD.org>, src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject:   Re: svn commit: r301198 - head/sys/dev/xen/netfront
Message-ID:  <0100015bccba63ab-edf2debb-6781-4d23-b21a-fa25b2a11803-000000@email.amazonses.com>
In-Reply-To: <201606021116.u52BGajD047287@repo.freebsd.org>
References:  <201606021116.u52BGajD047287@repo.freebsd.org>

On 06/02/16 04:16, Roger Pau Monné wrote:
> Author: royger
> Date: Thu Jun  2 11:16:35 2016
> New Revision: 301198
> URL: https://svnweb.freebsd.org/changeset/base/301198

I think this commit is responsible for panics I'm seeing in EC2 on T2 family
instances.  Every time a DHCP request is made, we call into xn_ifinit_locked
(not sure why -- something to do with making the interface promiscuous?) and
hit this code:

> @@ -1760,7 +1715,7 @@ xn_ifinit_locked(struct netfront_info *n
>  		xn_alloc_rx_buffers(rxq);
>  		rxq->ring.sring->rsp_event = rxq->ring.rsp_cons + 1;
>  		if (RING_HAS_UNCONSUMED_RESPONSES(&rxq->ring))
> -			taskqueue_enqueue(rxq->tq, &rxq->intrtask);
> +			xn_rxeof(rxq);
>  		XN_RX_UNLOCK(rxq);
>  	}

but under high traffic volumes I think a separate thread can already be
running in xn_rxeof, having dropped the RX lock while it passes a packet
up the stack.  This would result in two different threads trying to process
the same set of responses from the ring, with (unsurprisingly) bad results.
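
The suspected interleaving can be sketched with a toy, single-threaded model.
This is not the driver's code: the toy_* names, RING_SIZE, and the assumption
that a consumer snapshots the ring indices on entry and only publishes
rsp_cons once it has finished are all hypothetical, chosen to show how two
consumers could end up handling the same responses.

```c
#include <assert.h>

#define RING_SIZE 8	/* hypothetical ring size for the model */

/* Toy stand-in for the RX ring: indices plus a per-slot delivery count. */
struct toy_ring {
	unsigned rsp_prod;		/* responses published by the backend */
	unsigned rsp_cons;		/* responses consumed by the frontend */
	unsigned delivered[RING_SIZE];	/* times each slot was passed up */
};

/* Private state of one consumer while it is inside the receive path. */
struct toy_consumer {
	unsigned next;	/* next response this consumer will handle */
	unsigned end;	/* snapshot of rsp_prod taken on entry */
};

/* Entry (RX lock held in the model): snapshot the shared indices. */
static void
toy_enter(struct toy_consumer *c, const struct toy_ring *r)
{
	c->next = r->rsp_cons;
	c->end = r->rsp_prod;
}

/*
 * Handle one response.  The lock is modelled as dropped while the packet
 * is passed up, so another consumer may enter between two steps.
 * Returns 0 once this consumer has nothing left to do.
 */
static int
toy_step(struct toy_consumer *c, struct toy_ring *r)
{
	if (c->next == c->end)
		return (0);
	r->delivered[c->next % RING_SIZE]++;
	c->next++;
	return (1);
}

/* Exit: publish how far this consumer got (assumed to happen last). */
static void
toy_finish(const struct toy_consumer *c, struct toy_ring *r)
{
	r->rsp_cons = c->next;
}

/*
 * Run two consumers over nresp pending responses (nresp <= RING_SIZE).
 * With interleave != 0, B enters after A has handled one packet and
 * "dropped the lock"; otherwise A runs to completion before B enters.
 * Returns the number of slots that were delivered more than once.
 */
static unsigned
toy_run(unsigned nresp, int interleave)
{
	struct toy_ring r = { 0 };
	struct toy_consumer a, b;
	unsigned i, dups = 0;

	r.rsp_prod = nresp;
	toy_enter(&a, &r);
	if (interleave) {
		toy_step(&a, &r);	/* A is mid-way, lock dropped */
		toy_enter(&b, &r);	/* B still sees the old rsp_cons */
		while (toy_step(&b, &r))
			;
		toy_finish(&b, &r);
	}
	while (toy_step(&a, &r))
		;
	toy_finish(&a, &r);
	if (!interleave) {
		toy_enter(&b, &r);	/* B finds nothing left to consume */
		while (toy_step(&b, &r))
			;
		toy_finish(&b, &r);
	}
	for (i = 0; i < RING_SIZE; i++)
		if (r.delivered[i] > 1)
			dups++;
	return (dups);
}
```

In this model, toy_run(4, 1) reports every one of the four slots as delivered
twice, while toy_run(4, 0) reports none.  Whether the real xn_rxeof publishes
its consumer index late enough for this to happen is exactly the open question.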

I'm not 100% sure that this is what's causing the panic, but it's definitely
happening under high traffic conditions immediately after xn_ifinit_locked is
called, so I think my speculation is well-founded.

There are a few things I don't understand here:
1. Why DHCP requests are resulting in calls into xn_ifinit_locked.
2. Why the calls into xn_ifinit_locked are only happening on T2 instances
and not on any of the other EC2 instances I've tried.
3. Why xn_ifinit_locked is consuming ring responses.

So I'm not sure what the solution is, but hopefully someone who knows this
code better will be able to help...

-- 
Colin Percival
Security Officer Emeritus, FreeBSD | The power to serve
Founder, Tarsnap | www.tarsnap.com | Online backups for the truly paranoid