From owner-svn-src-head@FreeBSD.ORG  Sun Mar 14 22:54:45 2010
Return-Path: <owner-svn-src-head@FreeBSD.ORG>
Delivered-To: svn-src-head@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 27983106566B;
	Sun, 14 Mar 2010 22:54:45 +0000 (UTC)
	(envelope-from pyunyh@gmail.com)
Received: from mail-gx0-f214.google.com (mail-gx0-f214.google.com
	[209.85.217.214])
	by mx1.freebsd.org (Postfix) with ESMTP id 937C28FC12;
	Sun, 14 Mar 2010 22:54:44 +0000 (UTC)
Received: by gxk6 with SMTP id 6so1604908gxk.14
	for <multiple recipients>; Sun, 14 Mar 2010 15:54:43 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:received:received:received:from:date:to:cc
	:subject:message-id:reply-to:references:mime-version:content-type
	:content-disposition:in-reply-to:user-agent;
	bh=WYRafYRTlpyWgFZ8TFbZivG2BaGU0K8JJj57H5ip1Ow=;
	b=xPSal475bGkUwERwBSk/tPif2E2+76ebeNUy7GXCvDAOaUv+qrkjTKZIze9nX1WwPj
	ftAxnDRaGjrdT1iebqZTiEg0NYk8h4SgP+mQtJOy5TyU5bL8O6o9h4juRMqM8Zr6JZq8
	gfJdxvJwdU/eLcZeuuqJGIZvhLT1f+SZ+lFR8=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=from:date:to:cc:subject:message-id:reply-to:references:mime-version
	:content-type:content-disposition:in-reply-to:user-agent;
	b=LMGx0u16C1lD8SNbG5hp6javCOAY9lcvVs/1waYvC8BmRNb5XdqzeQT4tJhWCjGLGj
	RsWDMyRQ6OhSalNjrQGyK1kS4cVK83zT3oCiT9Nd+Gs+v8fqQkUZ2Lqj3SfKtDqLUONJ
	Kvk0rQm8JNninUKOKI1QUeUxbpkRdrpGTAppA=
Received: by 10.101.136.1 with SMTP id o1mr3203532ann.17.1268607283617;
	Sun, 14 Mar 2010 15:54:43 -0700 (PDT)
Received: from pyunyh@gmail.com ([174.35.1.224])
	by mx.google.com with ESMTPS id 5sm1382745ywd.59.2010.03.14.15.54.41
	(version=TLSv1/SSLv3 cipher=RC4-MD5);
	Sun, 14 Mar 2010 15:54:42 -0700 (PDT)
Received: by pyunyh@gmail.com (sSMTP sendmail emulation);
	Sun, 14 Mar 2010 15:54:09 -0700
From: Pyun YongHyeon <pyunyh@gmail.com>
Date: Sun, 14 Mar 2010 15:54:09 -0700
To: Bruce Evans <brde@optusnet.com.au>
Message-ID: <20100314225409.GA1245@michelle.cdnetworks.com>
References: <201003121818.o2CII4ri076014@svn.freebsd.org>
	<20100313222131.K22847@delplex.bde.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20100313222131.K22847@delplex.bde.org>
User-Agent: Mutt/1.4.2.3i
Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org,
	src-committers@freebsd.org, Pyun YongHyeon <yongari@freebsd.org>
Subject: Re: svn commit: r205090 - head/sys/dev/bge
X-BeenThere: svn-src-head@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: pyunyh@gmail.com
List-Id: SVN commit messages for the src tree for head/-current
	<svn-src-head.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/svn-src-head>,
	<mailto:svn-src-head-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/svn-src-head>
List-Post: <mailto:svn-src-head@freebsd.org>
List-Help: <mailto:svn-src-head-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/svn-src-head>,
	<mailto:svn-src-head-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 14 Mar 2010 22:54:45 -0000

On Sat, Mar 13, 2010 at 11:05:11PM +1100, Bruce Evans wrote:
> On Fri, 12 Mar 2010, Pyun YongHyeon wrote:
> 
> >Log:
> > Reorder interrupt handler a bit such that producer/consumer
> > index of status block is read first before acknowledging the
> > interrupts. Otherwise bge(4) may get stale status block as
> > acknowledging an interrupt may yield another status block update.
> >
> > Reviewed by:	marius
> 
> Er, doesn't this give a race instead?  It undoes a critical part of
> rev.1.169 but not the comment part which still says that the ack is

You're probably right it may increase race window for interrupts.
But it ensures coherent accesses to the status block which is more
important than losing interrupts. I think old code fought not to
lose interrupts by acknowledging interrupts first and tried to get
next interrupts if status block was updated in interrupt handler. 
Old code also had similar race window because it blindly
acknowledged the interrupt, status block could be updated after the
acknowledgment of the interrupt.

> done first, and why (to ensure getting another interrupt if the status
> block changes after we have looked at it).
> 
> % 	/*
> % 	 * Do the mandatory PCI flush as well as get the link status.
> % 	 */
> % 	statusword = CSR_READ_4(sc, BGE_MAC_STS) & BGE_MACSTAT_LINK_CHANGED;
> % 
> % 	/* Make sure the descriptor ring indexes are coherent. */
> % 	bus_dmamap_sync(sc->bge_cdata.bge_status_tag,
> % 	    sc->bge_cdata.bge_status_map,
> % 	    BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE);
> % 	rx_prod = sc->bge_ldata.bge_status_block->bge_idx[0].bge_rx_prod_idx;
> % 	tx_cons = sc->bge_ldata.bge_status_block->bge_idx[0].bge_tx_cons_idx;
> % 	sc->bge_ldata.bge_status_block->bge_status = 0;
> % 	bus_dmamap_sync(sc->bge_cdata.bge_status_tag,
> % 	    sc->bge_cdata.bge_status_map,
> % 	    BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
> 
> The above presumably gives sufficiently coherent accesses to the status
> block, but what happens if a status update occurs now (before the ack).

Theoretically this may happen if interrupt is shared with other
devices. Since bge(4) does not check whether the interrupt is ours
it may blindly process the interrupt.

> Doesn't the ack prevent an interrupt for this status update?  I think

If interrupt is shared with other devices it wouldn't.

> tx_prod and tx cons (read above) don't become stale since they are only
> advanced by software, and we may processes tx and rx descriptors beyond
> the ones reported by status updates before or after the ack, but
> statusword (read above) does become stale.
> 

I think this was a long standing bug of bge(4) and it kept bge(4)
from running on PAE environments. In order not to lose interrupts I
believe bge(4) should use tagged status mode which will enable
interrupt tracking via status block. Relying on some timing in
interrupt handler can't solve the root cause. All controllers
except BCM5700 supports tagged status mode and this commit is the
first step to the tagged status mode which requires coherent
accesses to the status block. Because driver should tell which
interrupts were handled in the interrupt handler coherent access
to status block is critical one.

> % 
> % 	/*
> % 	 * Ack the interrupt by writing something to BGE_MBX_IRQ0_LO.  Don't
> % 	 * disable interrupts by writing nonzero like we used to, since with
> % 	 * our current organization this just gives complications and
> % 	 * pessimizations for re-enabling interrupts.  We used to have races
> % 	 * instead of the necessary complications.  Disabling interrupts
> 
> I don't remember seeing races with the current order, but I seem to
> remember seeing them when the ack was the last hardware thing in the
> function.  As described in detail below, the latter gives quite a
> large race window so it is easy to miss an interrupt.
> 
> % 	 * would just reduce the chance of a status update while we are
> % 	 * running (by switching to the interrupt-mode coalescence
> % 	 * parameters), but this chance is already very low so it is more
> % 	 * efficient to get another interrupt than prevent it.
> 
> This describes why it doesn't matter if we get an extra interrupt due to
> the status block being updated after the ack, even in rev.1.168 when the
> race window was much larger (it was the entire runtime of bge_intr(),
> which can be several hundred uS; now it is several hundred nS).
> 

Correct. Having an extra interrupt wouldn't hurt so bge(4) should
implement reliable way to track last acked interrupts instead of
relying on timing.

> % 	 *
> % 	 * We do the ack first to ensure another interrupt if there is a
> % 	 * status update after the ack.  We don't check for the status
> 
> But we don't do the ack first any more.
> 

I agree current comment does not match with code. I'll fix that
after implementing tagged status mode.

> % 	 * changing later because it is more efficient to get another
> % 	 * interrupt than prevent it, not quite as above (not checking is
> % 	 * a smaller optimization than not toggling the interrupt enable,
> % 	 * since checking doesn't involve PCI accesses and toggling require
> % 	 * the status check).  So toggling would probably be a pessimization
> % 	 * even with MSI.  It would only be needed for using a task queue.
> % 	 */
> % 	bge_writembx(sc, BGE_MBX_IRQ0_LO, 0);
> 
> Bruce