From owner-freebsd-amd64@FreeBSD.ORG  Wed Jan 10 02:28:41 2007
Return-Path: <owner-freebsd-amd64@FreeBSD.ORG>
X-Original-To: freebsd-amd64@freebsd.org
Delivered-To: freebsd-amd64@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 3C20916A415;
	Wed, 10 Jan 2007 02:28:41 +0000 (UTC) (envelope-from bde@zeta.org.au)
Received: from mailout1.pacific.net.au (mailout1-3.pacific.net.au [61.8.2.210])
	by mx1.freebsd.org (Postfix) with ESMTP id D265313C4A9;
	Wed, 10 Jan 2007 02:28:40 +0000 (UTC) (envelope-from bde@zeta.org.au)
Received: from mailproxy1.pacific.net.au (mailproxy1.pacific.net.au
	[61.8.2.162])
	by mailout1.pacific.net.au (Postfix) with ESMTP id 159EF5A3EBC;
	Wed, 10 Jan 2007 13:28:36 +1100 (EST)
Received: from besplex.bde.org (katana.zip.com.au [61.8.7.246])
	by mailproxy1.pacific.net.au (Postfix) with ESMTP id E21A38C29;
	Wed, 10 Jan 2007 13:28:33 +1100 (EST)
Date: Wed, 10 Jan 2007 13:28:33 +1100 (EST)
From: Bruce Evans <bde@zeta.org.au>
X-X-Sender: bde@besplex.bde.org
To: John Baldwin <jhb@freebsd.org>
In-Reply-To: <200701091151.17166.jhb@freebsd.org>
Message-ID: <20070110123340.E16378@besplex.bde.org>
References: <1168211205.22629.6.camel@lanshark.dmv.com>
	<20070108154433.C75042@delplex.bde.org>
	<200701091151.17166.jhb@freebsd.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: rwatson@freebsd.org, freebsd-amd64@freebsd.org
Subject: Re: Panic in 6.2-PRERELEASE with bge on amd64
X-BeenThere: freebsd-amd64@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Porting FreeBSD to the AMD64 platform <freebsd-amd64.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-amd64>,
	<mailto:freebsd-amd64-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-amd64>
List-Post: <mailto:freebsd-amd64@freebsd.org>
List-Help: <mailto:freebsd-amd64-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-amd64>,
	<mailto:freebsd-amd64-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 10 Jan 2007 02:28:41 -0000

On Tue, 9 Jan 2007, John Baldwin wrote:

> On Monday 08 January 2007 00:06, Bruce Evans wrote:
>> Like most NIC drivers, bge unlocks and re-locks around its call to
>> ether_input() in its interrupt handler.  This isn't very safe, and it
>> certainly causes panics for bge.  I often see it panic when bringing
>> the interface down and up while input is arriving, on a non-SMP non-amd64
>> (actually i386) non-6.x (actually -current) system.  Bringing the
>> interface down is probably the worst case.  It creates a null pointer
>> for bge_intr() to follow.
>
> Why do you feel that it is unsafe to drop the lock around if_input()?

General principles.  After dropping a lock, it is necessary to check for
any relevant state changes after reacquiring the lock.  This is not
easy to do.  bge_rxeof() has no explicit checking.  It apparently
depends on implicit checking and/or no relevant state changes occurring.
This is even less easy to do.  It seems to be possible for at least the
huge state change of the device being uninitialized to occur.  Then it
is surprising for bge_rxeof() to get as far as it does before crashing.
The critical things that it does are (for the nun-jumbo case):

% 	while(sc->bge_rx_saved_considx !=
% 	    sc->bge_ldata.bge_status_block->bge_idx[0].bge_rx_prod_idx) {

We get back here immediately after reacquiring the lock (if the indexes
don't match).

% ...
% 		cur_rx =
% 	    &sc->bge_ldata.bge_rx_return_ring[sc->bge_rx_saved_considx];
% 
% 		rxidx = cur_rx->bge_idx;

cur_tx must be non-NULL to get this far.  I don't understand its lifetime.

% ...
% 		if (cur_rx->bge_flags & BGE_RXBDFLAG_JUMBO_RING) {
% ...
% 		} else {
% 			BGE_INC(sc->bge_std, BGE_STD_RX_RING_CNT);
% 			bus_dmamap_sync(sc->bge_cdata.bge_mtag,
% 			    sc->bge_cdata.bge_rx_std_dmamap[rxidx],
% 			    BUS_DMASYNC_POSTREAD);
% 			bus_dmamap_unload(sc->bge_cdata.bge_mtag,
% 			    sc->bge_cdata.bge_rx_std_dmamap[rxidx]);

I don't understand the lifetime of the dma data structures.

% 			m = sc->bge_cdata.bge_rx_std_chain[rxidx];
% 			sc->bge_cdata.bge_rx_std_chain[rxidx] = NULL;

All entries in bge_rx_std_chain are NULL after uninitialization, so m
is certain to be NULL here after uninitialization.  Note that this can
happen even if the uninitialization routines wait for the hardware to
finish using the mbufs before freeing them.  The hardware may have
used the old sc->bge_cdata.bge_rx_std_chain[rxidx] with no problems.
In fact, after uninitialization we can only get here in two ways:
- the hardware used the mbuf and updated the status block to indicate
   the change, and (even if the uninitialization stopped the hardware
   correctly) the the uninitialization didn't modify the status block
   enough.  The use and update may occur either before bge_rxeof() is
   called or concurrently.
- the uninitialization modified the status block in way that caused
   the problem.  The only setting that prevents the
   loop proceeding is sc->bge_rx_saved_considx ==
     sc->bge_ldata.bge_status_block->bge_idx[0].bge_rx_prod_idx,
   and for setting both these indexes to work the hardware must be
   stopped so that it doesn't update the producer index.
In fact, the second way doesn't seem to happen -- bge initializes all
the indexes to 0 in its init routine, but doesn't seem to touch them
in its uninit routine.

% ...
% 		}
% ...
% 		BGE_UNLOCK(sc);
% 		(*ifp->if_input)(ifp, m);
% 		BGE_LOCK(sc);
% 	}

Stopping the loop by setting the indexes equal is an efficient way to
abort bge_rxeof(), but the async status update makes it fragile.

I think the current problem is not a full uninit but more subtle.

Bruce