From owner-cvs-all@FreeBSD.ORG  Fri Dec 22 15:00:47 2006
Return-Path: <owner-cvs-all@FreeBSD.ORG>
X-Original-To: cvs-all@FreeBSD.org
Delivered-To: cvs-all@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 62D4D16A503;
	Fri, 22 Dec 2006 15:00:47 +0000 (UTC)
	(envelope-from scottl@samsco.org)
Received: from pooker.samsco.org (pooker.samsco.org [168.103.85.57])
	by mx1.freebsd.org (Postfix) with ESMTP id 0C9DA13C4BF;
	Fri, 22 Dec 2006 15:00:46 +0000 (UTC)
	(envelope-from scottl@samsco.org)
Received: from [127.0.0.1] (pooker.samsco.org [168.103.85.57])
	(authenticated bits=0)
	by pooker.samsco.org (8.13.4/8.13.4) with ESMTP id kBMF0bk3014094;
	Fri, 22 Dec 2006 08:00:44 -0700 (MST)
	(envelope-from scottl@samsco.org)
Message-ID: <458BF314.3000702@samsco.org>
Date: Fri, 22 Dec 2006 10:00:36 -0500
From: Scott Long <scottl@samsco.org>
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US;
	rv:1.8.0.7) Gecko/20060910 SeaMonkey/1.0.5
MIME-Version: 1.0
To: Bruce Evans <bde@zeta.org.au>
References: <200612201203.kBKC3MhO053666@repoman.freebsd.org>
	<20061220132631.GH34400@FreeBSD.org>
	<20061222003115.R16146@delplex.bde.org>
	<20061223011349.O2603@epsplex.bde.org>
In-Reply-To: <20061223011349.O2603@epsplex.bde.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam-Status: No, score=-1.4 required=3.8 tests=ALL_TRUSTED autolearn=failed 
	version=3.1.1
X-Spam-Checker-Version: SpamAssassin 3.1.1 (2006-03-10) on pooker.samsco.org
Cc: cvs-all@FreeBSD.org, cvs-src@FreeBSD.org,
	Gleb Smirnoff <glebius@FreeBSD.org>,
	Bruce Evans <bde@FreeBSD.org>, src-committers@FreeBSD.org
Subject: Re: cvs commit: src/sys/dev/bge if_bge.c
X-BeenThere: cvs-all@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: CVS commit messages for the entire tree <cvs-all.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/cvs-all>,
	<mailto:cvs-all-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/cvs-all>
List-Post: <mailto:cvs-all@freebsd.org>
List-Help: <mailto:cvs-all-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/cvs-all>,
	<mailto:cvs-all-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 22 Dec 2006 15:00:47 -0000

Bruce Evans wrote:
> On Fri, 22 Dec 2006, I wrote:
> 
>> bge_start_locked() starts with the bge (sc) lock held and never releases
>> it as far as I can see.  This this problem can't happen (the lock
>> prevents both txeof and the watchdog from being reached before start
>> resets the timeout to 5 seconds).
>>
>> I could only find the lock being released and reacquired in a nested
>> routine in bge_rxeof() (for calling if_input()).  I hope this 
>> complication
>> is never needed for start routines.
> 
> Releasing the lock in in rxeof seems dangerous, but all network drivers
> do it so there is some chance that it works, and I thought that it worked
> as follows in one race case: race with device unload or just an ioctl:
> - the device may have been reset while we were in if_input(), so we must
>   not use much state.  bge_txeof() and bge_intr() are simple enough to
>   be obviously correct here, but bge_rxeof() isn't and seems to be broken
>   (see below)
> - unload must wait for the interrupt handler to complete before removing
>   the interrupt handler's code.  This should be handled by normal interrupt
>   handler rundown.  The order seems to be: acquire sc lock; reset and stop
>   further interrupts; release sc lock; wait for interrupt handler to 
> finish.
> 
> Just a few hours after I thought this, a RELENG_6 kernel crashed on a
> null pointer in bge_rxeof().  I had just started rebooting using
> reboot(8) with no args, while the interface was being blasted with
> tiny packets using ttcp.  The reboot process seems to have shut down
> the interface and was waiting for something.  I didn't try to understand
> the null pointer.

The em driver was getting into problems in a similar way because it was
freeing and reallocating data structures in its reset routine.  This was
a bug on many levels, but it has been fixed.  I didn't see similar bugs
in the bge driver, but there could easily be others.

In any case, the driver lock must not be held when calling if_input
because there are many ways the codepath can loop from there back into
if_start of the same driver or another driver.  There is no way around
this without doing big decoupling steps that will impact common
performance cases.  That said, dropping and regrabbing the driver lock
in the rxeof routine of any driver is bad.  It may be safe to do, but it
incurs horrible performance penalties.  It essentially allows the
time-critical, high priority RX path to be constantly preempted by the
lower priority if_start or if_ioctl paths.  Even without this preemption
and priority inversion, you're doing an excessive number of expensive
lock ops in the fast path.

The key to dropping the lock here safely, or running without a lock at 
all (like if_em) is to synchronize dangerous operations at a higher 
level.  For driver unload, you need to block the interrupt and call
bus_teardown_intr before freeing structures that could be touched by the
interrupt handler.  For taskqueue and callout driven drivers, you need 
to drain and block those contexts before doing anything dangerous.  It's
not impossible to do, you just have to be careful, think about all of 
the entry points, and not assume that mutexes have magical powers.

Scott