Date:      Sat, 23 Dec 2006 01:38:17 +1100 (EST)
From:      Bruce Evans <bde@zeta.org.au>
To:        Gleb Smirnoff <glebius@freebsd.org>
Cc:        cvs-src@freebsd.org, src-committers@freebsd.org, Bruce Evans <bde@freebsd.org>, cvs-all@freebsd.org
Subject:   Re: cvs commit: src/sys/dev/bge if_bge.c
Message-ID:  <20061223011349.O2603@epsplex.bde.org>
In-Reply-To: <20061222003115.R16146@delplex.bde.org>
References:  <200612201203.kBKC3MhO053666@repoman.freebsd.org> <20061220132631.GH34400@FreeBSD.org> <20061222003115.R16146@delplex.bde.org>

On Fri, 22 Dec 2006, I wrote:

> bge_start_locked() starts with the bge (sc) lock held and never releases
> it as far as I can see.  Thus this problem can't happen (the lock
> prevents both txeof and the watchdog from being reached before start
> resets the timeout to 5 seconds).
>
> I could only find the lock being released and reacquired in a nested
> routine in bge_rxeof() (for calling if_input()).  I hope this complication
> is never needed for start routines.
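The drop-and-reacquire pattern can be modeled in userspace roughly as follows. This is a minimal sketch, not bge code: a pthread mutex stands in for the sc lock, and `struct softc`, `rxeof_locked()` and the `input` callback (playing the role of if_input()) are hypothetical names. What it shows is that anything the loop relies on after the upcall has to be rechecked once the lock is held again, since another thread may have stopped or reset the device in the meantime.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a driver softc; NOT the real bge_softc. */
struct softc {
    pthread_mutex_t mtx;   /* models the sc lock */
    bool running;          /* cleared by a concurrent stop/ioctl */
    int rx_next;           /* next ring slot to process */
    int rx_count;          /* slots ready for processing */
    void (*input)(struct softc *, int pkt); /* hypothetical if_input() stand-in */
};

/*
 * Called with sc->mtx held; returns with it held.
 * The lock is dropped only around the upcall.
 */
static int rxeof_locked(struct softc *sc)
{
    int delivered = 0;

    while (sc->running && sc->rx_next < sc->rx_count) {
        int pkt = sc->rx_next++;       /* consume ring state under the lock */

        pthread_mutex_unlock(&sc->mtx);
        sc->input(sc, pkt);            /* upcall without the driver lock */
        pthread_mutex_lock(&sc->mtx);
        delivered++;

        /*
         * The device may have been stopped or reset while unlocked;
         * the loop condition rechecks sc->running before any ring
         * state is touched again.
         */
    }
    return delivered;
}
```

Here the loop only rechecks a `running` flag. A real driver would also have to cope with the ring itself having been reinitialized while the lock was dropped, which a single flag does not capture.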

Releasing the lock in rxeof seems dangerous, but all network drivers
do it, so there is some chance that it works.  I thought that it worked
as follows in one race case (a race with device unload, or just with an ioctl):
- the device may have been reset while we were in if_input(), so we must
   not use much state.  bge_txeof() and bge_intr() are simple enough to
   be obviously correct here, but bge_rxeof() isn't and seems to be broken
   (see below).
- unload must wait for the interrupt handler to complete before removing
   the interrupt handler's code.  This should be handled by normal interrupt
   handler rundown.  The order seems to be: acquire sc lock; reset and stop
   further interrupts; release sc lock; wait for interrupt handler to finish.

Just a few hours after I thought this, a RELENG_6 kernel crashed on a
null pointer in bge_rxeof().  I had just started rebooting using
reboot(8) with no args, while the interface was being blasted with
tiny packets using ttcp.  The reboot process seems to have shut down
the interface and was waiting for something.  I didn't try to understand
the null pointer.

RELENG_6 seems to be immune to the sbdrop panic that usually occurs
under -current for a less extreme interface shutdown.  Killing the
ttcp udp server (ttcp -u -r) while the interface is being blasted with
tiny packets almost always causes the sbdrop panic if the server is
SMP (I think I saw it once with a !SMP server, but it is unusual then).
Killing the ttcp udp client (ttcp -u -t...) first avoids the panic.
I haven't seen this problem for non-bge interfaces.

Bruce
