From owner-freebsd-hackers@FreeBSD.ORG  Wed Jun 18 21:29:45 2003
Date: Thu, 19 Jun 2003 00:28:27 -0400
From: Eric Jacobs <eaja@erols.com>
To: freebsd-hackers@freebsd.org
Message-Id: <20030619002827.561faeda.eaja@erols.com>
X-Mailer: Sylpheed version 0.8.5 (GTK+ 1.2.10; i386-portbld-freebsd4.2)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Subject: timeout(9), mutexes,  and races


The other day, I had a panic with my 5.1-RELEASE kernel when I
removed my Cardbus NIC (3Com 3c575B Fast Etherlink XL, using the
xl driver). The traceback indicated a pretty uninteresting race
between a timeout routine (xl_stats_update) and the card being
detached. xl_stats_update was being called after the device's
softc had been freed.

I'm not sure exactly what the problem is, but the following
caught my eye in kern_timeout.c:

                                mtx_unlock_spin(&callout_lock); 
                                if (!(c_flags & CALLOUT_MPSAFE))
                                        mtx_lock(&Giant);

The timeout(9) callouts never have the CALLOUT_MPSAFE flag set,
so we always try to acquire Giant here. But there's a gap where
we can be preempted (mtx_lock is specifically documented as being
able to do this), and so the cardbus interrupt could be serviced
at this time, removing the callout entry, and yet the callout
would still be called here once Giant is finally acquired.
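
To make the suspected window concrete, here is a minimal userland
model of it. This is a sketch, not kernel code: struct softc,
struct callout_model, stats_update, detach and run_softclock are
all hypothetical stand-ins, and it assumes the handler and its
argument are copied into locals before callout_lock is dropped
(as the local c_flags in the excerpt suggests). Freeing the softc
is modeled by setting a flag so the stale call can be counted.

```c
#include <assert.h>
#include <stddef.h>

struct softc {              /* stand-in for the driver's per-device state */
    int detached;           /* set where the real driver would free it */
    int stale_calls;        /* handler calls that arrived after detach */
};

struct callout_model {      /* stand-in for a callout entry */
    void (*c_func)(void *);
    void *c_arg;
    int   pending;
};

/* Stand-in for xl_stats_update(): counts calls made on a dead softc. */
static void stats_update(void *arg)
{
    struct softc *sc = arg;

    if (sc->detached)
        sc->stale_calls++;  /* in the kernel: use of freed memory */
}

/* Stand-in for the detach path: cancel the callout, then tear down. */
static void detach(struct callout_model *c, struct softc *sc)
{
    c->pending = 0;         /* the entry is already off the queue */
    sc->detached = 1;
}

/*
 * Model one dispatch. `preempt` selects whether detach() runs in the
 * gap between dropping callout_lock and acquiring Giant.
 * Returns the number of handler calls that hit a detached softc.
 */
static int run_softclock(int preempt)
{
    struct softc sc = { 0, 0 };
    struct callout_model c = { stats_update, &sc, 1 };

    assert(c.pending);

    /* "Under callout_lock": snapshot the handler, mark not pending. */
    void (*func)(void *) = c.c_func;
    void *arg = c.c_arg;
    c.pending = 0;

    /* callout_lock dropped; Giant not yet held. */
    if (preempt)
        detach(&c, &sc);

    /* Giant acquired: the snapshot is used without any recheck. */
    func(arg);
    return sc.stale_calls;
}
```

In this model run_softclock(0) reports no stale calls, while
run_softclock(1) reports one: cancelling the entry during the gap
has no effect on the snapshot that is about to be called.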

Would the solution be to try to detect this condition (callout
removed in an intervening thread) somehow? In the new callout
interface, clients are responsible for allocating the callout
struct, so it may not even exist by the time we get to check
it. The situation seems to be even worse for CALLOUT_MPSAFE
entries: checking before the protecting mutex has been locked
wouldn't help, and if that mutex isn't Giant, we have no way
of knowing which mutex it would be...
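
One shape such a detection could take is sketched below, again as
a userland model under stated assumptions rather than a proposal
for the actual kernel_timeout.c code. The names curr_callout,
curr_cancelled, stop_model and dispatch_model are hypothetical.
The idea is that the dispatcher records which entry it is about
to call while it still holds callout_lock, a stop request that
arrives in the window marks that entry cancelled, and the
dispatcher rechecks after acquiring Giant. The recheck compares
pointers only, so it never dereferences a client-allocated
callout that may be gone by then.

```c
#include <assert.h>
#include <stddef.h>

struct callout_model {      /* stand-in for a callout entry */
    void (*c_func)(void *);
    void *c_arg;
};

/* Both variables are meant to be protected by callout_lock. */
static const struct callout_model *curr_callout; /* entry in flight */
static int curr_cancelled;                       /* stop hit the window */
static int calls;                                /* handler invocations */

static void handler(void *arg)
{
    (void)arg;
    calls++;
}

/* Hypothetical stop routine; returns 1 if it caught a dispatch in flight. */
static int stop_model(const struct callout_model *c)
{
    if (c == curr_callout) {
        curr_cancelled = 1;
        return 1;
    }
    return 0;
}

/*
 * Model one dispatch. `stop_in_window` selects whether the stop
 * request arrives between dropping callout_lock and taking Giant.
 * Returns how many times the handler actually ran.
 */
static int dispatch_model(int stop_in_window)
{
    struct callout_model c = { handler, NULL };
    void (*func)(void *) = c.c_func;
    void *arg = c.c_arg;

    calls = 0;
    curr_callout = &c;      /* recorded while "holding callout_lock" */
    curr_cancelled = 0;

    /* callout_lock dropped; Giant being acquired. */
    if (stop_in_window)
        stop_model(&c);

    /* Giant held: retake callout_lock and recheck before calling. */
    if (!curr_cancelled)
        func(arg);
    curr_callout = NULL;
    return calls;
}
```

Here dispatch_model(0) runs the handler once and dispatch_model(1)
skips it. This only covers the Giant case: a stop request that
arrives after the handler has already started is not detected, so
it does not answer the CALLOUT_MPSAFE question above.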

Or is there another way to solve this? Or am I missing something
and seeing the wrong problem? :)

Any ideas would be appreciated.

Eric