Date: Thu, 19 Jun 2003 00:28:27 -0400
From: Eric Jacobs
To: freebsd-hackers@freebsd.org
Message-Id: <20030619002827.561faeda.eaja@erols.com>
Subject: timeout(9), mutexes, and races

The other day, I had a panic with my 5.1-RELEASE kernel when I removed my Cardbus NIC (3Com 3c575B Fast Etherlink XL, using the xl driver). The traceback indicated a pretty uninteresting race between a timeout routine (xl_stats_update) and the card being detached: xl_stats_update was being called after the device's softc had been freed.

I'm not sure exactly what the problem is, but the following caught my eye in kern_timeout.c:

    mtx_unlock_spin(&callout_lock);
    if (!(c_flags & CALLOUT_MPSAFE))
        mtx_lock(&Giant);

The timeout(9) callouts never have the CALLOUT_MPSAFE flag set, so we always try to acquire Giant here.
But there's a gap where we can be preempted (mtx_lock is specifically documented as being able to do this), and so the cardbus interrupt could be serviced at this time, removing the callout entry, yet the callout would still be called here when Giant is finally acquired.

Would the solution be to try to detect this condition (callout removed in an intervening thread) somehow? In the new callout interface, clients are responsible for allocating the callout struct, so it may not even exist by the time we get to check it. The situation seems to be even worse for CALLOUT_MPSAFE entries, because it wouldn't help to check it before the mutex has been locked, but if it's not Giant, we have no way of knowing what mutex it would be...

Or is there another way to solve this somehow? Or am I completely missing this and seeing the wrong problem? :)

Any ideas would be appreciated.

Eric
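
To make the window concrete, here is a minimal single-threaded user-space model in C. It is a sketch, not kernel code: the locks appear only as comments, and a hook function stands in for being preempted between dropping callout_lock and acquiring Giant. Every name in it (model_softclock, model_callout_stop, curr_callout, curr_cancelled, preempt_hook, and the fields of struct softc) is hypothetical and chosen for illustration. The "detect it" variant keeps its bookkeeping in dispatcher-owned variables, so the re-check never touches the client-allocated callout struct. It also assumes the detach path holds Giant while it stops the callout and frees the softc, and it does not address the CALLOUT_MPSAFE case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a driver softc. Instead of really freeing it,
 * the model sets "freed" so a late handler call can be counted safely. */
struct softc {
    bool freed;
    int  stale_calls;   /* handler ran after detach */
    int  good_calls;    /* handler ran while attached */
};

typedef void (*callout_fn)(void *);

/* Client-allocated callout, as in the newer callout interface. */
struct callout {
    callout_fn fn;
    void      *arg;
    bool       pending;
};

/* Dispatcher-owned state; in a kernel this would be guarded by callout_lock. */
static struct callout *curr_callout;
static bool            curr_cancelled;

/* Models a preemption in the gap between the two locks. */
static void (*preempt_hook)(void);

/* Stop a callout. Only compares pointers against the dispatcher's record. */
static void model_callout_stop(struct callout *c)
{
    /* callout_lock would be held here */
    if (c == curr_callout)
        curr_cancelled = true;
    c->pending = false;
}

/* Dispatch one callout. With recheck == false this mirrors the sequence
 * quoted from kern_timeout.c; with recheck == true it consults
 * curr_cancelled after "acquiring Giant". */
static void model_softclock(struct callout *c, bool recheck)
{
    assert(c != NULL && c->fn != NULL);

    /* callout_lock held: copy out everything needed later */
    callout_fn fn  = c->fn;
    void      *arg = c->arg;
    c->pending     = false;
    curr_callout   = c;
    curr_cancelled = false;
    /* mtx_unlock_spin(&callout_lock); */

    if (preempt_hook != NULL)
        preempt_hook();             /* the window described above */

    /* mtx_lock(&Giant); then briefly retake callout_lock for the check.
     * Note that c itself is not dereferenced from here on. */
    bool skip = recheck && curr_cancelled;
    curr_callout = NULL;

    if (!skip)
        fn(arg);
    /* mtx_unlock(&Giant); */
}

/* Stand-in for a handler such as xl_stats_update. */
static void stats_update(void *arg)
{
    struct softc *sc = arg;
    if (sc->freed)
        sc->stale_calls++;
    else
        sc->good_calls++;
}

static struct softc   test_sc;
static struct callout test_co;

/* Detach running inside the window: stop the callout, "free" the softc. */
static void detach_in_window(void)
{
    model_callout_stop(&test_co);
    test_sc.freed = true;
}

/* Returns how many times the handler ran against a "freed" softc. */
int run_scenario(bool recheck)
{
    test_sc = (struct softc){ false, 0, 0 };
    test_co = (struct callout){ stats_update, &test_sc, true };
    preempt_hook = detach_in_window;
    model_softclock(&test_co, recheck);
    preempt_hook = NULL;
    return test_sc.stale_calls;
}
```

In this model, run_scenario(false) returns 1 (the handler runs once against a "freed" softc) and run_scenario(true) returns 0. Whether a check like this is sufficient in the real kernel depends on the locking assumptions stated above.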