Date: Thu, 19 Jun 2003 00:28:27 -0400
From: Eric Jacobs <eaja@erols.com>
To: freebsd-hackers@freebsd.org
Subject: timeout(9), mutexes, and races
Message-ID: <20030619002827.561faeda.eaja@erols.com>

The other day, I had a panic with my 5.1-RELEASE kernel when I removed my
Cardbus NIC (3Com 3c575B Fast Etherlink XL, using the xl driver). The
traceback indicated a pretty uninteresting race between a timeout routine
(xl_stats_update) and the card being detached: xl_stats_update was being
called after the device's softc had been freed.

I'm not sure exactly what the problem is, but the following caught my eye
in kern_timeout.c:

	mtx_unlock_spin(&callout_lock);
	if (!(c_flags & CALLOUT_MPSAFE))
		mtx_lock(&Giant);

The timeout(9) callouts never have the CALLOUT_MPSAFE flag set, so we
always try to acquire Giant here. But there's a gap between dropping
callout_lock and acquiring Giant where we can be preempted (mtx_lock is
specifically documented as allowing this), so the cardbus interrupt could
be serviced at this time. That would remove the callout entry, yet the
callout would still be called here once Giant is finally acquired.

Would the solution be to try to detect this condition (callout removed in
an intervening thread) somehow? In the new callout interface, clients are
responsible for allocating the callout struct, so it may not even exist by
the time we get to check it. The situation seems to be even worse for
CALLOUT_MPSAFE entries: it wouldn't help to check before the mutex has
been locked, but if the mutex is not Giant, we have no way of knowing
which mutex it would be...

Or is there another way to solve this somehow? Or am I completely missing
this and seeing the wrong problem? :)

Any ideas would be appreciated.

Eric
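
The interleaving described above can be modeled in user space. What follows is only a single-threaded sketch under stated assumptions, not the kernel's code: every name in it (struct softc, struct callout_model, cancel_gen, softclock_model, detach, run_scenario) is hypothetical, "freed" is a flag rather than a real free(), and the locks appear only as comments marking where they would be taken. The recheck deliberately reads a counter owned by the callout subsystem rather than the client's callout struct, since that struct may no longer exist. It is a coarse detection scheme (any cancellation suppresses the call), shown only to illustrate the idea of detecting the condition after Giant is acquired.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a driver softc; "freed" models free(). */
struct softc {
	bool freed;
	int stats_updates;
};

/* Hypothetical stand-in for a client-allocated callout. */
struct callout_model {
	void (*func)(void *);
	void *arg;
	bool pending;
};

/* Assumed to be owned by the callout subsystem, under callout_lock. */
static unsigned long cancel_gen;
static int use_after_free_count;

/* Models a timeout routine such as xl_stats_update. */
static void
stats_update(void *arg)
{
	struct softc *sc = arg;

	assert(sc != NULL);
	if (sc->freed) {
		use_after_free_count++;	/* the panic case in the report */
		return;
	}
	sc->stats_updates++;
}

/* Models device detach: cancel the callout, then free the softc. */
static void
detach(struct callout_model *c, struct softc *sc)
{
	c->pending = false;
	cancel_gen++;
	sc->freed = true;
}

/* Models one pass of the callout dispatch loop for a single entry. */
static void
softclock_model(struct callout_model *c, struct softc *sc,
    bool detach_in_window, bool recheck)
{
	/* callout_lock would be held here: snapshot what we need. */
	void (*func)(void *) = c->func;
	void *arg = c->arg;
	unsigned long gen = cancel_gen;

	c->pending = false;
	/* mtx_unlock_spin(&callout_lock) would happen here. */

	if (detach_in_window)
		detach(c, sc);	/* preemption: card removed in the gap */

	/* mtx_lock(&Giant) would happen here. */
	if (recheck && gen != cancel_gen)
		return;		/* a cancellation intervened: skip call */
	func(arg);
}

/* Runs one scenario and returns how many stale calls were made. */
static int
run_scenario(bool detach_in_window, bool recheck)
{
	struct softc sc = { false, 0 };
	struct callout_model c = { stats_update, &sc, true };

	use_after_free_count = 0;
	softclock_model(&c, &sc, detach_in_window, recheck);
	return (use_after_free_count);
}
```

In this model, run_scenario(true, false) reproduces the stale call, while run_scenario(true, true) suppresses it. The sketch does not address the CALLOUT_MPSAFE case, where the dispatcher has no way of knowing which mutex protects the handler's data.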