Date: Thu, 18 Jan 2001 01:41:12 -0500 From: "Bosko Milekic" <bmilekic@technokratis.com> To: <arch@FreeBSD.ORG> Cc: "John Baldwin" <jhb@FreeBSD.org>, <jasone@freebsd.org> Subject: mtx_init: MTX_RECURSE and potential SMP troublespots Message-ID: <001d01c08119$a62482b0$1f90c918@jehovah>
next in thread | raw e-mail | index | archive | help
Hello, This patch: http://people.freebsd.org/~bmilekic/code/recur_witn.diff introduces MTX_RECURSE (the old MTX_RECURSE state bit is renamed to MTX_RECURSED) and makes passing the MTX_RECURSE bit to mtx_init() get registered in the witness code such that during execution, the WITNESS portion of mtx_enter/mtx_exit (witness_enter/witness_exit) will call panic if a mutex is recursed but did not have this bit in its flags during initialization. This is a first change for debugging purposes mainly in preparation for a modified, cleaner, mtx_*() interface which is likely to be coming to -CURRENT eventually. The diff also cleans up some mtx_recurse != 0 checks to use the already provided mtx_recursed() macros. Final reviews are welcome. (LINT builds with this, by the way, and -CURRENT on my machine seems to be going fine, so far). Now, the important bit. The patch attempts to add MTX_RECURSE to existing mutex locks that are known to recurse. Amongst these, we have: eventhandler, Giant, callout, sched_lock, and a whole bunch of network card drivers in pci/* and company, as well as some others which I was able to find out recurse. If you are a maintainer or have added locks to some subsystem, please glance at the patch and make sure that if your locks recurse, that their mtx_init() has been changed to include the MTX_RECURSE bit. If this is not the case, please let us know ASAP. Furthermore, if you've added a lock that you know/think should not be recursing and you see its mtx_init() call was modified in the patch to include the MTX_RECURSE bit, also please let us know. While going around the code, I've spotted some definite problems, and some potential problems. There are likely more which I've missed, so if you've added any locks, please make sure to review for similar problems. sys/dev/acpica/Osd/OsdSynch.c probably has a bug where AcpiOsDeleteSemaphore() does not mtx_destroy() on its previously mtx_init()ed mutex locks. sys/dev/ichsmb passes MTX_NORECURSE to mtx_init(), when it's only supposed to be passed to mtx_exit(). Apparently, this is not in mutex(9) [yet?] sys/dev/isp probably calls tsleep() with a mutex held and has a wakeup() for that sleep surrounded by the mutex lock's acquire/drop. sys/dev/pccbb seems to protect wakeup() calls w/ a mutex enter/exit. This is odd because just code is typically not supposed to be protected with a mutex. This may be a problem because I believe I saw this code drop the mutex before calling tsleep() which may be a race because in this case, a wakeup() can come before the actual tsleep() is entered (or before the switch happens) and therefore the thread will go to sleep even if a wakeup() was already issued just prior to. sys/dev/yarrow.c "may" be violating lock order (there _may_ be a lock order reversal); the fact that printf()s are surrounded with Giant lock enter/exit and that some of the routines containing the printf() calls are called with the driver's lock held and that some of them aren't may be a problem if those that are not do attempt to acquire the driver lock and are entered with Giant already held (from somewhere). There may be a lock reversal; but I haven't been able to investigate sufficiently to be able to claim this with absolute certainty. Regards, Bosko. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?001d01c08119$a62482b0$1f90c918>