Date: Mon, 12 Jul 1999 10:41:26 -0700
From: Mike Haertel <mike@ducky.net>
To: Matthew Dillon <dillon@apollo.backplane.com>
Cc: Luoqi Chen <luoqi@watermarkgroup.com>, dfr@nlsystems.com, jeremyp@gsmx07.alcatel.com.au, freebsd-current@FreeBSD.ORG, mike@ducky.net, mike@ducky.net
Subject: Re: "objtrm" problem probably found (was Re: Stuck in "objtrm")
Message-ID: <199907121741.KAA17837@ducky.net>
In-Reply-To: Your message of "Mon, 12 Jul 1999 09:47:11 PDT." <199907121647.JAA70249@apollo.backplane.com>
You might think that, because of the MESI state bits in the cache and the bus coherency protocols, locks are "free". Unfortunately, the lock prefix has a measurable cost even on a UP system, at least on P6 and later processors. The reason is that a locked memory operation is an "at-retirement" operation: it waits for the out-of-order execution of all instructions logically older than it to complete before it even starts to operate. (Suppose locks were not at-retirement. Then locks on cache lines could be obtained out of order, and this could lead to global deadlocks even if the original code was deadlock-free.)

Locks may in fact have further serializing effects, such as draining the store queues before obtaining the lock; I have forgotten. Hmm, I am almost sure the lock needs to drain the store queues. Let's assume it does. This all adds up to "locks are painful".

Some data: the loop

    Loop:   addl $1, foo
            subl $1, %ecx
            jne Loop

requires about 30 seconds to run 10M iterations, while the same loop with "lock; addl $1, foo" requires about 4 minutes and 30 seconds on my 333 MHz P-II. (This loop has other problems, and someone else just posted a much better lock benchmark than this. Anyway...)

As future processors become more deeply out-of-order, locks will become even more painful (although one could imagine that at some point they might cross the pain threshold that would justify heroic hardware solutions allowing out-of-order (OOO) locks).

Anyway, taking all that into account, I still agree with Dillon that it is a better software solution to allow the same loadable drivers to work on both UP and MP systems whenever possible. One way to do this without feeling the full pain of locks would be to make the atomic operations actual function calls through function pointers. They could point to the locked or non-locked versions depending on whether the kernel was SMP.
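A minimal user-space sketch of the function-pointer idea, assuming GCC or Clang inline assembly on x86 (with a portable fallback elsewhere). The names `atomic_add_int_locked`, `atomic_add_int_unlocked`, `atomic_add_int_ptr`, and `atomic_ops_init` are hypothetical and chosen for illustration; they are not FreeBSD's actual `<machine/atomic.h>` interface. The `mp_system` flag stands in for however the kernel learns at boot whether it is running SMP.

```c
/* Hypothetical sketch: an atomic add dispatched through a function pointer
 * that is chosen once at startup, so one compiled driver works on UP and SMP. */
#include <stdbool.h>

/* SMP variant: the lock prefix makes the read-modify-write atomic with
 * respect to other CPUs. */
static void atomic_add_int_locked(volatile unsigned int *p, unsigned int v)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__("lock; addl %1, %0" : "+m"(*p) : "ir"(v) : "memory");
#else
    __atomic_fetch_add(p, v, __ATOMIC_SEQ_CST);
#endif
}

/* UP variant: on x86 a single addl with a memory operand cannot be split by
 * an interrupt on one CPU, so no lock prefix is needed. Inline asm is used
 * because plain C "*p += v" may compile to a separate load, add, and store. */
static void atomic_add_int_unlocked(volatile unsigned int *p, unsigned int v)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__("addl %1, %0" : "+m"(*p) : "ir"(v) : "memory");
#else
    /* Portable fallback for this sketch only; it still emits an atomic op. */
    __atomic_fetch_add(p, v, __ATOMIC_SEQ_CST);
#endif
}

/* Drivers call through this pointer instead of inlining either variant.
 * It defaults to the safe (locked) variant until initialization runs. */
void (*atomic_add_int_ptr)(volatile unsigned int *, unsigned int) =
    atomic_add_int_locked;

/* Hypothetical boot-time hook: select the variant exactly once. */
void atomic_ops_init(bool mp_system)
{
    atomic_add_int_ptr = mp_system ? atomic_add_int_locked
                                   : atomic_add_int_unlocked;
}
```

A driver would then call `atomic_add_int_ptr(&counter, 1)` regardless of the kernel it is loaded into. The trade-off is one indirect call per atomic operation in exchange for never paying for the lock prefix on a UP kernel.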
Although function calls are more expensive than inline code, they aren't necessarily a lot more so, and function calls to non-locked RMW operations are certainly much cheaper than inline locked RMW operations.
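The loop measurement and the claim about function calls can be checked on a given machine with a rough single-threaded harness like the one below. It is a sketch, assuming GCC or Clang on x86 (other targets fall back to plain C and `__atomic` builtins), and it uses C11 `timespec_get` for wall-clock timing. The names `bench`, `add_ptr`, and the `bench_variant` values are hypothetical. It times three variants: an inline add without the lock prefix, an inline add with it, and an unlocked add reached through a function pointer.

```c
/* Sketch: compare inline unlocked adds, inline locked adds, and unlocked
 * adds made through a function pointer. Results depend entirely on the CPU;
 * nothing here reproduces the P-II figures quoted in the message. */
#include <time.h>

enum bench_variant { INLINE_UNLOCKED, INLINE_LOCKED, CALL_UNLOCKED };

static inline void add_unlocked(volatile unsigned int *p, unsigned int v)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__("addl %1, %0" : "+m"(*p) : "ir"(v));
#else
    *p = *p + v; /* fallback: separate load/add/store, fine single-threaded */
#endif
}

static inline void add_locked(volatile unsigned int *p, unsigned int v)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__("lock; addl %1, %0" : "+m"(*p) : "ir"(v) : "memory");
#else
    __atomic_fetch_add(p, v, __ATOMIC_SEQ_CST);
#endif
}

/* Out-of-line wrapper so the pointer below targets a real function. */
static void add_unlocked_fn(volatile unsigned int *p, unsigned int v)
{
    add_unlocked(p, v);
}

/* A volatile pointer keeps the compiler from turning the indirect call
 * back into an inlined direct call. */
static void (*volatile add_ptr)(volatile unsigned int *, unsigned int) =
    add_unlocked_fn;

/* Perform n increments of *counter using the chosen variant and return the
 * elapsed wall-clock time in seconds. */
double bench(enum bench_variant which, volatile unsigned int *counter,
             unsigned long n)
{
    struct timespec t0, t1;
    timespec_get(&t0, TIME_UTC);
    switch (which) {
    case INLINE_UNLOCKED:
        for (unsigned long i = 0; i < n; i++)
            add_unlocked(counter, 1);
        break;
    case INLINE_LOCKED:
        for (unsigned long i = 0; i < n; i++)
            add_locked(counter, 1);
        break;
    case CALL_UNLOCKED:
        for (unsigned long i = 0; i < n; i++)
            add_ptr(counter, 1);
        break;
    }
    timespec_get(&t1, TIME_UTC);
    return (double)(t1.tv_sec - t0.tv_sec) +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Running `bench` for each variant with a large `n` (the message's loop used 10M iterations) gives the relative costs on the machine at hand. If the argument holds there, the `CALL_UNLOCKED` time should sit much closer to `INLINE_UNLOCKED` than to `INLINE_LOCKED`; modern CPUs may show a different ratio than the P-II did.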