Date: Sun, 27 Jun 1999 01:52:44 -0600
From: Warner Losh
To: Matthew Dillon
Cc: hackers@FreeBSD.ORG
Subject: Re: [Re: [Re: coarse vs fine-grained locking in SMP systems]]

In message <199906270733.AAA10635@apollo.backplane.com> Matthew Dillon writes:
: Here's the basic problem: The kernel is currently designed for
: single-threaded operation plus interrupt handling. A piece of code
: in the kernel can temporarily disable certain interrupts with the
: spl*() codes to cover situations where a race on some system resource
: might occur.
:
: But with SMP, several CPUs may be running in supervisor mode
: simultaneously. The spl*() model breaks down because while one
: can block interrupts, one cannot easily block another CPU that
: might be running conflicting code. Resource races can now occur between
: mainline code running on several CPUs simultaneously as well as between
: mainline code and interrupt code.

Yes. However, the spl* model could also be viewed as a few very basic
locks.
So splnet would block the net interrupts and also take out the net
mutex, etc. When splx is executed, the interrupts are restored to
their old value and the net mutex is released. In this case the
return value of spl* becomes a cookie that can be used both to restore
the prior interrupt context and to release the mutex acquired.

There are problems with this approach, as I believe early efforts in
the FreeBSD/SMP project can attest, but I don't recall the details of
them. It was originally thought that this could be made to work, if I
recall the few messages about SMP that I saw, since you are
effectively emulating the spl mechanism across CPUs.

VMS 5.0 introduced a similar concept as well. To get access to a
resource, you'd raise the SPL level of the CPU (to keep the hardware
devices from interrupting you) and then take out a spin lock (to keep
the other CPUs from doing the same).

: In order to make SMP operation work better, pieces of the kernel are
: slowly being moved outside the "big giant lock". Linux developers,
: in fact, have already moved their core data copying code and their TCP
: stack outside the lock. At the moment the FreeBSD-current kernel has
: not moved anything outside the lock, but John Dyson has shown that it
: is fairly easy to move certain specific pieces such as the uiomove()
: code outside the lock, though inefficiencies from side-effects currently
: make the improvement in performance less than stellar.

That is correct. At Solbourne[*], we were honest enough to call the
one big lock approach ASMP (any CPU could run in kernel mode, but only
one at a time). Linux's (and FreeBSD's) SMP has really been mostly
ASMP, with a little bit of fine-grained locking in the corners.

[*] Solbourne, for those of you that don't know, made SPARC servers
(and one workstation) several years ago. They were SMP years before
Sun managed to ship SMP support in Solaris. Many of my SMP "gut
feelings" were developed while working there.
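
The spl-as-lock idea described earlier in this message (raise the
level, take the matching mutex, and hand back a cookie that undoes
both) could be sketched roughly as follows. This is only a user-space
illustration, not FreeBSD or VMS code: every sim_* name is
hypothetical, a thread stands in for a CPU, a thread-local variable
stands in for the interrupt level, and a pthread mutex stands in for
the kernel lock. Only two levels are modeled, so "already at net
level" is assumed to mean "this CPU already holds the net mutex".

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical interrupt levels; a real kernel has several more. */
enum sim_ipl { SIM_IPL_NONE = 0, SIM_IPL_NET = 1 };

/* The "cookie": what sim_splx() needs to undo one sim_splnet() call. */
struct sim_spl_cookie {
    enum sim_ipl     old_ipl; /* level to restore on this CPU */
    pthread_mutex_t *held;    /* mutex to release, or NULL if nested */
};

/* Each thread stands in for one CPU, so the level is per-thread. */
static _Thread_local enum sim_ipl sim_cur_ipl = SIM_IPL_NONE;

/* One mutex per spl level; only the "net" level is modeled here. */
static pthread_mutex_t sim_net_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Current simulated interrupt level of the calling "CPU". */
enum sim_ipl sim_ipl_now(void)
{
    return sim_cur_ipl;
}

/* Raise the local level first, then lock out the other CPUs. */
struct sim_spl_cookie sim_splnet(void)
{
    struct sim_spl_cookie cookie = { sim_cur_ipl, NULL };

    if (sim_cur_ipl < SIM_IPL_NET) {
        sim_cur_ipl = SIM_IPL_NET;           /* block local net interrupts */
        pthread_mutex_lock(&sim_net_mutex);  /* keep other CPUs out */
        cookie.held = &sim_net_mutex;
    }
    /* Nested call: the mutex is already ours, so do not take it twice. */
    return cookie;
}

/* Undo one sim_splnet(): drop the mutex (if taken), restore the level. */
void sim_splx(struct sim_spl_cookie cookie)
{
    assert(cookie.old_ipl <= sim_cur_ipl);   /* cookies only lower the level */
    if (cookie.held != NULL)
        pthread_mutex_unlock(cookie.held);
    sim_cur_ipl = cookie.old_ipl;
}
```

A caller would bracket its critical section with
"struct sim_spl_cookie s = sim_splnet(); ... sim_splx(s);", much like
the traditional "s = splnet(); ... splx(s);" pattern. The nested-call
case hints at one of the difficulties: with more than two levels, the
cookie would have to track which of several mutexes a given call
actually took.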
: The real question is how to manage concurrency as pieces get moved outside
: the lock. There are lots of ways to do it. One can use spin locks to
: protect resources or, as someone pointed out earlier, to protect sections
: of code. I don't know which is better myself, it probably depends on the
: situation so a hybrid will probably be the end result. One can also use
: kernel threads to simplify resource management. The advantage of a
: kernel thread versus a normal process is in the ability to switch between
: kernel threads very quickly, allowing the time normally wasted spinning in
: certain types of locks to be used more efficiently.

Solaris wound up using mutexes, condition variables, and semaphores to
accomplish this. I don't know the exact details of what they did on a
resource stall, however. The DDK tended to discourage exploration of
this. I believe the thread simply stalled and another thread was
allowed to run. I don't know how the scheduler itself was protected.

Given that you have a threading kernel, making it SMP safe is
generally fairly easy, modulo locking issues. The biggest problem
area that Solbourne, VMS and Solaris all had in their early versions
was making sure that deadlock didn't happen.

Locks were always a real SOB to get right, and generally the cause of
all kinds of problems. When I was testing Solbourne OS/MP 4.0C, I'd
say that 95% of the difficult-to-reproduce problems turned out to be
locking related and 60% of the easily reproducible ones were locking
related. The years may have colored my remembrances of the
percentages and the version numbers for OS/MP, but it is the one thing
that stands out in my mind across the 9 years it has been since I was
doing that.

Warner