From owner-freebsd-hackers  Sun Jun 27  0:54:43 1999
Delivered-To: freebsd-hackers@freebsd.org
Received: from rover.village.org (rover.village.org [204.144.255.49])
	by hub.freebsd.org (Postfix) with ESMTP id 80813151E7
	for <hackers@FreeBSD.ORG>; Sun, 27 Jun 1999 00:54:38 -0700 (PDT)
	(envelope-from imp@harmony.village.org)
Received: from harmony.village.org (harmony.village.org [10.0.0.6])
	by rover.village.org (8.9.3/8.9.3) with ESMTP id BAA88827;
	Sun, 27 Jun 1999 01:54:38 -0600 (MDT)
	(envelope-from imp@harmony.village.org)
Received: from harmony.village.org (localhost.village.org [127.0.0.1]) by harmony.village.org (8.9.3/8.8.3) with ESMTP id BAA09021; Sun, 27 Jun 1999 01:52:44 -0600 (MDT)
Message-Id: <199906270752.BAA09021@harmony.village.org>
To: Matthew Dillon <dillon@apollo.backplane.com>
Subject: Re: [Re: [Re: coarse vs fine-grained locking in SMP systems]] 
Cc: hackers@FreeBSD.ORG
In-reply-to: Your message of "Sun, 27 Jun 1999 00:33:35 PDT."
		<199906270733.AAA10635@apollo.backplane.com> 
References: <199906270733.AAA10635@apollo.backplane.com>  <XFMail.990626184454.doconnor@gsoft.com.au> 
Date: Sun, 27 Jun 1999 01:52:44 -0600
From: Warner Losh <imp@harmony.village.org>
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

In message <199906270733.AAA10635@apollo.backplane.com> Matthew Dillon writes:
:     Here's the basic problem:  The kernel is currently designed for 
:     single-threaded operation plus interrupt handling.  A piece of code
:     in the kernel can temporarily disable certain interrupts with the
:     spl*() codes to cover situations where a race on some system resource
:     might occur.
: 
:     But with SMP, several cpu's may be running in supervisor mode 
:     simultaneously.  The spl*() model breaks down because while one
:     can block interrupts, one cannot easily block another cpu that
:     might be running conflicting code.  Resource races can now occur between
:     mainline code running on several cpu's simultaneously as well as between
:     mainline code and interrupt code.

Yes.  However, the spl* model could also be viewed as a few very basic
locks, so splnet would block the net interrupts and take out the net
mutex, etc.  When splx is executed, the interrupts are restored to
their old value and the net mutex could be released.  In this case the
return value of spl* becomes a cookie that can be used both to restore
the prior interrupt context and to release the mutex acquired.
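
A rough userland sketch of the cookie idea, under loud assumptions:
model_splnet(), model_splx(), spl_cookie_t, IPL_NET, cpu_intr_mask and
net_mutex are all hypothetical names, the "interrupt mask" is just a
per-thread variable, and a pthread mutex stands in for whatever lock a
kernel would really use.  It is not how FreeBSD implements spl*().

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical userland model; none of these names are real kernel APIs. */
#define IPL_NET 0x1u                         /* bit meaning "net interrupts blocked" */

static _Thread_local uint32_t cpu_intr_mask; /* models this CPU's interrupt mask */
static pthread_mutex_t net_mutex = PTHREAD_MUTEX_INITIALIZER;

typedef struct {
    uint32_t old_mask;   /* interrupt mask to restore in model_splx() */
    int      took_mutex; /* nonzero if this call acquired net_mutex */
} spl_cookie_t;

/* Block "net interrupts" on this CPU, then exclude the other CPUs. */
static spl_cookie_t model_splnet(void)
{
    spl_cookie_t c;

    c.old_mask = cpu_intr_mask;
    /* A nested call already owns the mutex and must not take it again. */
    c.took_mutex = !(cpu_intr_mask & IPL_NET);
    cpu_intr_mask |= IPL_NET;
    if (c.took_mutex)
        pthread_mutex_lock(&net_mutex);
    return c;
}

/* Undo model_splnet() in reverse order: drop the mutex, restore the mask. */
static void model_splx(spl_cookie_t c)
{
    assert(cpu_intr_mask & IPL_NET);
    if (c.took_mutex)
        pthread_mutex_unlock(&net_mutex);
    cpu_intr_mask = c.old_mask;
}
```

The took_mutex flag is one way to let nested splnet calls work without
self-deadlock; only the outermost splx actually releases the mutex.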

There are problems with this approach, as I believe early efforts in
the FreeBSD/SMP project can attest, but I don't recall the details of
them.  It was originally thought that this could be made to work, if I 
recall the few messages about SMP that I saw, since you are
effectively emulating the spl mechanism across CPUs.

VMS 5.0 introduced a similar concept as well.  To get access to a
resource, you'd raise the SPL level of the CPU (to keep the hardware
devices from interrupting you) and then take out a spin lock (to keep
the other CPUs from doing the same).
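
The two-step pattern just described might look roughly like this.  It
is a sketch only: ipl_spinlock_t, ipl_spin_acquire(),
ipl_spin_release(), cpu_ipl, disk_lock and the level 8 are invented
for illustration, the IPL is modeled as a plain per-thread integer,
and this is not VMS code.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical model: a per-thread integer stands in for the CPU's IPL. */
static _Thread_local int cpu_ipl;

typedef struct {
    atomic_flag held; /* the spin lock itself */
    int         ipl;  /* level that must be held while owning the lock */
} ipl_spinlock_t;

/* Example lock; the name and the level 8 are made up. */
static ipl_spinlock_t disk_lock = { ATOMIC_FLAG_INIT, 8 };

/* Returns the previous IPL so the caller can restore it on release. */
static int ipl_spin_acquire(ipl_spinlock_t *l)
{
    int old = cpu_ipl;

    if (cpu_ipl < l->ipl)
        cpu_ipl = l->ipl;   /* step 1: keep local devices from interrupting */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;                   /* step 2: spin until no other CPU holds the lock */
    return old;
}

static void ipl_spin_release(ipl_spinlock_t *l, int old_ipl)
{
    assert(cpu_ipl >= l->ipl);
    atomic_flag_clear_explicit(&l->held, memory_order_release);
    cpu_ipl = old_ipl;      /* lower the IPL only after the lock is dropped */
}
```

Raising the level before spinning matters: if an interrupt handler on
the same CPU tried to take the lock while the interrupted code held
it, the handler would spin forever.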

:     In order to make SMP operation work better, pieces of the kernel are
:     slowly being moved outside the "big giant lock".  Linux developers,
:     in fact, have already moved their core data copying code and their TCP
:     stack outside the lock.  At the moment the FreeBSD-current kernel has
:     not moved anything outside the lock, but John Dyson has shown that it
:     is fairly easy to move certain specific pieces such as the uiomove()
:     code outside the lock, though inefficiencies from side-effects currently
:     make the improvement in performance less than stellar.

That is correct.  At Solbourne[*], we were honest enough to call the one
big lock approach ASMP (any CPU could run in kernel mode, but only one 
at a time).  Linux's (and FreeBSD's) SMP has really been mostly ASMP,
with a little bit of fine grain locking in the corners.

[*] Solbourne, for those of you that don't know, made sparc servers
(and one workstation) several years ago.  They were SMP years before
Sun managed to ship SMP support in Solaris.  Many of my SMP "gut
feelings" were developed while working there.

:     The real question is how to manage concurrency as pieces get moved outside
:     the lock.  There are lots of ways to do it.   One can use spin locks to
:     protect resources or, as someone pointed out earlier, to protect sections
:     of code.  I don't know which is better myself, it probably depends on the
:     situation so a hybrid will probably be the end result.  One can also use
:     kernel threads to simplify resource management.  The advantage of a 
:     kernel thread versus a normal process is in the ability to switch between
:     kernel threads very quickly, allowing the time normally wasted spinning in
:     certain types of locks to be used more efficiently.  

Solaris wound up using mutexes, condition variables, and semaphores to
accomplish this.  I don't know the exact details of what they did on a
resource stall, however.  The ddk tended to discourage exploration of
this.  I believe the stalled thread simply blocked and another thread
was allowed to run.  I don't know how the scheduler itself was
protected.  Given that you have a threading kernel, making it SMP safe 
is generally fairly easy, modulo locking issues.

The biggest problem area that Solbourne, VMS, and Solaris all had in their
early versions was making sure that deadlock didn't happen.  Locks
were always a real SOB to get right, and generally the cause of all
kinds of problems.  When I was testing Solbourne OS/MP 4.0C, I'd say
that 95% of the difficult-to-reproduce problems turned out to be
locking related and 60% of the easily reproducible ones were locking
related.  The years may have colored my remembrances of the
percentages and the version numbers for OS/MP, but it is the one thing
that stands out in my mind across the 9 years it has been since I was
doing that.
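
For illustration only: one common discipline for avoiding this kind of
deadlock is to always take locks in a single global order.  Nothing
above says any of these systems did it this way.  In the sketch below
the order is simply the mutex address, and lock_order_first(),
lock_pair() and unlock_pair() are hypothetical helpers.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Pick which of two mutexes comes first in the global order (by address). */
static pthread_mutex_t *lock_order_first(pthread_mutex_t *a, pthread_mutex_t *b)
{
    return ((uintptr_t)a <= (uintptr_t)b) ? a : b;
}

/* Take both locks in the global order, whatever order the caller names them. */
static void lock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    pthread_mutex_t *first, *second;

    assert(a != NULL && b != NULL);
    first = lock_order_first(a, b);
    second = (first == a) ? b : a;
    pthread_mutex_lock(first);
    if (second != first)        /* same mutex passed twice: lock it only once */
        pthread_mutex_lock(second);
}

/* Release in the reverse of the acquisition order. */
static void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    pthread_mutex_t *first = lock_order_first(a, b);
    pthread_mutex_t *second = (first == a) ? b : a;

    if (second != first)
        pthread_mutex_unlock(second);
    pthread_mutex_unlock(first);
}
```

Two threads calling lock_pair(&x, &y) and lock_pair(&y, &x) then
contend for the same first lock instead of each holding one and
waiting on the other.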

Warner

