From owner-freebsd-smp  Sun Jul  2 18:52:41 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from wantadilla.lemis.com (wantadilla.lemis.com [192.109.197.80])
	by hub.freebsd.org (Postfix) with ESMTP id 8232237BEB3
	for <freebsd-smp@freebsd.org>; Sun,  2 Jul 2000 18:52:28 -0700 (PDT)
	(envelope-from grog@wantadilla.lemis.com)
Received: (from grog@localhost)
	by wantadilla.lemis.com (8.9.3/8.9.3) id LAA62040;
	Mon, 3 Jul 2000 11:22:03 +0930 (CST)
	(envelope-from grog)
Date: Mon, 3 Jul 2000 11:22:03 +0930
From: Greg Lehey <grog@lemis.com>
To: Matthew Dillon <dillon@apollo.backplane.com>
Cc: Chuck Paterson <cp@bsdi.com>, David Greenman <dg@root.com>,
	freebsd-smp@freebsd.org
Subject: SMP progress (was: Stepping on Toes)
Message-ID: <20000703112203.B61851@wantadilla.lemis.com>
References: <200006261650.KAA17801@berserker.bsdi.com> <200006271742.KAA35851@apollo.backplane.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0pre2i
In-Reply-To: <200006271742.KAA35851@apollo.backplane.com>
Organization: LEMIS, PO Box 460, Echunga SA 5153, Australia
Phone: +61-8-8388-8286
Fax: +61-8-8388-8725
Mobile: +61-418-838-708
WWW-Home-Page: http://www.lemis.com/~grog
X-PGP-Fingerprint: 6B 7B C3 8C 61 CD 54 AF  13 24 52 F8 6D A4 95 EF
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Tuesday, 27 June 2000 at 10:42:20 -0700, Matthew Dillon wrote:
>
>>>    Even with interrupt threads we have the GiantMutex issue... the same
>>>    issue that we have with our current MP implementation.  We cannot remove
>>>    SPL's until we remove the GiantMutex, and we cannot remove GiantMutex
>>>    without major modifications to just about every single source file in sys/
>>
>> In general this isn't true. If you get to the point where
>>
>> 1)	All entrance to unsafe code is protected by Giant.
>> 2)	Tsleep and friends, if any, release Giant when
>>	a process is suspended and re-acquire it on exit
>> 3)	Interrupts have a context to run in.
>> 4)	You have one or more scheduling locks.
>>
>> Then you can just turn spls into a nop. There is lots of hand waving
>> in regards to details at this point. BSD/OS SMPng is the existence
>> proof.
>>
>> It seems like one of the major problems in retaining spls during
>> the change over period is that they don't do much useful, and effectively
>> push everything under Giant.
>>
>>    Grabbing a spl will only block interrupts, it will not give
>>    any protection against an interrupt thread which has already
>>    started.
>>
>> This means that any device which might be blocked by splbio() can
>> not be brought out from under the Giant lock until all instances
>> of splbio have been removed.
>>
>> Chuck
>
>     Yes, I see it.  I agree.  You don't even need to hold a scheduling
>     lock... all you need to hold is Giant.
>
>     #1 - done
>     #2 - done
>     #3 - (Greg)
>     #4 - not required

I don't understand #4.  I thought you had done this.

>     Right this moment the requirement is that only someone holding Giant
>     is allowed to mess with spl*()'s (the cpl variable can only be messed
>     with by people holding Giant).
>
>     At this moment, without interrupt threads, interrupts can share Giant
>     with the curproc they interrupted.  This is how our existing MP stuff
>     worked already.
>
>     When Greg moves interrupts to their own threads, and obtains Giant to
>     run those interrupts, no more sharing will occur and just the fact
>     that the interrupt is holding Giant guarantees that nobody else will
>     be messing with SPLs, thus the SPLs can be removed entirely.

Agreed.  I'm in the process of implementing the heavy-weight interrupt
processes now.  I've just taken a look at your web page and note that
the URL no longer exists; in conjunction with the discussion above,
I'm no longer sure how far you are.  Are you importing the BSD/OS code
now?
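
The arrangement discussed above (interrupts running in their own
threads, every unsafe path taking Giant, spl calls reduced to no-ops)
can be sketched in user space.  This is only an illustration under
stated assumptions: it uses POSIX threads in place of kernel threads,
and every name in it (sk_giant, sk_splbio, sk_splx, sk_ithread,
sk_run) is hypothetical and not taken from the BSD/OS or FreeBSD
sources.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Stand-in for the kernel's Giant lock (user-space assumption). */
static pthread_mutex_t sk_giant = PTHREAD_MUTEX_INITIALIZER;

/* State that both the "top half" and the interrupt handler modify. */
static unsigned long sk_shared;

/* With every unsafe path under Giant, the spl calls can be no-ops. */
typedef int sk_spl_t;
static sk_spl_t sk_splbio(void) { return 0; }
static void sk_splx(sk_spl_t s) { (void)s; }

/* Heavy-weight interrupt thread: takes Giant around each handler run. */
static void *
sk_ithread(void *arg)
{
	unsigned long n = *(unsigned long *)arg;

	for (unsigned long i = 0; i < n; i++) {
		pthread_mutex_lock(&sk_giant);
		sk_shared++;			/* "handler" body */
		pthread_mutex_unlock(&sk_giant);
	}
	return NULL;
}

/* Run n handler invocations against n top-half updates; return the total. */
unsigned long
sk_run(unsigned long n)
{
	pthread_t ithread;
	int error;

	sk_shared = 0;
	error = pthread_create(&ithread, NULL, sk_ithread, &n);
	assert(error == 0);

	for (unsigned long i = 0; i < n; i++) {
		pthread_mutex_lock(&sk_giant);
		sk_spl_t s = sk_splbio();	/* no longer blocks anything */
		sk_shared++;
		sk_splx(s);
		pthread_mutex_unlock(&sk_giant);
	}

	error = pthread_join(ithread, NULL);
	assert(error == 0);
	return sk_shared;
}
```

Calling sk_run(n) should return 2 * n: holding sk_giant is what keeps
the two updaters apart, and the spl calls contribute nothing.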

We should probably take the rest of this offline, but I wanted to
discuss how we do things.  My idea is:

1.  You import the BSD/OS mutexes.
2.  I import/implement the heavy-weight interrupt code, which I will
    endeavour to get working relatively reliably.  This should be a
    fallback while I break^H^H^H^H^Himplement light-weight interrupt
    threads.
3.  You and I test our stuff together until it can stay up for an hour
    or so (exact time to be determined by Jason, who'll be carrying
    the can).
4.  We commit the marginally stable stuff.
5.  I carry on working on the light-weight threads.

Any comments?

Greg
--
Finger grog@lemis.com for PGP public key
See complete headers for address and phone numbers


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-smp" in the body of the message


From owner-freebsd-smp  Sun Jul  2 19:16:10 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from wantadilla.lemis.com (wantadilla.lemis.com [192.109.197.80])
	by hub.freebsd.org (Postfix) with ESMTP id F09B637BECD
	for <smp@FreeBSD.ORG>; Sun,  2 Jul 2000 19:16:02 -0700 (PDT)
	(envelope-from grog@wantadilla.lemis.com)
Received: (from grog@localhost)
	by wantadilla.lemis.com (8.9.3/8.9.3) id LAA62161;
	Mon, 3 Jul 2000 11:45:35 +0930 (CST)
	(envelope-from grog)
Date: Mon, 3 Jul 2000 11:45:35 +0930
From: Greg Lehey <grog@lemis.com>
To: Daniel Eischen <eischen@vigrid.com>
Cc: Jason Evans <jasone@canonware.com>,
	Luoqi Chen <luoqi@watermarkgroup.com>, smp@FreeBSD.ORG
Subject: Re: SMP meeting summary
Message-ID: <20000703114535.T39024@wantadilla.lemis.com>
References: <20000626151441.L8965@blitz.canonware.com> <Pine.SUN.3.91.1000626193709.15096A-100000@pcnet1.pcnet.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0pre2i
In-Reply-To: <Pine.SUN.3.91.1000626193709.15096A-100000@pcnet1.pcnet.com>
Organization: LEMIS, PO Box 460, Echunga SA 5153, Australia
Phone: +61-8-8388-8286
Fax: +61-8-8388-8725
Mobile: +61-418-838-708
WWW-Home-Page: http://www.lemis.com/~grog
X-PGP-Fingerprint: 6B 7B C3 8C 61 CD 54 AF  13 24 52 F8 6D A4 95 EF
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Monday, 26 June 2000 at 20:00:09 -0400, Daniel Eischen wrote:
> On 26 Jun 2000, Jason Evans wrote:
>
>> On Mon, Jun 26, 2000 at 02:49:57PM -0700, Jason Evans wrote:
>>> On Mon, Jun 26, 2000 at 04:13:24PM -0400, Luoqi Chen wrote:
>>>>> Processes that block on a mutex are granted the lock in FIFO order, rather
>>>>> than priority order.  In order to avoid priority inversion, the mutex wait
>>>>> queue implements priority lending.
>>>>>
>>>> Ok. I remember I have read somewhere that solaris 7 has given up the behavior
>>>> of waking up only one thread after a mutex is released, now it wakes up all
>>>> the blocking threads. It seems that the "thundering herd" problem is not
>>>> serious after all if the lock granularity is high enough.
>>>
>>> I don't think this is the case.
>>
>> Whoops.  The article is broken into two web pages, and the second page
>> states exactly what you said: as of Solaris 7, all waiting threads are
>> woken up.
>
> Yes, this confirms what Jim Mauro said in the Solaris Internals course
> at USENIX.  Since mutexes are held only for very small amounts of time
> and the kernel is sufficiently fine-grained, there was no advantage
> to calling wake_one() as opposed to wake_all().  Obviously with these
> semantics, the waiter with the highest priority should obtain the
> mutex.  At least that was my recollection...

I find this rather strange.  There can be many reasons to take a
mutex, and not all of them have to be fast.  Even in the case where
they are, it doesn't seem to be of any value to wake more processes
than can take the mutex.  From
http://www.sunworld.com/sunworldonline/swol-08-1999/swol-08-insidesolaris-2.html:

   Sun engineering coded the turnstile_wakeup() in Solaris 7 in a
   generic enough way so that a single thread wakeup could be
   executed, instead of all threads inevitably waking up
   together. Exhaustive testing under a variety of different loads has
   shown that, in practice, we very rarely end up with a large
   blocking chain of threads, and thus almost never run into the
   thundering herd problem. The wakeup-all implementation also solves
   some bit synchronization issues that make a wakeup-one scenario
   tricky.

This seems like a less honest way of saying "We couldn't figure out
how to avoid race conditions on wakeup, and so far nobody has been
able to point to a thundering herd".  I'd need some convincing.

Greg
--
Finger grog@lemis.com for PGP public key
See complete headers for address and phone numbers




From owner-freebsd-smp  Mon Jul  3  3:24:18 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from pcnet1.pcnet.com (pcnet1.pcnet.com [204.213.232.3])
	by hub.freebsd.org (Postfix) with ESMTP id 9607F37C048
	for <smp@FreeBSD.ORG>; Mon,  3 Jul 2000 03:24:10 -0700 (PDT)
	(envelope-from eischen@vigrid.com)
Received: (from eischen@localhost)
	by pcnet1.pcnet.com (8.8.7/PCNet) id GAA06569;
	Mon, 3 Jul 2000 06:23:28 -0400 (EDT)
Date: Mon, 3 Jul 2000 06:23:28 -0400 (EDT)
From: Daniel Eischen <eischen@vigrid.com>
To: Greg Lehey <grog@lemis.com>
Cc: Jason Evans <jasone@canonware.com>,
	Luoqi Chen <luoqi@watermarkgroup.com>, smp@FreeBSD.ORG
Subject: Re: SMP meeting summary
In-Reply-To: <20000703114535.T39024@wantadilla.lemis.com>
Message-ID: <Pine.SUN.3.91.1000703060948.5216A-100000@pcnet1.pcnet.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Mon, 3 Jul 2000, Greg Lehey wrote:
> On Monday, 26 June 2000 at 20:00:09 -0400, Daniel Eischen wrote:
> > Yes, this confirms what Jim Mauro said in the Solaris Internals course
> > at USENIX.  Since mutexes are held only for very small amounts of time
> > and the kernel is sufficiently fine-grained, there was no advantage
> > to calling wake_one() as opposed to wake_all().  Obviously with these
> > semantics, the waiter with the highest priority should obtain the
> > mutex.  At least that was my recollection...
> 
> I find this rather strange.  There can be many reasons to take a
> mutex, and not all of them have to be fast.  Even in the case where
> they are, it doesn't seem to be of any value to wake more processes
> than can take the mutex.  From
> http://www.sunworld.com/sunworldonline/swol-08-1999/swol-08-insidesolaris-2.html:
> 
>    Sun engineering coded the turnstile_wakeup() in Solaris 7 in a
>    generic enough way so that a single thread wakeup could be
>    executed, instead of all threads inevitably waking up
>    together. Exhaustive testing under a variety of different loads has
>    shown that, in practice, we very rarely end up with a large
>    blocking chain of threads, and thus almost never run into the
>    thundering herd problem. The wakeup-all implementation also solves
>    some bit synchronization issues that make a wakeup-one scenario
>    tricky.
> 
> This seems like a less honest way of saying "We couldn't figure out
> how to avoid race conditions on wakeup, and so far nobody has been
> able to point to a thundering herd".  I'd need some convincing.

Well if you are considering spinning for a bit of time on a held
mutex (which you seem to advocate?), then why not wake everyone?
If mutexes are held for very short periods of time and you don't
often have a thundering herd problem, then waking everyone is
an optimization since you only have to take the scheduling lock
once.  If mutexes can be held for long periods of time, then you
probably wouldn't want to wake everyone.

-- 
Dan Eischen




From owner-freebsd-smp  Mon Jul  3  3:31:20 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from wantadilla.lemis.com (wantadilla.lemis.com [192.109.197.80])
	by hub.freebsd.org (Postfix) with ESMTP id F23EB37C05A
	for <smp@FreeBSD.ORG>; Mon,  3 Jul 2000 03:31:07 -0700 (PDT)
	(envelope-from grog@wantadilla.lemis.com)
Received: (from grog@localhost)
	by wantadilla.lemis.com (8.9.3/8.9.3) id UAA63751;
	Mon, 3 Jul 2000 20:00:39 +0930 (CST)
	(envelope-from grog)
Date: Mon, 3 Jul 2000 20:00:39 +0930
From: Greg Lehey <grog@lemis.com>
To: Daniel Eischen <eischen@vigrid.com>
Cc: Jason Evans <jasone@canonware.com>,
	Luoqi Chen <luoqi@watermarkgroup.com>, smp@FreeBSD.ORG
Subject: Re: SMP meeting summary
Message-ID: <20000703200039.H62680@wantadilla.lemis.com>
References: <20000703114535.T39024@wantadilla.lemis.com> <Pine.SUN.3.91.1000703060948.5216A-100000@pcnet1.pcnet.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0pre2i
In-Reply-To: <Pine.SUN.3.91.1000703060948.5216A-100000@pcnet1.pcnet.com>
Organization: LEMIS, PO Box 460, Echunga SA 5153, Australia
Phone: +61-8-8388-8286
Fax: +61-8-8388-8725
Mobile: +61-418-838-708
WWW-Home-Page: http://www.lemis.com/~grog
X-PGP-Fingerprint: 6B 7B C3 8C 61 CD 54 AF  13 24 52 F8 6D A4 95 EF
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Monday,  3 July 2000 at  6:23:28 -0400, Daniel Eischen wrote:
> On Mon, 3 Jul 2000, Greg Lehey wrote:
>> On Monday, 26 June 2000 at 20:00:09 -0400, Daniel Eischen wrote:
>>> Yes, this confirms what Jim Mauro said in the Solaris Internals course
>>> at USENIX.  Since mutexes are held only for very small amounts of time
>>> and the kernel is sufficiently fine-grained, there was no advantage
>>> to calling wake_one() as opposed to wake_all().  Obviously with these
>>> semantics, the waiter with the highest priority should obtain the
>>> mutex.  At least that was my recollection...
>>
>> I find this rather strange.  There can be many reasons to take a
>> mutex, and not all of them have to be fast.  Even in the case where
>> they are, it doesn't seem to be of any value to wake more processes
>> than can take the mutex.  From
>> http://www.sunworld.com/sunworldonline/swol-08-1999/swol-08-insidesolaris-2.html:
>>
>>    Sun engineering coded the turnstile_wakeup() in Solaris 7 in a
>>    generic enough way so that a single thread wakeup could be
>>    executed, instead of all threads inevitably waking up
>>    together. Exhaustive testing under a variety of different loads has
>>    shown that, in practice, we very rarely end up with a large
>>    blocking chain of threads, and thus almost never run into the
>>    thundering herd problem. The wakeup-all implementation also solves
>>    some bit synchronization issues that make a wakeup-one scenario
>>    tricky.
>>
>> This seems like a less honest way of saying "We couldn't figure out
>> how to avoid race conditions on wakeup, and so far nobody has been
>> able to point to a thundering herd".  I'd need some convincing.
>
> Well if you are considering spinning for a bit of time on a held
> mutex (which you seem to advocate?), then why not wake everyone?

Because it doesn't buy us anything.

> If mutexes are held for very short periods of time and you don't
> often have a thundering herd problem,

That's an assumption.  So far we have *never* had a thundering herd,
because the code doesn't work yet.

> then waking everyone is an optimization since you only have to take
> the scheduling lock once.

No.  If I understand things correctly, each process would need to get
the schedlock, and only one process can get the mutex.  Why wake the
rest?  What do you want them to do?  This applies even in the case of
a counting semaphore (of which our "mutex" is a special case), since
if any slots are available, the process wouldn't be sleeping.
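
To put rough numbers on the question, here is a minimal counting
model, not a measurement.  It assumes that no new waiters arrive while
the queue drains, that exactly one woken thread wins the mutex per
release, and that every other woken thread blocks again at once.  The
names wake_policy and wakeups_to_drain are made up for this sketch.

```c
#include <assert.h>

/* Hypothetical wakeup policies; the names are illustrative only. */
enum wake_policy { WAKE_ONE, WAKE_ALL };

/*
 * Count the thread wakeups needed until each of `waiters` blocked
 * threads has held the mutex once, under the assumptions given in
 * the text above.
 */
unsigned long
wakeups_to_drain(unsigned long waiters, enum wake_policy policy)
{
	unsigned long wakeups = 0;

	assert(policy == WAKE_ONE || policy == WAKE_ALL);
	while (waiters > 0) {
		/* One release: wake a single waiter or the whole queue. */
		wakeups += (policy == WAKE_ONE) ? 1 : waiters;
		/* One woken thread gets the mutex; any others block again. */
		waiters--;
	}
	return wakeups;
}
```

With a single waiter both policies cost one wakeup.  With four
waiters, wake-one costs 4 wakeups and wake-all costs 4 + 3 + 2 + 1 =
10 in this model, so the difference only matters if queues actually
get long.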

Greg
--
Finger grog@lemis.com for PGP public key
See complete headers for address and phone numbers




From owner-freebsd-smp  Mon Jul  3  7:55:49 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from cypherpunks.ai (cypherpunks.ai [209.88.68.47])
	by hub.freebsd.org (Postfix) with ESMTP id 4367037BA07
	for <smp@FreeBSD.ORG>; Mon,  3 Jul 2000 07:55:46 -0700 (PDT)
	(envelope-from jeroen@vangelderen.org)
Received: from vangelderen.org (grolsch.ai [209.88.68.214])
	by cypherpunks.ai (Postfix) with ESMTP
	id 4EA4E6D; Mon,  3 Jul 2000 10:55:45 -0400 (AST)
Message-ID: <3960A971.982DDF07@vangelderen.org>
Date: Mon, 03 Jul 2000 10:55:45 -0400
From: "Jeroen C. van Gelderen" <jeroen@vangelderen.org>
X-Mailer: Mozilla 4.72 [en] (X11; I; Linux 2.2.12 i386)
X-Accept-Language: en
MIME-Version: 1.0
To: Greg Lehey <grog@lemis.com>
Cc: Daniel Eischen <eischen@vigrid.com>,
	Jason Evans <jasone@canonware.com>,
	Luoqi Chen <luoqi@watermarkgroup.com>, smp@FreeBSD.ORG
Subject: Re: SMP meeting summary
References: <20000703114535.T39024@wantadilla.lemis.com> <Pine.SUN.3.91.1000703060948.5216A-100000@pcnet1.pcnet.com> <20000703200039.H62680@wantadilla.lemis.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Greg Lehey wrote:
[...]
> That's an assumption.  So far we have *never* had a thundering herd,
> because the code doesn't work yet.

Your position is an assumption too. The difference is that 
one usually doesn't optimize until one has profiling 
information available. Am I correct in assuming that you
haven't done any profiling yet? Am I correct in assuming
that wake_one is an optimization?

> > then waking everyone is an optimization since you only have to take
> > the scheduling lock once.
> 
> No.  If I understand things correctly, each process would need to get
> the schedlock, and only one process can get the mutex.  Why wake the
> rest?  What do you want them to do?  

If -on average- there is only one process waiting you don't 
want to go through the trouble of implementing a more complex
wake_one. It would only complicate the code with negligible
gain. 

That's my reading of Sun's claims in Solaris and given that 
they have a little more experience with this kind of thing 
I'm inclined to believe them until I see facts stating the 
contrary.

Cheers,
Jeroen
-- 
Jeroen C. van Gelderen          o      _     _         _
jeroen@vangelderen.org  _o     /\_   _ \\o  (_)\__/o  (_)
                      _< \_   _>(_) (_)/<_    \_| \   _|/' \/
                     (_)>(_) (_)        (_)   (_)    (_)'  _\o_




From owner-freebsd-smp  Mon Jul  3  8:29:36 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from berserker.bsdi.com (berserker.twistedbit.com [199.79.183.1])
	by hub.freebsd.org (Postfix) with ESMTP id 89C8737B7CD
	for <smp@freebsd.org>; Mon,  3 Jul 2000 08:29:31 -0700 (PDT)
	(envelope-from cp@berserker.bsdi.com)
Received: from berserker.bsdi.com (cp@localhost [127.0.0.1])
	by berserker.bsdi.com (8.9.3/8.9.3) with ESMTP id JAA26798;
	Mon, 3 Jul 2000 09:28:34 -0600 (MDT)
Message-Id: <200007031528.JAA26798@berserker.bsdi.com>
To: Daniel Eischen <eischen@vigrid.com>
Cc: Greg Lehey <grog@lemis.com>, Jason Evans <jasone@canonware.com>,
	Luoqi Chen <luoqi@watermarkgroup.com>, smp@freebsd.org
Subject: Re: SMP meeting summary 
From: Chuck Paterson <cp@bsdi.com>
Date: Mon, 03 Jul 2000 09:28:34 -0600
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org


	As someone else already pointed out, once the OS starts to
run there is a whole lot of tuning that needs to go on to choose
a mix of compromises that works reasonably with the "general" work
load. There have been lots of good suggestions made, which need
to be considered once we have something up and running and have
accumulated some data. 


}Well if you are considering spinning for a bit of time on a held
}mutex (which you seem to advocate?), then why not wake everyone?
}If mutexes are held for very short periods of time and you don't
}often have a thundering herd problem, then waking everyone is
}an optimization since you only have to take the scheduling lock
}once.  If mutexes can be held for long periods of time, then you
}probably wouldn't want to wake everyone.
}
}-- 
}Dan Eischen


	If all processes are made runnable at once then both future
releases and acquisitions of the mutex may be uncontested, resulting
in not having to acquire the scheduling lock. If the system is busy
and there are no idle CPUs then there won't be a thundering herd,
because there is no herd to thunder. The probability of threads
blocking on the mutex before it is released is a function of the
ratio of mutex hold time to the time it takes a processor to call
switch with the thread which wants to run being the highest priority. In
general mutex hold time is small compared to the time a process
runs.
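
	The uncontested case can be sketched roughly as follows. This
is a user-space illustration using C11 atomics, not the BSD/OS mutex
code; the sk_ names, the "contested" bit and the slow_releases counter
are all assumptions made for the example. The point it shows is that
once a contested release has made every waiter runnable, the lock word
is left clean, so later acquisitions and releases succeed with a
single compare-and-swap and never reach the scheduling-lock path.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define SK_UNOWNED	((uintptr_t)0)
#define SK_CONTESTED	((uintptr_t)1)	/* low bit: some thread is blocked */

struct sk_mutex {
	_Atomic uintptr_t lock;		/* owner id, maybe OR'ed with SK_CONTESTED */
	unsigned long slow_releases;	/* releases that took the slow path */
};

/* Uncontested acquire: one compare-and-swap, no scheduling lock. */
int
sk_try_acquire(struct sk_mutex *m, uintptr_t self)
{
	uintptr_t expected = SK_UNOWNED;

	/* Owner ids must be nonzero and leave the low bit free. */
	assert(self != SK_UNOWNED && (self & SK_CONTESTED) == 0);
	return atomic_compare_exchange_strong(&m->lock, &expected, self);
}

/* A thread that is about to block marks the mutex as contested. */
void
sk_mark_contested(struct sk_mutex *m)
{
	atomic_fetch_or(&m->lock, SK_CONTESTED);
}

/* Release: fast path if nobody blocked, otherwise wake everyone. */
void
sk_release(struct sk_mutex *m, uintptr_t self)
{
	uintptr_t expected = self;

	if (atomic_compare_exchange_strong(&m->lock, &expected, SK_UNOWNED))
		return;		/* uncontested: scheduling lock not needed */

	/*
	 * Contested: a real kernel would take the scheduling lock here
	 * and make all waiters runnable. The sketch only counts the
	 * event and leaves the lock word fully clear.
	 */
	m->slow_releases++;
	atomic_store(&m->lock, SK_UNOWNED);
}
```

	Under a wake-one policy with waiters still queued, the
contested bit would have to stay set, so the next release would take
the slow path again. That is the trade-off being weighed here.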

	Only a single free processor is required to cause a problem
when priority inversion has occurred and multiple threads are waiting
on the mutex. Both the processor doing the release and the free
processor will be picking off the run queue and potentially picking
threads which want the same mutex.


	If someone wanted/needed to build a system where prioritization
is so important that processes are preempted even if they are on
another processor then making processes, which are just going to
block immediately, runnable is going to be very bad. This too can
most likely be solved with only a single added test in
the path of a contested release.


	In general there ought not to be multiple processes piling
up on a mutex. If there are and for some reason they can't be
fixed then these particular mutexes are going to dictate how this
area is handled. Once we have these cases in hand we can make
some decisions as to how to proceed.


Chuck




From owner-freebsd-smp  Mon Jul  3  8:35:59 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from ipamzlx.physik.uni-mainz.de (ipamzlx.Physik.Uni-Mainz.DE [134.93.180.54])
	by hub.freebsd.org (Postfix) with ESMTP id B49F537B732
	for <smp@freebsd.org>; Mon,  3 Jul 2000 08:35:51 -0700 (PDT)
	(envelope-from ohartman@ipamzlx.physik.uni-mainz.de)
Received: from ipamzlx.Physik.Uni-Mainz.DE (ipamzlx.Physik.Uni-Mainz.DE [134.93.180.54])
	by ipamzlx.physik.uni-mainz.de (8.9.3/8.9.3) with ESMTP id RAA08401
	for <smp@freebsd.org>; Mon, 3 Jul 2000 17:37:38 +0200 (CEST)
	(envelope-from ohartman@ipamzlx.physik.uni-mainz.de)
Date: Mon, 3 Jul 2000 17:37:38 +0200 (CEST)
From: "O. Hartmann" <ohartman@ipamzlx.physik.uni-mainz.de>
To: smp@freebsd.org
Subject: SMP Problems on ALR QSMP
Message-ID: <Pine.BSF.4.10.10007031723200.8239-100000@ipamzlx.physik.uni-mainz.de>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Dear Sirs,
I previously posted some questions about problems with an ALR QSMP
(aka Siemens PCE-5Smp) system. This machine has 256 MB RAM and three
P100 CPU boards, but I cannot activate all three CPUs because the
system crashes while trying to start the slave CPUs. The crash is
very curious. When booting in SMP mode, the kernel tries to start
slave CPU #1 but reports that something failed and asks whether to
continue or halt. When I choose to continue, the same thing happens
for the next CPU. When I choose to continue again, the kernel tries to
boot the normal way, but when initializing EISA/ISA interrupts on the
EISA bus, it freezes completely. It seems that there is something
wrong with the interrupt handler or something else. It is curious to
me that the kernel is capable of trying to boot each CPU, failing,
trying the next one, and then crashing by freezing. I have been
working on this problem for the past 5 months, but have no clue why
it occurs or how it could be fixed. With only one CPU the machine
runs fine, but slowly.

Here is what mptable says:

===============================================================================

MPTable, version 2.0.15

-------------------------------------------------------------------------------

MP Floating Pointer Structure:

  location:                     EBDA
  physical address:             0x0009f0a0
  signature:                    '_MP_'
  length:                       16 bytes
  version:                      1.1
  checksum:                     0x75
  mode:                         PIC

-------------------------------------------------------------------------------

MP Config Table Header:

  physical address:             0x0009f0b5
  signature:                    'PCMP'
  base table length:            256
  version:                      1.1
  checksum:                     0x60
  OEM ID:                       'ALR     '
  Product ID:                   'Revolut QSMP'
  OEM table pointer:            0x00000000
  OEM table size:               0
  entry count:                  22
  local APIC address:           0xfee00000
  extended table length:        0
  extended table checksum:      0

-------------------------------------------------------------------------------

MP Config Base Table Entries:

--
Processors:     APIC ID Version State           Family  Model   Step    Flags
                 0       0x 1    BSP, usable     5       2       5       0x0181
                 1       0x 1    AP, usable      5       2       5       0x0181
                 2       0x 1    AP, usable      5       2       5       0x0181
--
Bus:            Bus ID  Type
                 0       EISA
                 1       PCI
--
I/O APICs:      APIC ID Version State           Address
                 3       0x01    usable          0xfec00000
--
I/O Ints:       Type    Polarity    Trigger     Bus ID   IRQ    APIC ID PIN#
                ExtINT   conforms    conforms        0     0          3    0
                INT      conforms    conforms        0     1          3    1
                INT      conforms    conforms        0     3          3    3
                INT      conforms    conforms        0     4          3    4
                INT      conforms    conforms        0     5          3    5
                INT      conforms    conforms        0     6          3    6
                INT      conforms    conforms        0     7          3    7
                INT      conforms    conforms        0     8          3    8
                INT      conforms    conforms        0     9          3    9
                INT      conforms    conforms        0    10          3   10
                INT      conforms    conforms        0    11          3   11
                INT      conforms    conforms        0    12          3   12
                INT      conforms    conforms        0    14          3   14
                INT      conforms    conforms        0    15          3   15
--
Local Ints:     Type    Polarity    Trigger     Bus ID   IRQ    APIC ID PIN#
                ExtINT   conforms    conforms        0     0        255    0
                NMI      conforms    conforms        0     0        255    1

-------------------------------------------------------------------------------

# SMP kernel config file options:


# Required:
options         SMP                     # Symmetric MultiProcessor Kernel
options         APIC_IO                 # Symmetric (APIC) I/O

# Optional (built-in defaults will work in most cases):
#options                NCPU=3                  # number of CPUs
#options                NBUS=2                  # number of busses
#options                NAPIC=1                 # number of IO APICs
#options                NINTR=24                # number of INTs


This is what dmesg shows (while running in single-CPU mode):

Copyright (c) 1992-2000 The FreeBSD Project.
Copyright (c) 1982, 1986, 1989, 1991, 1993
        The Regents of the University of California. All rights reserved.
FreeBSD 4.0-STABLE #7: Mon Jul  3 13:37:12 CEST 2000
    root@wotan.brainstorm-online.net:/usr/src/sys/compile/WOTAN
Timecounter "i8254"  frequency 1193150 Hz
Timecounter "TSC"  frequency 99998757 Hz
CPU: Pentium/P54C (100.00-MHz 586-class CPU)
  Origin = "GenuineIntel"  Id = 0x525  Stepping = 5
  Features=0x1bf<FPU,VME,DE,PSE,TSC,MSR,MCE,CX8>
real memory  = 268435456 (262144K bytes)
avail memory = 257667072 (251628K bytes)
Preloaded elf kernel "kernel" at 0xc0364000.
Preloaded userconfig_script "/boot/kernel.conf" at 0xc036409c.
Intel Pentium detected, installing workaround for F00F bug
ccd0-3: Concatenated disk drivers
npx0: <math processor> on motherboard
npx0: INT 16 interface
pcib0: <Host to PCI bridge> on motherboard
pci0: <PCI bus> on pcib0
mlx0: <Mylex version 3 RAID interface> port 0x7000-0x707f mem 0x82000000-0x8200007f irq 15 at device 17.0 on pci0
mlx0: DAC960P/PD, 3 channels, firmware 3.51-0-12, 4MB RAM
mlxd0: <Mylex System Drive> on mlx0
mlxd0: 12288MB (25165824 sectors) RAID 5 (online)
de0: <Digital 21142 Fast Ethernet> port 0x7080-0x70ff mem 0x82000080-0x820000ff irq 14 at device 18.0 on pci0
de0: 21142 [10-100Mb/s] pass 1.1
de0: address 00:80:ad:b6:0f:f6
de1: <Digital 21142 Fast Ethernet> port 0x7400-0x747f mem 0x82000100-0x8200017f irq 9 at device 19.0 on pci0
de1: 21142 [10-100Mb/s] pass 1.1
de1: address 00:80:ad:b6:0f:ea
pci0: <Tseng Labs ET4000 W32P graphics accelerator> at 20.0 irq 15
eisa0: <EISA bus> on motherboard
mainboard0: <ALRa341 (System Board)> on eisa0 slot 0
isa0: <ISA bus> on motherboard
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
vga0: <Generic ISA VGA> at port 0x3b0-0x3df iomem 0xa0000-0xbffff on isa0
fb0 at vga0
sc0: <System console> on isa0
sc0: VGA <16 virtual consoles, flags=0x200>
aic0: <Adaptec 6260/6360 SCSI controller> at port 0x340-0x35f irq 11 on isa0
aic0: aic6360, dma, disconnection, parity check
fdc0: <NEC 72065B or clone> at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 16550A
sio1 at port 0x2f8-0x2ff irq 3 flags 0x10 on isa0
sio1: type 16550A
ppc0: <Parallel port> at port 0x378-0x37f irq 7 drq 1 on isa0
ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
lpt0: <Printer> on ppbus0
lpt0: Interrupt-driven port
IP packet filtering initialized, divert enabled, rule-based forwarding enabled, default to deny, unlimited logging
DUMMYNET initialized (000608)
BRIDGE 990810, have 3 interfaces
-- index 1  type 6 phy 0 addrl 6 addr 00.80.ad.b6.0f.f6
-- index 2  type 6 phy 0 addrl 6 addr 00.80.ad.b6.0f.ea
IPsec: Initialized Security Association Processing.
Waiting 5 seconds for SCSI devices to settle
de0: enabling 10baseT port
de1: enabling 10baseT port
sa0 at aic0 bus 0 target 3 lun 0
sa0: <HP C1533A 9503> Removable Sequential Access SCSI-2 device
sa0: 5.000MB/s transfers (5.000MHz, offset 8)
no devsw (majdev=0 bootdev=0xa0200000)
Mounting root from ufs:/dev/mlxd0s1a
cd0 at aic0 bus 0 target 5 lun 0
cd0: <TOSHIBA CD-ROM XM-3501TA 1875> Removable CD-ROM SCSI-2 device
cd0: 4.237MB/s transfers (4.237MHz, offset 8)
cd0: Attempt to query device size failed: NOT READY, Medium not present
cd1 at aic0 bus 0 target 6 lun 0
cd1: <TOSHIBA CD-ROM XM-3501TA 1875> Removable CD-ROM SCSI-2 device
cd1: 4.237MB/s transfers (4.237MHz, offset 8)
cd1: Attempt to query device size failed: NOT READY, Medium not present


I hope this is useful ... many thanks in advance,

Gruss O. Hartmann
-------------------------------------------------------------------
ohartman@ipamzlx.physik.uni-mainz.de

Klimadatenserver des IPA, Universitaet Mainz
Netzwerk- und Systembetreuung




From owner-freebsd-smp  Mon Jul  3  8:42:10 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from magnesium.net (toxic.magnesium.net [207.154.84.15])
	by hub.freebsd.org (Postfix) with SMTP id 482D237B885
	for <freebsd-smp@freebsd.org>; Mon,  3 Jul 2000 08:42:04 -0700 (PDT)
	(envelope-from jasone@magnesium.net)
Received: (qmail 1539 invoked by uid 1142); 3 Jul 2000 15:42:01 -0000
Date: 3 Jul 2000 08:42:01 -0700
Date: Mon, 3 Jul 2000 08:41:31 -0700
From: Jason Evans <jasone@canonware.com>
To: Greg Lehey <grog@lemis.com>
Cc: Matthew Dillon <dillon@apollo.backplane.com>,
	Jake Burkholder <jake@freebsd.org>, freebsd-smp@freebsd.org
Subject: Re: SMP progress (was: Stepping on Toes)
Message-ID: <20000703084130.D826@blitz.canonware.com>
References: <200006261650.KAA17801@berserker.bsdi.com> <200006271742.KAA35851@apollo.backplane.com> <20000703112203.B61851@wantadilla.lemis.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0.1i
In-Reply-To: <20000703112203.B61851@wantadilla.lemis.com>; from grog@lemis.com on Mon, Jul 03, 2000 at 11:22:03AM +0930
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Mon, Jul 03, 2000 at 11:22:03AM +0930, Greg Lehey wrote:
> Agreed.  I'm in the process of implementing the heavy-weight interrupt
> processes now.  I've just taken a look at your web page and note that
> the URL no longer exists; in conjunction with the discussion above,
> I'm no longer sure how far you are.  Are you importing the BSD/OS code
> now?
> 
> We should probably take the rest of this offline, but I wanted to
> discuss how we do things.  My idea is:
> 
> 1.  You import the BSD/OS mutexes.

Jake Burkholder offered to port the mutex code, and I discussed this with
Matt last week, who had no problems with Jake doing it.  As of last night,
it sounds like Jake essentially has this done for i386, and Doug Rabson
will be following soon with the alpha bits.  Jake's patch set also includes
the pertinent parts of Matt's work (per-CPU idle processes, some of the
schedlock changes, etc.).  I'll be adding a link to Jake's most recent
patch set on the web page (http://people.freebsd.org/~jasone/smp/) shortly.
Meanwhile, you can get Jake's patch set at:

http://people.freebsd.org/~jake/smpng2.tar

> 2.  I import/implement the heavy-weight interrupt code, which I will
>     endeavour to get working relatively reliably.  This should be a
>     fallback while I break^H^H^H^H^Himplement light-weight interrupt
>     threads.

Yep, the stage is set for this work to begin now.

> 3.  You and I test our stuff together until it can stay up for an hour
>     or so (exact time to be determined by Jason, who'll be carrying
>     the can).
> 4.  We commit the marginally stable stuff.

A successful buildworld would be a satisfactory test of stability in my
eyes.  Hopefully we can do that well. =)

Jason




From owner-freebsd-smp  Mon Jul  3  8:49: 3 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from berserker.bsdi.com (berserker.twistedbit.com [199.79.183.1])
	by hub.freebsd.org (Postfix) with ESMTP id 9AFCD37B88C
	for <smp@freebsd.org>; Mon,  3 Jul 2000 08:48:57 -0700 (PDT)
	(envelope-from cp@berserker.bsdi.com)
Received: from berserker.bsdi.com (cp@localhost [127.0.0.1])
	by berserker.bsdi.com (8.9.3/8.9.3) with ESMTP id JAA26972;
	Mon, 3 Jul 2000 09:47:55 -0600 (MDT)
Message-Id: <200007031547.JAA26972@berserker.bsdi.com>
To: "Jeroen C. van Gelderen" <jeroen@vangelderen.org>
Cc: Greg Lehey <grog@lemis.com>, Daniel Eischen <eischen@vigrid.com>,
	Jason Evans <jasone@canonware.com>,
	Luoqi Chen <luoqi@watermarkgroup.com>, smp@freebsd.org
Subject: Re: SMP meeting summary 
From: Chuck Paterson <cp@bsdi.com>
Date: Mon, 03 Jul 2000 09:47:54 -0600
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org


}That's my reading of Sun's claims in Solaris and given that 
}they have a little more experience with this kind of thing 
}I'm inclined to believe them until I see facts stating the 
}contrary.


I would caution against using Solaris to draw overly detailed conclusions.
The locking in Solaris is finer grained than we are likely to
achieve for some time. Also, having per-processor run queues and
all the associated machinery to support them makes Solaris's
characteristics quite different from what we have today. As time goes
on we will have to make decisions on the number of processors we want
to support most efficiently. The answer for our problem set may be
quite different from what Sun arrived at for their problem set.
While I have no specific knowledge that this is true, I would not
be surprised if the Solaris machine-dependent implementation differs
between Sparc and X86, with only Sparc being reported.

Chuck




From owner-freebsd-smp  Mon Jul  3  9: 1:11 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from tinker.exit.com (exit-gw.power.net [207.151.46.196])
	by hub.freebsd.org (Postfix) with ESMTP id 5065737B876
	for <smp@freebsd.org>; Mon,  3 Jul 2000 09:01:08 -0700 (PDT)
	(envelope-from frank@exit.com)
Received: from realtime.exit.com (realtime.exit.com [206.223.0.5])
	by tinker.exit.com (8.9.3/8.9.3) with ESMTP id JAA53786
	for <smp@freebsd.org>; Mon, 3 Jul 2000 09:01:07 -0700 (PDT)
	(envelope-from frank@exit.com)
Received: (from frank@localhost)
	by realtime.exit.com (8.9.3/8.9.3) id JAA10893
	for smp@freebsd.org; Mon, 3 Jul 2000 09:01:06 -0700 (PDT)
	(envelope-from frank)
From: Frank Mayhar <frank@exit.com>
Message-Id: <200007031601.JAA10893@realtime.exit.com>
Subject: Re: SMP meeting summary
In-Reply-To: <Pine.SUN.3.91.1000703060948.5216A-100000@pcnet1.pcnet.com> from
	Daniel Eischen at "Jul 3, 2000 06:23:28 am"
To: Daniel Eischen <eischen@vigrid.com>
Date: Mon, 3 Jul 2000 08:49:43 -0700 (PDT)
Cc: Greg Lehey <grog@lemis.com>, Jason Evans <jasone@canonware.com>,
	Luoqi Chen <luoqi@watermarkgroup.com>, smp@FreeBSD.ORG
Reply-To: frank@exit.com
Organization: Exit Consulting
X-Copyright0: Copyright 2000 Frank Mayhar.  All Rights Reserved.
X-Copyright1: Permission granted for electronic reproduction as Usenet News or email only.
X-Mailer: ELM [version 2.4ME+ PL68 (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Daniel Eischen wrote:
> Well if you are considering spinning for a bit of time on a held
> mutex (which you seem to advocate?), then why not wake everyone?
> If mutexes are held for very short periods of time and you don't
> often have a thundering herd problem, then waking everyone is
> an optimization since you only have to take the scheduling lock
> once.  If mutexes can be held for long periods of time, then you
> probably wouldn't want to wake everyone.

I'm going to use Daniel's email as a springboard for a strongly-held opinion.
I really, really wish that you guys would get away from using the term
"mutex" for both short- and long-term locks.  All a mutex is is a lock;
it says nothing about what kind of lock, whether it sleeps or spins or even
whether it's a reader/writer lock.  It's the most generic term available.

How about "spinlock" for short-term spin locks, "sleeplock" for long-term
blocking locks and "rwlock" for reader/writer locks (modified by whether
it's a spinlock or a sleeplock)?  That would add clarity, I think.  Maybe
it's my SVR4/Unixware experience, but I always have to figure out whether
someone is talking about a spinlock or a sleeplock from context.

To address Dan's real question, it depends on whether, after spinning for
a short time, the spinlock decides to become a sleeplock (in other words,
it's a hybrid).  If it sleeps, depending on the implementation there could
be a thundering herd problem, since you have no idea how long it's going
to sleep or how many threads/processes will be on the chain.  Basically,
all a hybrid lock is is an optimization of a sleeplock that speeds up the
(possibly common) case of a sleeplock being held for a short time relative
to the waiting process.
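
A minimal sketch of the hybrid idea in C, for illustration only.  All
names here (hybrid_lock, hybrid_lock_try, max_spins) are hypothetical
and are not taken from FreeBSD, BSD/OS or SVR4.  The function makes one
attempt plus a bounded number of spin retries, then tells the caller to
fall back to sleeping; the sleep queue itself is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical names throughout; a model, not kernel code. */
enum hybrid_result { HYBRID_ACQUIRED, HYBRID_MUST_SLEEP };

struct hybrid_lock {
	atomic_bool held;	/* true while some thread owns the lock */
};

void
hybrid_init(struct hybrid_lock *l)
{
	atomic_init(&l->held, false);
}

/*
 * Make one attempt plus up to max_spins retries.  Report whether the
 * caller now owns the lock or should go to sleep on it instead.
 */
enum hybrid_result
hybrid_lock_try(struct hybrid_lock *l, unsigned max_spins)
{
	unsigned tries = 0;

	do {
		/* exchange returns the old value: false means we took it */
		if (!atomic_exchange(&l->held, true))
			return HYBRID_ACQUIRED;
	} while (tries++ < max_spins);

	return HYBRID_MUST_SLEEP;	/* spin phase over: act as a sleeplock */
}

void
hybrid_unlock(struct hybrid_lock *l)
{
	assert(atomic_load(&l->held));	/* only a held lock can be released */
	atomic_store(&l->held, false);
}
```

With max_spins set to 0 this degenerates to a plain sleeplock with a
single try, which is one way to see the hybrid as an optimization of
the sleeplock rather than a third kind of lock.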

(Any errors here are my own and will undoubtedly be promptly corrected by
the audience.  :-)
-- 
Frank Mayhar frank@exit.com	http://www.exit.com/
Exit Consulting                 http://store.exit.com/




From owner-freebsd-smp  Mon Jul  3 13:15:18 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from io.yi.org (24.67.218.186.bc.wave.home.com [24.67.218.186])
	by hub.freebsd.org (Postfix) with ESMTP id 3A2DB37C0A0
	for <freebsd-smp@freebsd.org>; Mon,  3 Jul 2000 13:15:05 -0700 (PDT)
	(envelope-from jburkhol@home.com)
Received: from io.yi.org (localhost.gvcl1.bc.wave.home.com [127.0.0.1])
	by io.yi.org (Postfix) with ESMTP
	id 28BA3BA4E; Mon,  3 Jul 2000 13:15:16 -0700 (PDT)
X-Mailer: exmh version 2.1.1 10/15/1999
To: Jason Evans <jasone@canonware.com>
Cc: Greg Lehey <grog@lemis.com>,
	Matthew Dillon <dillon@apollo.backplane.com>, freebsd-smp@freebsd.org
Subject: Re: SMP progress (was: Stepping on Toes) 
In-Reply-To: Message from Jason Evans <jasone@canonware.com> 
   of "Mon, 03 Jul 2000 08:41:31 PDT." <20000703084130.D826@blitz.canonware.com> 
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Mon, 03 Jul 2000 13:15:16 -0700
From: Jake Burkholder <jburkhol@home.com>
Message-Id: <20000703201516.28BA3BA4E@io.yi.org>
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

> On Mon, Jul 03, 2000 at 11:22:03AM +0930, Greg Lehey wrote:
> > Agreed.  I'm in the process of implementing the heavy-weight interrupt
> > processes now.  I've just taken a look at your web page and note that
> > the URL no longer exists; in conjunction with the discussion above,
> > I'm no longer sure how far you are.  Are you importing the BSD/OS code
> > now?
> > 
> > We should probably take the rest of this offline, but I wanted to
> > discuss how we do things.  My idea is:
> > 
> > 1.  You import the BSD/OS mutexes.
> 
> Jake Burkholder offered to port the mutex code, and I discussed this with
> Matt last week, who had no problems with Jake doing it.  As of last night,
> it sounds like Jake essentially has this done for i386, and Doug Rabson
> will be following soon with the alpha bits.  Jake's patch set also includes
> the pertinent parts of Matt's work (per-CPU idle processes, some of the
> schedlock changes, etc.).  I'll be adding a link to Jake's most recent
> patch set on the web page (http://people.freebsd.org/~jasone/smp/) shortly.
> Meanwhile, you can get Jake's patch set at:
> 
> http://people.freebsd.org/~jake/smpng2.tar

I've just updated this to this morning's -current and added my kernel
config.

The biggest change is that cpu_switch() no longer disables or enables
interrupts directly.  It's taken care of by sched_lock, since BSD/OS
spin mutexes disable interrupts on the first acquire and enable them
again on the last release.  Protecting the run queues is not really
necessary right now, but it's a step in the right direction.

I haven't dealt with the mp_lock, but I've had this patch running on
my UP box for a while, building kernels etc.

I think we'll want to make INVARIANTS, INVARIANT_SUPPORT, DIAGNOSTIC,
and SMP_DEBUG on by default in -current for a while at least.  Every
assertion helps.  WITNESS currently isn't doing anything because the
sched_lock is ignored, but we'll likely want that by default too.

Jake

> 
> > 2.  I import/implement the heavy-weight interrupt code, which I will
> >     endeavour to get working relatively reliably.  This should be a
> >     fallback while I break^H^H^H^H^Himplement light-weight interrupt
> >     threads.
> 
> Yep, the stage is set for this work to begin now.
> 
> > 3.  You and I test our stuff together until it can stay up for an hour
> >     or so (exact time to be determined by Jason, who'll be carrying
> >     the can).
> > 4.  We commit the marginally stable stuff.
> 
> A successful buildworld would be a satisfactory test of stability in my
> eyes.  Hopefully we can do that well. =)
> 
> Jason




From owner-freebsd-smp  Mon Jul  3 16: 9:44 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from wantadilla.lemis.com (wantadilla.lemis.com [192.109.197.80])
	by hub.freebsd.org (Postfix) with ESMTP id C716737C0E3
	for <smp@FreeBSD.ORG>; Mon,  3 Jul 2000 16:09:36 -0700 (PDT)
	(envelope-from grog@wantadilla.lemis.com)
Received: (from grog@localhost)
	by wantadilla.lemis.com (8.9.3/8.9.3) id IAA66554;
	Tue, 4 Jul 2000 08:38:23 +0930 (CST)
	(envelope-from grog)
Date: Tue, 4 Jul 2000 08:38:22 +0930
From: Greg Lehey <grog@lemis.com>
To: "Jeroen C. van Gelderen" <jeroen@vangelderen.org>
Cc: Daniel Eischen <eischen@vigrid.com>,
	Jason Evans <jasone@canonware.com>,
	Luoqi Chen <luoqi@watermarkgroup.com>, smp@FreeBSD.ORG
Subject: Re: SMP meeting summary
Message-ID: <20000704083822.A65029@wantadilla.lemis.com>
References: <20000703114535.T39024@wantadilla.lemis.com> <Pine.SUN.3.91.1000703060948.5216A-100000@pcnet1.pcnet.com> <20000703200039.H62680@wantadilla.lemis.com> <3960A971.982DDF07@vangelderen.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0pre2i
In-Reply-To: <3960A971.982DDF07@vangelderen.org>
Organization: LEMIS, PO Box 460, Echunga SA 5153, Australia
Phone: +61-8-8388-8286
Fax: +61-8-8388-8725
Mobile: +61-418-838-708
WWW-Home-Page: http://www.lemis.com/~grog
X-PGP-Fingerprint: 6B 7B C3 8C 61 CD 54 AF  13 24 52 F8 6D A4 95 EF
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Monday,  3 July 2000 at 10:55:45 -0400, Jeroen C. van Gelderen wrote:
> Greg Lehey wrote:
> [...]
>> That's an assumption.  So far we have *never* had a thundering herd,
>> because the code doesn't work yet.
>
> Your position is an assumption too. The difference is that
> one usually doesn't optimize until one has profiling
> information available. Am I correct in assuming that you
> haven't done any profiling yet? Am I correct in assuming
> that wake_one is an optimization?

You're not correct in your implied assumption that we can see any
potential problems with wake_one.

>>> then waking everyone is an optimization since you only have to take
>>> the scheduling lock once.
>>
>> No.  If I understand things correctly, each process would need to get
>> the schedlock, and only one process can get the mutex.  Why wake the
>> rest?  What do you want them to do?
>
> If -on average- there is only one process waiting you don't
> want to go through the trouble of implementing a more complex
> wake_one. It would only complicate the code with negligible
> gain.

There's nothing to say that wake_one is more complex.  wake_one takes
the first process on the mutex's sleep list and wakes it.  wake_all
(or whatever) would make a loop out of that wake function and wake all
the processes on the list.  All would then be scheduled, try to take
the mutex, and all except one would fail and be put back on the sleep
list.  Does this make sense?
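
As a back-of-envelope model of the difference (a sketch, not the
mutex code): assume nobody spins, no new waiters arrive, and every
woken process that loses the race goes straight back on the sleep
list.  The function names are invented for the example.

```c
#include <assert.h>

/*
 * One mutex release under wake_one: the head of the sleep list is
 * woken and takes the mutex.  Returns how many processes were made
 * runnable.
 */
unsigned
release_wake_one(unsigned *nsleepers)
{
	assert(*nsleepers > 0);
	*nsleepers -= 1;
	return 1;
}

/*
 * One mutex release under wake_all: every sleeper is made runnable,
 * one gets the mutex, the rest are put back on the sleep list.
 */
unsigned
release_wake_all(unsigned *nsleepers)
{
	assert(*nsleepers > 0);
	unsigned woken = *nsleepers;
	*nsleepers -= 1;
	return woken;
}

/* Total wakeups needed to empty a sleep list under a given policy. */
unsigned
drain_sleep_list(unsigned nsleepers, unsigned (*release)(unsigned *))
{
	unsigned total = 0;

	while (nsleepers > 0)
		total += release(&nsleepers);
	return total;
}
```

Under these assumptions n sleepers cost n wakeups with wake_one and
n(n+1)/2 with wake_all, and the two are identical when the list
never holds more than one process.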

> That's my reading of Sun's claims in Solaris and given that they
> have a little more experience with this kind of thing I'm inclined
> to believe them until I see facts stating the contrary.

Sun's problem with Solaris is non-obvious, and may not bite us.

I think we should hold off with this kind of discussion for the while.
Everything I can see suggests that it's crazy to wake all processes.
If we find that we run into race conditions which we can only solve
with wake_all, though, we'll compare the effort in fixing them with
the (undoubted) performance degradation caused by waking them all.

Greg
--
Finger grog@lemis.com for PGP public key
See complete headers for address and phone numbers




From owner-freebsd-smp  Mon Jul  3 16:36:21 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from mail.enteract.com (mail.enteract.com [207.229.143.33])
	by hub.freebsd.org (Postfix) with ESMTP id 2C43E37BA40
	for <smp@FreeBSD.ORG>; Mon,  3 Jul 2000 16:36:18 -0700 (PDT)
	(envelope-from dscheidt@enteract.com)
Received: from shell-2.enteract.com (dscheidt@shell-2.enteract.com [207.229.143.41])
	by mail.enteract.com (8.9.3/8.9.3) with SMTP id SAA49983;
	Mon, 3 Jul 2000 18:35:55 -0500 (CDT)
	(envelope-from dscheidt@enteract.com)
Date: Mon, 3 Jul 2000 18:35:55 -0500 (CDT)
From: David Scheidt <dscheidt@enteract.com>
To: Greg Lehey <grog@lemis.com>
Cc: "Jeroen C. van Gelderen" <jeroen@vangelderen.org>,
	Daniel Eischen <eischen@vigrid.com>,
	Jason Evans <jasone@canonware.com>,
	Luoqi Chen <luoqi@watermarkgroup.com>, smp@FreeBSD.ORG
Subject: Re: SMP meeting summary
In-Reply-To: <20000704083822.A65029@wantadilla.lemis.com>
Message-ID: <Pine.NEB.3.96.1000703182157.14851A-100000@shell-2.enteract.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Tue, 4 Jul 2000, Greg Lehey wrote:

:
:There's nothing to say that wake_one is more complex.  wake_one takes
:the first process on the mutex's sleep list and wakes it.  wake_all
:(or whatever) would make a loop out of that wake function and wake all
:the processes on the list.  All would then be scheduled, try to take
:the mutex, and all except one would fail and be put back on the sleep
:list.  Does this make sense?

With a wake_one function, you need to be much more careful to avoid priority
inversion, and all sorts of other potential races.  Solaris's locks are very
fine grained, and it wouldn't surprise me at all if their average case was to
wake only one process.  In that case, you get a wake_all that performs
very much like wake_one, and you get to avoid the overhead of having to sort
a sleep queue, or the like.  There may be a slight performance penalty, but
a whole class of deadlock is removed, which makes it easier to produce
correct code.  In many cases, I'll take a performance hit for availability.

It's quite likely, perhaps even certain, that FreeBSD isn't going to be in a
position where the sleep queue's average length is close enough to one for
this to be a viable approach in the medium term.  (I haven't had a chance to
thoroughly understand what the SMP road map looks like.)  If that is the case,
then wake_one would be a win for FreeBSD.


David Scheidt




From owner-freebsd-smp  Mon Jul  3 16:48: 5 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from mailhost.iprg.nokia.com (mailhost.iprg.nokia.com [205.226.5.12])
	by hub.freebsd.org (Postfix) with ESMTP id 4F90337BA9D
	for <smp@FreeBSD.ORG>; Mon,  3 Jul 2000 16:48:01 -0700 (PDT)
	(envelope-from jre@iprg.nokia.com)
Received: from darkstar.iprg.nokia.com (darkstar.iprg.nokia.com [205.226.5.69])
	by mailhost.iprg.nokia.com (8.9.3/8.9.3-GLGS) with ESMTP id QAA17821;
	Mon, 3 Jul 2000 16:47:59 -0700 (PDT)
Received: (from mail@localhost)
	by darkstar.iprg.nokia.com (8.9.3/8.9.3-VIRSCAN) id QAA05600;
	Mon, 3 Jul 2000 16:47:57 -0700
X-Virus-Scanned:  Mon, 3 Jul 2000 16:47:57 -0700 Nokia Silicon Valley Email Exploit Scanner
Received: from UNKNOWN (205.226.1.150, claiming to be "iprg.nokia.com")
	by darkstar with SMTP id smtpdoKR9wT; Mon, 03 Jul 2000 16:47:52 PDT
Message-ID: <39612626.3E3AE2C4@iprg.nokia.com>
Date: Mon, 03 Jul 2000 16:47:51 -0700
From: Joe Eykholt <jre@iprg.nokia.com>
Organization: Nokia IPRG
X-Mailer: Mozilla 4.7 [en] (X11; I; FreeBSD 2.2.6-RELEASE i386)
X-Accept-Language: en
MIME-Version: 1.0
To: Greg Lehey <grog@lemis.com>
Cc: "Jeroen C. van Gelderen" <jeroen@vangelderen.org>,
	Daniel Eischen <eischen@vigrid.com>,
	Jason Evans <jasone@canonware.com>,
	Luoqi Chen <luoqi@watermarkgroup.com>, smp@FreeBSD.ORG
Subject: Re: SMP meeting summary
References: <20000703114535.T39024@wantadilla.lemis.com> <Pine.SUN.3.91.1000703060948.5216A-100000@pcnet1.pcnet.com> <20000703200039.H62680@wantadilla.lemis.com> <3960A971.982DDF07@vangelderen.org> <20000704083822.A65029@wantadilla.lemis.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Greg Lehey wrote:
 
> There's nothing to say that wake_one is more complex.  wake_one takes
> the first process on the mutex's sleep list and wakes it.  wake_all
> (or whatever) would make a loop out of that wake function and wake all
> the processes on the list.  All would then be scheduled, try to take
> the mutex, and all except one would fail and be put back on the sleep
> list.  Does this make sense?

With adaptive mutexes, the threads which are woken will either run
serially on one CPU, or some will run at the same time on multiple CPUs.
In the latter case, one gets the lock right away, and the rest SPIN on it
(as long as the new owner doesn't get suspended on something else).
They don't necessarily go back to sleep on that same lock.
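
The decision an adaptive mutex makes for a waiter can be sketched
roughly as follows.  This is an illustration under assumed names
(amutex, adaptive_wait), not the Solaris or BSD/OS implementation; the
only input is whether the current owner is running on some CPU.

```c
#include <assert.h>
#include <stdbool.h>

enum wait_action { WAIT_NONE, WAIT_SPIN, WAIT_SLEEP };

/* Hypothetical snapshot of a mutex as seen by a thread that wants it. */
struct amutex {
	bool held;		/* is the mutex owned? */
	bool owner_on_cpu;	/* is the owner running on some CPU now? */
};

/* What a thread that wants the mutex does next under an adaptive policy. */
enum wait_action
adaptive_wait(const struct amutex *m)
{
	/* a free mutex cannot have a running owner */
	assert(m->held || !m->owner_on_cpu);

	if (!m->held)
		return WAIT_NONE;	/* free: take it immediately */
	/*
	 * Owner running elsewhere: it should release soon, so spin.
	 * Owner suspended: spinning would be wasted, so sleep.
	 */
	return m->owner_on_cpu ? WAIT_SPIN : WAIT_SLEEP;
}
```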
  
I agree it's too early to talk about this degree of optimization, though.

	Joe Eykholt




From owner-freebsd-smp  Mon Jul  3 16:55:59 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from wantadilla.lemis.com (wantadilla.lemis.com [192.109.197.80])
	by hub.freebsd.org (Postfix) with ESMTP id 8E7BA37BA9D
	for <smp@freebsd.org>; Mon,  3 Jul 2000 16:55:47 -0700 (PDT)
	(envelope-from grog@wantadilla.lemis.com)
Received: (from grog@localhost)
	by wantadilla.lemis.com (8.9.3/8.9.3) id JAA93985;
	Tue, 4 Jul 2000 09:22:45 +0930 (CST)
	(envelope-from grog)
Date: Tue, 4 Jul 2000 09:22:45 +0930
From: Greg Lehey <grog@lemis.com>
To: Chuck Paterson <cp@bsdi.com>
Cc: Daniel Eischen <eischen@vigrid.com>,
	Jason Evans <jasone@canonware.com>,
	Luoqi Chen <luoqi@watermarkgroup.com>, smp@freebsd.org
Subject: Re: SMP meeting summary
Message-ID: <20000704092245.B65029@wantadilla.lemis.com>
References: <200007031528.JAA26798@berserker.bsdi.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0pre2i
In-Reply-To: <200007031528.JAA26798@berserker.bsdi.com>
Organization: LEMIS, PO Box 460, Echunga SA 5153, Australia
Phone: +61-8-8388-8286
Fax: +61-8-8388-8725
Mobile: +61-418-838-708
WWW-Home-Page: http://www.lemis.com/~grog
X-PGP-Fingerprint: 6B 7B C3 8C 61 CD 54 AF  13 24 52 F8 6D A4 95 EF
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Monday,  3 July 2000 at  9:28:34 -0600, Chuck Paterson wrote:
>> Well if you are considering spinning for a bit of time on a held
>> mutex (which you seem to advocate?), then why not wake everyone?
>> If mutexes are held for very short periods of time and you don't
>> often have a thundering herd problem, then waking everyone is
>> an optimization since you only have to take the scheduling lock
>> once.  If mutexes can be held for long periods of time, then you
>> probably wouldn't want to wake everyone.
>>
>> Dan Eischen
>
> 	If all processes are made runnable at once then both future
> releases and acquisitions of the mutex may be uncontested, resulting
> in not having to acquire the scheduling lock.

I'm not sure we're talking about the same thing, but if so I must be
missing something.  If I'm waiting on a mutex, I still need to
reacquire it on wakeup, don't I?  In that case, only the first process
to be scheduled will actually get the mutex, and the others will block
again.

> If the system is busy and there are not idle CPUs then there won't
> be a thundering herd, because there is no herd to thunder.

So we get a creeping herd?  If we wake more processes than we can
handle, we're just going to spend time putting the rest to sleep
again.

> The probability of threads blocking on the mutex before it is
> released is a function of the ratio of mutex hold time to the time it
> takes a processor to call switch with the thread which wants to run
> being the highest priority. In general mutex hold time is small
> compared to the time a process runs.

Fine, but there are exceptions.  Obviously if we only ever have one
thread waiting on the mutex, we don't have any basis for discussion.

<snip>

> 	In general there ought not to be multiple processes piling
> up on a mutex. If there are and for some reason they can't be
> fixed then these particular mutexes are going to dictate how this
> area is handled. Once we have these cases in hand we can make
> some decisions as to how to proceed.

In my experience, I've seen mutexes used for long-term waits, and I
don't see any a priori reason not to do so.  Of course, if we make
design decisions based on the assumption that all waits will be short,
then we will have a reason, but it won't be a good one.

Before you say that long-term waits are evil, note that we're probably
talking about different kinds of waits.  Obviously anything that
threatens to keep the system idle while it waits is bad, but a
replacement for tsleep(), say, can justifiably wait for a long time.

Greg
--
Finger grog@lemis.com for PGP public key
See complete headers for address and phone numbers




From owner-freebsd-smp  Mon Jul  3 16:59:22 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from mailhost.iprg.nokia.com (mailhost.iprg.nokia.com [205.226.5.12])
	by hub.freebsd.org (Postfix) with ESMTP id 120E337BA9D
	for <smp@FreeBSD.ORG>; Mon,  3 Jul 2000 16:59:16 -0700 (PDT)
	(envelope-from jre@iprg.nokia.com)
Received: from darkstar.iprg.nokia.com (darkstar.iprg.nokia.com [205.226.5.69])
	by mailhost.iprg.nokia.com (8.9.3/8.9.3-GLGS) with ESMTP id QAA18361;
	Mon, 3 Jul 2000 16:59:15 -0700 (PDT)
Received: (from mail@localhost)
	by darkstar.iprg.nokia.com (8.9.3/8.9.3-VIRSCAN) id QAA11098;
	Mon, 3 Jul 2000 16:59:12 -0700
X-Virus-Scanned:  Mon, 3 Jul 2000 16:59:12 -0700 Nokia Silicon Valley Email Exploit Scanner
Received: from UNKNOWN (205.226.1.150, claiming to be "iprg.nokia.com")
	by darkstar with SMTP id smtpdsWDuJ1; Mon, 03 Jul 2000 16:59:06 PDT
Message-ID: <396128CD.DC304CAE@iprg.nokia.com>
Date: Mon, 03 Jul 2000 16:59:09 -0700
From: Joe Eykholt <jre@iprg.nokia.com>
Organization: Nokia IPRG
X-Mailer: Mozilla 4.7 [en] (X11; I; FreeBSD 2.2.6-RELEASE i386)
X-Accept-Language: en
MIME-Version: 1.0
To: Greg Lehey <grog@lemis.com>,
	"Jeroen C. van Gelderen" <jeroen@vangelderen.org>,
	Daniel Eischen <eischen@vigrid.com>,
	Jason Evans <jasone@canonware.com>,
	Luoqi Chen <luoqi@watermarkgroup.com>, smp@FreeBSD.ORG
Subject: Re: SMP meeting summary
References: <20000703114535.T39024@wantadilla.lemis.com> <Pine.SUN.3.91.1000703060948.5216A-100000@pcnet1.pcnet.com> <20000703200039.H62680@wantadilla.lemis.com> <3960A971.982DDF07@vangelderen.org> <20000704083822.A65029@wantadilla.lemis.com> <39612626.3E
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Joe Eykholt wrote:
> 
> Greg Lehey wrote:
> 
> > There's nothing to say that wake_one is more complex.  wake_one takes
> > the first process on the mutex's sleep list and wakes it.  wake_all
> > (or whatever) would make a loop out of that wake function and wake all
> > the processes on the list.  All would then be scheduled, try to take
> > the mutex, and all except one would fail and be put back on the sleep
> > list.  Does this make sense?
> 
> With adaptive mutexes, the threads which are woken will either run
> serially on one CPU, or some will run at the same time on multiple CPUs.
> In that case, one gets the lock right away, and the rest SPIN on it
> (as long as the new owner doesn't get suspended on something else).
> They don't necessarily go back to sleep on that same lock.
 
Another thing that would happen if you don't wake all the waiters,
and you have a lot of CPUs, is that a thread that comes along and wants
the lock while the new owner is running (or before it's started running)
either SPINs or acquires the lock in front of all of the threads
that are still asleep.   In other words it cuts in line.  

I guess that could be bad for certain locks and for a very large 
number of CPUs.

If you do wake_all, then they all get the same chance at the lock ... and
likely they'll just spin for a short time (at worst) if locks are held
briefly.

I guess that could be bad for certain locks and for a very large 
number of CPUs.

	Joe




From owner-freebsd-smp  Mon Jul  3 17:27:49 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from wantadilla.lemis.com (wantadilla.lemis.com [192.109.197.80])
	by hub.freebsd.org (Postfix) with ESMTP id 8385B37BCC5
	for <smp@FreeBSD.ORG>; Mon,  3 Jul 2000 17:27:42 -0700 (PDT)
	(envelope-from grog@wantadilla.lemis.com)
Received: (from grog@localhost)
	by wantadilla.lemis.com (8.9.3/8.9.3) id JAA94126;
	Tue, 4 Jul 2000 09:57:05 +0930 (CST)
	(envelope-from grog)
Date: Tue, 4 Jul 2000 09:57:05 +0930
From: Greg Lehey <grog@lemis.com>
To: Joe Eykholt <jre@iprg.nokia.com>
Cc: "Jeroen C. van Gelderen" <jeroen@vangelderen.org>,
	Daniel Eischen <eischen@vigrid.com>,
	Jason Evans <jasone@canonware.com>,
	Luoqi Chen <luoqi@watermarkgroup.com>, smp@FreeBSD.ORG
Subject: Re: SMP meeting summary
Message-ID: <20000704095705.D65029@wantadilla.lemis.com>
References: <20000703114535.T39024@wantadilla.lemis.com> <Pine.SUN.3.91.1000703060948.5216A-100000@pcnet1.pcnet.com> <20000703200039.H62680@wantadilla.lemis.com> <3960A971.982DDF07@vangelderen.org> <20000704083822.A65029@wantadilla.lemis.com> <39612626.3E <396128CD.DC304CAE@iprg.nokia.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0pre2i
In-Reply-To: <396128CD.DC304CAE@iprg.nokia.com>
Organization: LEMIS, PO Box 460, Echunga SA 5153, Australia
Phone: +61-8-8388-8286
Fax: +61-8-8388-8725
Mobile: +61-418-838-708
WWW-Home-Page: http://www.lemis.com/~grog
X-PGP-Fingerprint: 6B 7B C3 8C 61 CD 54 AF  13 24 52 F8 6D A4 95 EF
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Monday,  3 July 2000 at 16:59:09 -0700, Joe Eykholt wrote:
> Joe Eykholt wrote:
>>
>> Greg Lehey wrote:
>>
>>> There's nothing to say that wake_one is more complex.  wake_one takes
>>> the first process on the mutex's sleep list and wakes it.  wake_all
>>> (or whatever) would make a loop out of that wake function and wake all
>>> the processes on the list.  All would then be scheduled, try to take
>>> the mutex, and all except one would fail and be put back on the sleep
>>> list.  Does this make sense?
>>
>> With adaptive mutexes, the threads which are woken will either run
>> serially on one CPU, or some will run at the same time on multiple CPUs.
>> In that case, one gets the lock right away, and the rest SPIN on it
>> (as long as the new owner doesn't get suspended on something else).
>> They don't necessarily go back to sleep on that same lock.
>
> Another thing that would happen if you don't wake all the waiters,
> and you have a lot of CPUs, is that a thread that comes along and wants
> the lock while the new owner is running (or before it's started running)
> either SPINs or acquires the lock in front of all of the threads
> that are still asleep.   In other words it cuts in line.

I need to look at the implementation more carefully.  My understanding
is that the mutexes allow exactly one process to run, independent of
the number of processors (or any other resource on which they're
waiting).  If we can allow multiple (but a finite number of) processes
to run concurrently, then we should be using counting semaphores
(which are, in fact, very similar to sleeping mutexes).

> I guess that could be bad for certain locks and for a very large
> number of CPUs.

You'll always find good and bad examples.  It's difficult to
generalize.

> If you do wake_all, then they all get the same chance at the lock ... and
> likely they'll just spin for a short time (at worst) if locks are held
> briefly.

If we have a certain number of slots available, then we wake that
many.  Normally, though, if you're sleeping at all, you've used up all
your slots.  Consider a counting semaphore which allows four
concurrent processes to enter.  Initially the counter is set to 4.

   - process 1 takes semaphore.  Counter goes to 3.
   - process 2 takes semaphore.  Counter goes to 2.
   - process 3 takes semaphore.  Counter goes to 1.
   - process 4 takes semaphore.  Counter goes to 0.
   - process 5 tries to take semaphore.  Counter goes to -1, so
     process 5 sleeps.
   - process 6 tries to take semaphore.  Counter is -1, so process 6
     sleeps.
   - process 2 releases semaphore.  Counter goes to 0.  It's still not
     > 0, so nothing further happens.
   - process 1 releases semaphore.  Counter goes to 1.  Process 5 gets
     scheduled, decreasing counter to 0.

I'm making assumptions about the exact counter implementation and that
process 5 gets scheduled, and not process 6, but they're not relevant
to the discussion.  Clearly at this point, it wouldn't make any sense
to try to schedule process 6, since the counter won't allow it.
Things might be slightly different if:

   - process 1 releases semaphore.  Counter goes to 1.  *interrupt*
   - process 3 releases semaphore.  Counter goes to 2.  Process 5 gets
     scheduled, decreasing counter to 1.  *return from interrupt*
   - Counter is 1, so process 6 gets scheduled, decreasing the counter
     to 0.

Clearly at this point, if we had a process 7 waiting on the queue, it
wouldn't make any sense to have it scheduled too.
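
The wake_one behaviour walked through above can be sketched as a small
single-threaded Python model.  This is only an illustration under stated
assumptions: the class and method names are hypothetical, and the
counter here is a conventional one that never goes negative (the
walkthrough above lets it reach -1), so a freed slot passes straight to
the first sleeper.

```python
from collections import deque


class CountingSemaphore:
    """Toy model of a counting semaphore with a FIFO sleep queue (hypothetical)."""

    def __init__(self, slots):
        self.counter = slots      # free slots remaining, never negative here
        self.sleepers = deque()   # processes waiting, in arrival order
        self.holders = set()      # processes currently inside

    def take(self, proc):
        """Return True if proc enters at once, False if it has to sleep."""
        if self.counter > 0:
            self.counter -= 1
            self.holders.add(proc)
            return True
        self.sleepers.append(proc)
        return False

    def release(self, proc):
        """Free one slot and wake at most one sleeper; return it, or None."""
        self.holders.remove(proc)
        if self.sleepers:
            woken = self.sleepers.popleft()
            self.holders.add(woken)   # the freed slot goes to the woken process
            return woken
        self.counter += 1
        return None


sem = CountingSemaphore(4)
entered = [sem.take(p) for p in (1, 2, 3, 4, 5, 6)]   # 5 and 6 must sleep
first_woken = sem.release(2)
second_woken = sem.release(1)
```

Each release wakes exactly one sleeper, so the model never schedules
more processes than there are slots; a hypothetical process 7 on the
queue would stay asleep until another holder released.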

Greg
--
Finger grog@lemis.com for PGP public key
See complete headers for address and phone numbers


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-smp" in the body of the message


From owner-freebsd-smp  Mon Jul  3 17:42:13 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from mailhost.iprg.nokia.com (mailhost.iprg.nokia.com [205.226.5.12])
	by hub.freebsd.org (Postfix) with ESMTP id 48EBA37B8A0
	for <smp@FreeBSD.ORG>; Mon,  3 Jul 2000 17:42:10 -0700 (PDT)
	(envelope-from jre@iprg.nokia.com)
Received: from darkstar.iprg.nokia.com (darkstar.iprg.nokia.com [205.226.5.69])
	by mailhost.iprg.nokia.com (8.9.3/8.9.3-GLGS) with ESMTP id RAA21990;
	Mon, 3 Jul 2000 17:42:09 -0700 (PDT)
Received: (from mail@localhost)
	by darkstar.iprg.nokia.com (8.9.3/8.9.3-VIRSCAN) id RAA00969;
	Mon, 3 Jul 2000 17:42:07 -0700
X-Virus-Scanned:  Mon, 3 Jul 2000 17:42:07 -0700 Nokia Silicon Valley Email Exploit Scanner
Received: from UNKNOWN (205.226.1.150, claiming to be "iprg.nokia.com")
	by darkstar with SMTP id smtpdRnE9JC; Mon, 03 Jul 2000 17:42:02 PDT
Message-ID: <396132D8.460539CA@iprg.nokia.com>
Date: Mon, 03 Jul 2000 17:42:00 -0700
From: Joe Eykholt <jre@iprg.nokia.com>
Organization: Nokia IPRG
X-Mailer: Mozilla 4.7 [en] (X11; I; FreeBSD 2.2.6-RELEASE i386)
X-Accept-Language: en
MIME-Version: 1.0
To: Greg Lehey <grog@lemis.com>
Cc: "Jeroen C. van Gelderen" <jeroen@vangelderen.org>,
	Daniel Eischen <eischen@vigrid.com>,
	Jason Evans <jasone@canonware.com>,
	Luoqi Chen <luoqi@watermarkgroup.com>, smp@FreeBSD.ORG
Subject: Re: SMP meeting summary
References: <20000703114535.T39024@wantadilla.lemis.com> <Pine.SUN.3.91.1000703060948.5216A-100000@pcnet1.pcnet.com> <20000703200039.H62680@wantadilla.lemis.com> <3960A971.982DDF07@vangelderen.org> <20000704083822.A65029@wantadilla.lemis.com> <39612626.3E
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Greg Lehey wrote:
> 
> On Monday,  3 July 2000 at 16:59:09 -0700, Joe Eykholt wrote:
> > Joe Eykholt wrote:
> >>
> >> Greg Lehey wrote:
> >>
> >>> There's nothing to say that wake_one is more complex.  wake_one takes
> >>> the first process on the mutex's sleep list and wakes it.  wake_all
> >>> (or whatever) would make a loop out of that wake function and wake all
> >>> the processes on the list.  All would then be scheduled, try to take
> >>> the mutex, and all except one would fail and be put back on the sleep
> >>> list.  Does this make sense?
> >>
> >> With adaptive mutexes, the threads which are woken will either run one
> >> serially on one CPU, or some run at the same time on multiple CPUs.
> >> In that case, one gets the lock right away, and the rest SPIN on it
> >> (as long as the new owner doesn't get suspended on something else).
> >> They don't necessarily go back to sleep on that same lock.
> >
> > Another thing that would happen if you don't wake all the waiters,
> > and you have a lot of CPUs, is that a thread that comes along and wants
> > the lock while the new owner is running (or before it's started running)
> > either SPINs or acquires the lock in front of all of the threads
> > that are still asleep.   In other words it cuts in line.
> 
> I need to look at the implementation more carefully.  My understanding
> is that the mutexes allow exactly one process to run, independent of
> the number of processors (or any other resource on which they're
> waiting).  If we can allow multiple (but a finite number of) processes
> to run concurrently, then we should be using counting semaphores
> (which are, in fact, very similar to sleeping mutexes).

Mutexes only allow one thread to get the mutex, but multiple threads can
be spinning for the mutex.  Wake_one (or wake_all) doesn't give
the mutex to anyone; the threads must get scheduled by a CPU and then
ACQUIRE the mutex ... which is hopefully still free at that time.
If we wake_all, then all of the threads will try to get the mutex once
they have a CPU.  As long as the thread owning the mutex is on a CPU,
the other contenders spin.
 
Semaphores are tough to use adaptively, since there's no identifiable 'owner'.
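
The spin-or-sleep decision described above can be sketched in a few
lines of Python.  This is a minimal model, not the BSD/OS or Solaris
code: the Mutex fields and the contend() function are hypothetical
names, and the scheduler is reduced to a single owner_on_cpu flag.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mutex:
    owner: Optional[str] = None   # thread holding the mutex, if any
    owner_on_cpu: bool = False    # is that owner currently running?


def contend(mutex, thread):
    """Return what `thread` does when it tries to take `mutex`."""
    if mutex.owner is None:
        mutex.owner = thread
        mutex.owner_on_cpu = True
        return "acquire"
    if mutex.owner_on_cpu:
        return "spin"    # owner is running, so the lock should be free soon
    return "sleep"       # owner is suspended; spinning would only burn CPU
```

The decision depends on knowing who the owner is, which is why it does
not carry over directly to a semaphore with several anonymous holders.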
 	
	Joe




From owner-freebsd-smp  Mon Jul  3 19:18:28 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from berserker.bsdi.com (berserker.twistedbit.com [199.79.183.1])
	by hub.freebsd.org (Postfix) with ESMTP id 3F7CF37C2DF
	for <smp@freebsd.org>; Mon,  3 Jul 2000 19:18:24 -0700 (PDT)
	(envelope-from cp@berserker.bsdi.com)
Received: from berserker.bsdi.com (cp@localhost [127.0.0.1])
	by berserker.bsdi.com (8.9.3/8.9.3) with ESMTP id UAA01169;
	Mon, 3 Jul 2000 20:18:04 -0600 (MDT)
Message-Id: <200007040218.UAA01169@berserker.bsdi.com>
To: Greg Lehey <grog@lemis.com>
Cc: Daniel Eischen <eischen@vigrid.com>,
	Jason Evans <jasone@canonware.com>,
	Luoqi Chen <luoqi@watermarkgroup.com>, smp@freebsd.org
Subject: Re: SMP meeting summary 
From: Chuck Paterson <cp@bsdi.com>
Date: Mon, 03 Jul 2000 20:18:00 -0600
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org


}I'm not sure we're talking about the same thing, but if so I must be
}missing something.  If I'm waiting on a mutex, I still need to
}reacquire it on wakeup, don't I?  In that case, only the first process
}to be scheduled will actually get the mutex, and the others will block
}again.

	Yes, you need to acquire the mutex on wakeup, but likely
one process will run acquiring and releasing the mutex in an
uncontested fashion before other processes run and do the same
thing.

}
}> If the system is busy and there are not idle CPUs then there won't
}> be a thundering herd, because there is no herd to thunder.
}
}So we get a creeping herd?  If we wake more processes than we can
}handle, we're just going to spend time putting the rest to sleep
}again.
}
	Perhaps, but the odds are against it; see the next paragraph.

}> The probability of threads blocking on the mutex before it is
}> released is a function of mutex hold time to the time it takes a
}> processor to calling switch with the thread which wants to run being
}> the highest priority. In general mutex hold time is small compared
}> to the time a process runs.
}
}Fine, but there are exceptions.  Obviously if we only ever have one
}thread waiting on the mutex, we don't have any basis for discussion.
}
}<snip>
}
}> 	In general there ought not to be multiple processes piling
}> up on a mutex. If there are and for some reason they can't be
}> fixed then these particular mutexs are going to dictate how this
}> area is handled. Once we have these cases in hand we can make
}> some decisions as to how to proceed.
}
}In my experience, I've seen mutexes used for long-term waits, and I
}don't see any a priori reason not to do so.  Of course, if we make
}design decisions based on the assumption that all waits will be short,
}then we will have a reason, but it won't be a good one.
}
}Before you say that long-term waits are evil, note that we're probably
}talking about different kinds of waits.  Obviously anything that
}threatens to keep the system idle while it waits is bad, but a
}replacement for tsleep(), say, can justifiably wait for a long time.

	A replacement for tsleep is not a mutex, but in Solaris
parlance a condition variable. The uses are different: one is
for locking a resource, the other is for waiting on a synch event. A
condition variable, like the sleep queues, has a mutex associated
with it. This mutex is not held except while processing the event,
both by the process waiting and the process doing the activation.
I don't think it is a good idea to assume that the heuristics for
waking up tsleep / condition variables are going to be
anything like those seen with mutexes.


	Since things have been cut and pasted, I'll say
again that I don't have a good idea what the right answer is on
any of this. I do believe we need to get what we have running,
instrument it, and reach some decisions.

Chuck




From owner-freebsd-smp  Mon Jul  3 19:26: 6 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from pcnet1.pcnet.com (pcnet1.pcnet.com [204.213.232.3])
	by hub.freebsd.org (Postfix) with ESMTP id 86E9437C270
	for <smp@FreeBSD.ORG>; Mon,  3 Jul 2000 19:26:03 -0700 (PDT)
	(envelope-from eischen@vigrid.com)
Received: (from eischen@localhost)
	by pcnet1.pcnet.com (8.8.7/PCNet) id WAA03158;
	Mon, 3 Jul 2000 22:25:43 -0400 (EDT)
Date: Mon, 3 Jul 2000 22:25:32 -0400 (EDT)
From: Daniel Eischen <eischen@vigrid.com>
To: Greg Lehey <grog@lemis.com>
Cc: Jason Evans <jasone@canonware.com>,
	Luoqi Chen <luoqi@watermarkgroup.com>, smp@FreeBSD.ORG
Subject: Re: SMP meeting summary
In-Reply-To: <20000703200039.H62680@wantadilla.lemis.com>
Message-ID: <Pine.SUN.3.91.1000703220209.29911A@pcnet1.pcnet.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Mon, 3 Jul 2000, Greg Lehey wrote:
> On Monday,  3 July 2000 at  6:23:28 -0400, Daniel Eischen wrote:
> > Well if you are considering spinning for a bit of time on a held
> > mutex (which you seem to advocate?), then why not wake everyone?
> 
> Because it doesn't buy us anything.

You only have to take the sleep queue lock once to wake all the
waiting threads/processes instead of N times if you have N waiting
threads.  You only have to take the scheduling queue lock once to
place the waiting threads on the scheduling queue.

I'm not advocating doing this (right now) in FreeBSD.  But this
seems like a potential optimization we could possibly do in the
future.  But in order to do this, we need to make sure that the
time mutexes are held is very short.  My only suggestion was that
we comment those sections of code that can hold mutexes for long
periods of time or use a different naming convention for those
mutexes.

> > If mutexes are held for very short periods of time and you don't
> > often have a thundering herd problem,
> 
> That's an assumption.  So far we have *never* had a thundering herd,
> because the code don't work yet.

I wouldn't call the above an assumption.  I'm not assuming anything.
It is predicated with an "if".  If we would/do have a thundering herd
problem then I sure wouldn't want to wake_all(). 

> 
> > then waking everyone is an optimization since you only have to take
> > the scheduling lock once.
> 
> No.  If I understand things correctly, each process would need to get
> the schedlock, and only one process can get the mutex.  Why wake the
> rest?  What do you want them to do?  This applies even in the case of
> a counting semaphore (of which our "mutex" is a special case), since
> if any slots are available, the process wouldn't be sleeping.

No, each process would be placed in the run queue with wake_all()
semantics.  Plus you would only have to take the sleep queue lock
once too.  Waking processes doesn't mean that they will run
immediately.  The first process to run will hopefully take the
mutex and release it in a timely manner so that by the time the
next process runs it can take the mutex uncontested.
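
The lock-count argument above can be made concrete with a small Python
model.  It is a sketch only: SleepQueue and its counters are
hypothetical, and it counts nothing but lock acquisitions on the wake
path, ignoring the cost of any woken process that then fails to get
the mutex.

```python
from collections import deque


class SleepQueue:
    """Toy sleep queue that counts how often each lock is taken (hypothetical)."""

    def __init__(self, waiters):
        self.waiters = deque(waiters)
        self.run_queue = []
        self.sleepq_lock_takes = 0   # acquisitions of the sleep queue lock
        self.sched_lock_takes = 0    # acquisitions of the scheduling queue lock

    def wake_one(self):
        self.sleepq_lock_takes += 1
        if not self.waiters:
            return
        proc = self.waiters.popleft()
        self.sched_lock_takes += 1
        self.run_queue.append(proc)

    def wake_all(self):
        self.sleepq_lock_takes += 1
        if not self.waiters:
            return
        self.sched_lock_takes += 1   # one acquisition covers every waiter
        while self.waiters:
            self.run_queue.append(self.waiters.popleft())


one_at_a_time = SleepQueue("abcd")
for _ in range(4):                   # one wake_one per mutex release
    one_at_a_time.wake_one()

all_at_once = SleepQueue("abcd")
all_at_once.wake_all()
```

With four waiters, wake_one takes each lock four times in total and
wake_all takes each once; both leave the same processes on the run
queue in the same order.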

-- 
Dan Eischen




From owner-freebsd-smp  Mon Jul  3 19:36:48 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from pcnet1.pcnet.com (pcnet1.pcnet.com [204.213.232.3])
	by hub.freebsd.org (Postfix) with ESMTP id CA45D37C4AD
	for <smp@FreeBSD.ORG>; Mon,  3 Jul 2000 19:36:34 -0700 (PDT)
	(envelope-from eischen@vigrid.com)
Received: (from eischen@localhost)
	by pcnet1.pcnet.com (8.8.7/PCNet) id WAA04290;
	Mon, 3 Jul 2000 22:36:16 -0400 (EDT)
Date: Mon, 3 Jul 2000 22:36:14 -0400 (EDT)
From: Daniel Eischen <eischen@vigrid.com>
To: Joe Eykholt <jre@iprg.nokia.com>
Cc: Greg Lehey <grog@lemis.com>,
	"Jeroen C. van Gelderen" <jeroen@vangelderen.org>,
	Jason Evans <jasone@canonware.com>,
	Luoqi Chen <luoqi@watermarkgroup.com>, smp@FreeBSD.ORG
Subject: Re: SMP meeting summary
In-Reply-To: <39612626.3E3AE2C4@iprg.nokia.com>
Message-ID: <Pine.SUN.3.91.1000703223119.29911B-100000@pcnet1.pcnet.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Mon, 3 Jul 2000, Joe Eykholt wrote:
> Greg Lehey wrote:
>  
> > There's nothing to say that wake_one is more complex.  wake_one takes
> > the first process on the mutex's sleep list and wakes it.  wake_all
> > (or whatever) would make a loop out of that wake function and wake all
> > the processes on the list.  All would then be scheduled, try to take
> > the mutex, and all except one would fail and be put back on the sleep
> > list.  Does this make sense?
> 
> With adaptive mutexes, the threads which are woken will either run one 
> serially on one CPU, or some run at the same time on multiple CPUs.  
> In that case, one gets the lock right away, and the rest SPIN on it 
> (as long as the new owner doesn't get suspended on something else).  
> They don't necessarily go back to sleep on that same lock.

Thanks, I forgot about this.  Even if you wake multiple threads, they
will not be put back to sleep until the owner of the mutex is no
longer running.

-- 
Dan Eischen




From owner-freebsd-smp  Mon Jul  3 19:40:10 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from wantadilla.lemis.com (wantadilla.lemis.com [192.109.197.80])
	by hub.freebsd.org (Postfix) with ESMTP id 3A6F737C2D5
	for <smp@freebsd.org>; Mon,  3 Jul 2000 19:40:00 -0700 (PDT)
	(envelope-from grog@wantadilla.lemis.com)
Received: (from grog@localhost)
	by wantadilla.lemis.com (8.9.3/8.9.3) id MAA94857;
	Tue, 4 Jul 2000 12:09:30 +0930 (CST)
	(envelope-from grog)
Date: Tue, 4 Jul 2000 12:09:30 +0930
From: Greg Lehey <grog@lemis.com>
To: Chuck Paterson <cp@bsdi.com>
Cc: Daniel Eischen <eischen@vigrid.com>,
	Jason Evans <jasone@canonware.com>,
	Luoqi Chen <luoqi@watermarkgroup.com>, smp@freebsd.org
Subject: Re: SMP meeting summary
Message-ID: <20000704120930.G94351@wantadilla.lemis.com>
References: <200007040218.UAA01169@berserker.bsdi.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0pre2i
In-Reply-To: <200007040218.UAA01169@berserker.bsdi.com>
Organization: LEMIS, PO Box 460, Echunga SA 5153, Australia
Phone: +61-8-8388-8286
Fax: +61-8-8388-8725
Mobile: +61-418-838-708
WWW-Home-Page: http://www.lemis.com/~grog
X-PGP-Fingerprint: 6B 7B C3 8C 61 CD 54 AF  13 24 52 F8 6D A4 95 EF
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Monday,  3 July 2000 at 20:18:00 -0600, Chuck Paterson wrote:
>
>> I'm not sure we're talking about the same thing, but if so I must be
>> missing something.  If I'm waiting on a mutex, I still need to
>> reacquire it on wakeup, don't I?  In that case, only the first process
>> to be scheduled will actually get the mutex, and the others will block
>> again.
>
> 	Yes, you need to acquire the mutex on wakeup, but likely
> one process will run acquiring and releasing the mutex in an
> uncontested fashion before other processes run and do the same
> thing.

Hmmm.  Yes, I suppose that would happen in a single processor
environment.

>>> 	In general there ought not to be multiple processes piling
>>> up on a mutex. If there are and for some reason they can't be
>>> fixed then these particular mutexs are going to dictate how this
>>> area is handled. Once we have these cases in hand we can make
>>> some decisions as to how to proceed.
>>
>> In my experience, I've seen mutexes used for long-term waits, and I
>> don't see any a priori reason not to do so.  Of course, if we make
>> design decisions based on the assumption that all waits will be short,
>> then we will have a reason, but it won't be a good one.
>>
>> Before you say that long-term waits are evil, note that we're probably
>> talking about different kinds of waits.  Obviously anything that
>> threatens to keep the system idle while it waits is bad, but a
>> replacement for tsleep(), say, can justifiably wait for a long time.
>
> 	A replacement for tsleep is not a mutex, but in Solaris
> parlance a conditional variable.

I think we have a certain problem with terminology, and it seems to be
clouding the discussion.  The big difference between the BSD/OS sleep
mutex and the semaphores we used at Tandem (amongst other things for
long-term waits) wasn't the counter (which was always set to 1) but
the name.

> The uses are different, one is for locking a resource, the other is
> waiting on a synch event. A conditional variable, like the sleep
> queues has a mutex associated with it. This mutex is not held except
> while processing the event, both by the process waiting and the
> process doing the activation.  

This is a different paradigm from the one we used.

> I don't think it is a good idea to assume that the heuristics for
> waking up tsleep / conditional variables is going to be anything
> like those seen with mutexs.

Maybe.  I need to let this go through my head.  Just because we found
it to be the right idea at Tandem doesn't mean it's the right idea
here.  I've never been able to understand the advantages of
conditional variables, which may be my viewpoint, or it may be some
basic lack of understanding.

> 	Since things have been cuts and pasted I'll say again I don't
> have a good idea what the right answer is on any of this. I do
> believe we need to get what we have running, instrument it, and
> reach some decisions.

Agreed entirely.  At the moment the discussion is academic.  When
we've done the implementation, we'll have a much better idea about
what we really want^H^H^H^Hneed.

Greg
--
Finger grog@lemis.com for PGP public key
See complete headers for address and phone numbers




From owner-freebsd-smp  Mon Jul  3 19:41:58 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from pcnet1.pcnet.com (pcnet1.pcnet.com [204.213.232.3])
	by hub.freebsd.org (Postfix) with ESMTP id C530137C384
	for <smp@freebsd.org>; Mon,  3 Jul 2000 19:41:49 -0700 (PDT)
	(envelope-from eischen@vigrid.com)
Received: (from eischen@localhost)
	by pcnet1.pcnet.com (8.8.7/PCNet) id WAA04905;
	Mon, 3 Jul 2000 22:41:29 -0400 (EDT)
Date: Mon, 3 Jul 2000 22:41:27 -0400 (EDT)
From: Daniel Eischen <eischen@vigrid.com>
To: Greg Lehey <grog@lemis.com>
Cc: Chuck Paterson <cp@bsdi.com>, Jason Evans <jasone@canonware.com>,
	Luoqi Chen <luoqi@watermarkgroup.com>, smp@freebsd.org
Subject: Re: SMP meeting summary
In-Reply-To: <20000704092245.B65029@wantadilla.lemis.com>
Message-ID: <Pine.SUN.3.91.1000703223707.29911C-100000@pcnet1.pcnet.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Tue, 4 Jul 2000, Greg Lehey wrote:
> On Monday,  3 July 2000 at  9:28:34 -0600, Chuck Paterson wrote:
> >> Well if you are considering spinning for a bit of time on a held
> >> mutex (which you seem to advocate?), then why not wake everyone?
> >> If mutexes are held for very short periods of time and you don't
> >> often have a thundering herd problem, then waking everyone is
> >> an optimization since you only have to take the scheduling lock
> >> once.  If mutexes can be held for long periods of time, then you
> >> probably wouldn't want to wake everyone.
> >>
> >> Dan Eischen
> >
> > 	If all processes are made runnable at once then both future
> > releases and acquisitions of the mutex may be uncontested, resulting
> > in not having to acquire the scheduling lock.
> 
> I'm not sure we're talking about the same thing, but if so I must be
> missing something.  If I'm waiting on a mutex, I still need to
> reacquire it on wakeup, don't I?  In that case, only the first process
> to be scheduled will actually get the mutex, and the others will block
> again.
> 
> > If the system is busy and there are not idle CPUs then there won't
> > be a thundering herd, because there is no herd to thunder.
> 
> So we get a creeping herd?  If we wake more processes than we can
> handle, we're just going to spend time putting the rest to sleep
> again.
> 
> > The probability of threads blocking on the mutex before it is
> > released is a function of mutex hold time to the time it takes a
> > processor to calling switch with the thread which wants to run being
> > the highest priority. In general mutex hold time is small compared
> > to the time a process runs.
> 
> Fine, but there are exceptions.  Obviously if we only ever have one
> thread waiting on the mutex, we don't have any basis for discussion.
> 
> <snip>
> 
> > 	In general there ought not to be multiple processes piling
> > up on a mutex. If there are and for some reason they can't be
> > fixed then these particular mutexs are going to dictate how this
> > area is handled. Once we have these cases in hand we can make
> > some decisions as to how to proceed.
> 
> In my experience, I've seen mutexes used for long-term waits, and I
> don't see any a priori reason not to do so.  Of course, if we make
> design decisions based on the assumption that all waits will be short,
> then we will have a reason, but it won't be a good one.
> 
> Before you say that long-term waits are evil, note that we're probably
> talking about different kinds of waits.  Obviously anything that
> threatens to keep the system idle while it waits is bad, but a
> replacement for tsleep(), say, can justifiably wait for a long time.

Which is why we want condition variables to replace tsleep().  If
you want to wait for long periods of time, then use condition variables
or reader/writer locks.  Mutexes and condition variables can be used
in a very similar way to splXXX() and tsleep().
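
As an illustration of that pattern, here is a minimal user-space sketch
using Python's threading.Condition, which pairs a condition variable
with its own mutex.  It is an analogy, not kernel code: EventBox and its
methods are hypothetical names, with wait_for_event() standing in for
tsleep() and post_event() for wakeup().

```python
import threading


class EventBox:
    """One process waits for an event posted by another (hypothetical)."""

    def __init__(self):
        self.cond = threading.Condition()   # condition variable plus its mutex
        self.ready = False
        self.value = None

    def wait_for_event(self):
        with self.cond:                     # mutex held only to examine state
            while not self.ready:           # re-check after every wakeup
                self.cond.wait()            # drops the mutex while asleep
            return self.value

    def post_event(self, value):
        with self.cond:                     # mutex held only to post the event
            self.value = value
            self.ready = True
            self.cond.notify_all()          # wake every waiter


box = EventBox()
results = []
waiter = threading.Thread(target=lambda: results.append(box.wait_for_event()))
waiter.start()
box.post_event("done")
waiter.join()
```

The waiter may sleep for a long time, but the mutex itself is held only
briefly on either side, so the long wait never shows up as mutex hold
time.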

-- 
Dan Eischen




From owner-freebsd-smp  Mon Jul  3 20:41:21 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from berserker.bsdi.com (berserker.twistedbit.com [199.79.183.1])
	by hub.freebsd.org (Postfix) with ESMTP id 4AF8E37B5E7
	for <smp@freebsd.org>; Mon,  3 Jul 2000 20:41:17 -0700 (PDT)
	(envelope-from cp@berserker.bsdi.com)
Received: from berserker.bsdi.com (cp@localhost [127.0.0.1])
	by berserker.bsdi.com (8.9.3/8.9.3) with ESMTP id VAA01866;
	Mon, 3 Jul 2000 21:40:57 -0600 (MDT)
Message-Id: <200007040340.VAA01866@berserker.bsdi.com>
To: Greg Lehey <grog@lemis.com>
Cc: Daniel Eischen <eischen@vigrid.com>,
	Jason Evans <jasone@canonware.com>,
	Luoqi Chen <luoqi@watermarkgroup.com>, smp@freebsd.org
Subject: Re: SMP meeting summary 
From: Chuck Paterson <cp@bsdi.com>
Date: Mon, 03 Jul 2000 21:40:57 -0600
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org


}
}Maybe.  I need to let this go through my head.  Just because we found
}it to be the right idea at Tandem doesn't mean it's the right idea
}here.  I've never been able to understand the advantages of
}conditional variables, which may be my viewpoint, or it may be some
}basic lack of understanding.
}

This is how I think of it:

Mutexes are a synchronization mechanism optimized for the case where
they are acquired and then released by the same process.

Tsleep and condition variables are mechanisms optimized for
one process to wait for an event posted by another process.

A general purpose semaphore could be used for either one. The
current use of lock manager locks with bufs is an example of this.

Mutexes map well onto what hardware such as Intel or Sparc
can natively support. Any of the other mechanisms is likely to be
twice as expensive, requiring two low level locked operations per
high level locked operation. This isn't always true, as reader/writer
locks can be as cheap as a mutex for the case where there is a
single reader or writer. But once again this is the case where
the same process is acquiring and releasing the resource without
action from another process.

It may well be that Tandem hardware made general purpose semaphores
as cheap as mutexes. This could be because there was support for a
higher level operation or just a few (or one, as in the
case of Cray) synchronization registers.  In either case software can
use general purpose semaphores for everything and not screw with
all these hybrids.

Chuck




From owner-freebsd-smp  Mon Jul  3 22: 8:40 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20])
	by hub.freebsd.org (Postfix) with ESMTP id 374C137B8C3
	for <smp@FreeBSD.ORG>; Mon,  3 Jul 2000 22:08:37 -0700 (PDT)
	(envelope-from bright@fw.wintelcom.net)
Received: (from bright@localhost)
	by fw.wintelcom.net (8.10.0/8.10.0) id e6458OX14088;
	Mon, 3 Jul 2000 22:08:24 -0700 (PDT)
Date: Mon, 3 Jul 2000 22:08:24 -0700
From: Alfred Perlstein <bright@wintelcom.net>
To: Greg Lehey <grog@lemis.com>
Cc: "Jeroen C. van Gelderen" <jeroen@vangelderen.org>,
	Daniel Eischen <eischen@vigrid.com>,
	Jason Evans <jasone@canonware.com>,
	Luoqi Chen <luoqi@watermarkgroup.com>, smp@FreeBSD.ORG
Subject: Re: SMP meeting summary
Message-ID: <20000703220823.Z25571@fw.wintelcom.net>
References: <20000703114535.T39024@wantadilla.lemis.com> <Pine.SUN.3.91.1000703060948.5216A-100000@pcnet1.pcnet.com> <20000703200039.H62680@wantadilla.lemis.com> <3960A971.982DDF07@vangelderen.org> <20000704083822.A65029@wantadilla.lemis.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2i
In-Reply-To: <20000704083822.A65029@wantadilla.lemis.com>; from grog@lemis.com on Tue, Jul 04, 2000 at 08:38:22AM +0930
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

* Greg Lehey <grog@lemis.com> [000703 16:10] wrote:
> > That's my reading of Sun's claims in Solaris and given that they
> > have a little more experience with this kind of thing I'm inclined
> > to believe them until I see facts stating the contrary.
> 
> Sun's problem with Solaris is non-obvious, and may not bite us.
> 
> I think we should hold off with this kind of discussion for the while.
> Everything I can see suggests that it's crazy to wake all processes.
> If we find that we run into race conditions which we can only solve
> with wake_all, though, we'll compare the effort in fixing them with
> the (undoubted) performance degradation caused by waking them all.

The idea for spin or spin-then-sleep mutexes (very short hold time)
is that you won't have as many processes as CPUs contending (and
when you do it's OK), and the mutual exclusion is so short-lived
that by the time the next 'thundering' process is actually given the
CPU, other processes have probably already acquired _and_ released
the spinlock, making it more than likely that the resource is free.

The idea is that a quantum is actually so long that there's
little chance of the wake_all processes colliding on the
lock.

By doing so you effectively gain a whole lot, because you avoid
having to grab sched-mutex on each acquire/release, and you also
reduce the cache cost of wakeups, because it's likely that only one
kernel context will wind its way down the sleep queue.

What's sort of interesting is that doing it one way or the other is
so similar that in reality the initial implementation doesn't
matter; switching from one to the other will be trivial at most.
The importance lies in getting one implementation done.

-Alfred




From owner-freebsd-smp  Mon Jul  3 22:38:16 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from wantadilla.lemis.com (wantadilla.lemis.com [192.109.197.80])
	by hub.freebsd.org (Postfix) with ESMTP id E97C737BC70
	for <smp@FreeBSD.ORG>; Mon,  3 Jul 2000 22:38:07 -0700 (PDT)
	(envelope-from grog@wantadilla.lemis.com)
Received: (from grog@localhost)
	by wantadilla.lemis.com (8.9.3/8.9.3) id PAA95496;
	Tue, 4 Jul 2000 15:07:36 +0930 (CST)
	(envelope-from grog)
Date: Tue, 4 Jul 2000 15:07:36 +0930
From: Greg Lehey <grog@lemis.com>
To: Alfred Perlstein <bright@wintelcom.net>
Cc: "Jeroen C. van Gelderen" <jeroen@vangelderen.org>,
	Daniel Eischen <eischen@vigrid.com>,
	Jason Evans <jasone@canonware.com>,
	Luoqi Chen <luoqi@watermarkgroup.com>, smp@FreeBSD.ORG
Subject: Re: SMP meeting summary
Message-ID: <20000704150736.H94351@wantadilla.lemis.com>
References: <20000703114535.T39024@wantadilla.lemis.com> <Pine.SUN.3.91.1000703060948.5216A-100000@pcnet1.pcnet.com> <20000703200039.H62680@wantadilla.lemis.com> <3960A971.982DDF07@vangelderen.org> <20000704083822.A65029@wantadilla.lemis.com> <20000703220823.Z25571@fw.wintelcom.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0pre2i
In-Reply-To: <20000703220823.Z25571@fw.wintelcom.net>
Organization: LEMIS, PO Box 460, Echunga SA 5153, Australia
Phone: +61-8-8388-8286
Fax: +61-8-8388-8725
Mobile: +61-418-838-708
WWW-Home-Page: http://www.lemis.com/~grog
X-PGP-Fingerprint: 6B 7B C3 8C 61 CD 54 AF  13 24 52 F8 6D A4 95 EF
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Monday,  3 July 2000 at 22:08:24 -0700, Alfred Perlstein wrote:
> What sort of interesting is that doing it one way or the other is
> so similar that in reality the initial implementation doesn't
> matter, switching from one to the other will be trivial at most,
> the importance lies in getting one implementation done.

There's a big difference in which implementation we do.  The BSD/OS
implementation works, at least in the BSD/OS environment.  Nothing
else has been written.  I think it's very important that we get the
BSD/OS version up and hobbling before we start redesigning things.  By
the time we've done that, we'll understand the material so much better
that we'll have a double win (working code and an understanding of how
to do it better).  I'm currently up to my elbows in dead interrupt
code, and I'm surprised how much I'm learning [wipes mess off arms].

Greg
--
Finger grog@lemis.com for PGP public key
See complete headers for address and phone numbers


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-smp" in the body of the message


From owner-freebsd-smp  Tue Jul  4  7:41:42 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from barney.ife.no (barney.ife.no [128.39.229.49])
	by hub.freebsd.org (Postfix) with ESMTP
	id 3F2CD37B5A2; Tue,  4 Jul 2000 07:41:35 -0700 (PDT)
	(envelope-from stein@ife.no)
Received: from ife.no (virginis.ife.no [128.39.229.176])
	by barney.ife.no (8.9.3/8.9.3) with ESMTP id QAA32593;
	Tue, 4 Jul 2000 16:41:30 +0200 (MET DST)
Message-ID: <3961F79A.1643F238@ife.no>
Date: Tue, 04 Jul 2000 16:41:30 +0200
From: "Stein M. Sandbech" <stein@ife.no>
Reply-To: stein@ife.no
Organization: IFE
X-Mailer: Mozilla 4.7 [en] (X11; I; FreeBSD 3.2-RELEASE i386)
X-Accept-Language: en
MIME-Version: 1.0
To: freebsd-smp@freebsd.org
Cc: Mike Smith <msmith@freebsd.org>
Subject: Re: Q: SMP on Intel OR840 MB?
References: <200003131947.LAA00931@mass.cdrom.com>
Content-Type: multipart/alternative;
 boundary="------------D6160584934825134C81431F"
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org


--------------D6160584934825134C81431F
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

FYI.

Mike Smith wrote:

> Just as a general "heads up", I'm having problems with SMP on an
> i840-based board at the moment.  I haven't had time to characterise it
> (still! 8(), but you should be careful for a little while here.

This is a followup to Mike's answer to my initial query on freebsd-smp.

I've installed FreeBSD 3.4 Release and FreeBSD 4.0 Release on the
Intel OR840 (Outrigger) motherboard without any problems (except for
not recognizing the keyboard; entering -Dh at the boot prompt fixed that).

Configuration: OR840 MB / 256MB RAMBUS memory / 4 UFW SCSI
disks / Adaptec 2940U2W / Creative GeForce256 / Tandberg SLR5
streamer / Pioneer SCSI DVD rom / Yamaha CD R / 2 x 800/133MHz
Pentium IIIs / integrated Intel Pro100+ nic (fxp).

Built and installed an SMP kernel on both FreeBSD versions. It runs
like a charm, doing simulations (floating point intensive), with periodically
heavy interactive use.

I can run mptable on this system, if you'd find it useful
(I have this machine @home, so ...  :))

All in all, an extremely nice system, both HW and OS!


--Stein Morten

--
/* Stein M Sandbech                  Email: stein@ife.no     **
** Senior Systems Engineer, EDP dept Email: stein@www.ife.no **
** Institute for Energy Technology   Tel: +47 63 80 60 00    **
** Box 40, N-2007 Kjeller, NORWAY    Fax: +47 63 81 11 68    */


--------------D6160584934825134C81431F--




From owner-freebsd-smp  Tue Jul  4 14:45:45 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from netplex.com.au (adsl-63-207-30-186.dsl.snfc21.pacbell.net [63.207.30.186])
	by hub.freebsd.org (Postfix) with ESMTP id 6B86237B5EA
	for <smp@FreeBSD.ORG>; Tue,  4 Jul 2000 14:45:33 -0700 (PDT)
	(envelope-from peter@netplex.com.au)
Received: from netplex.com.au (peter@localhost [127.0.0.1])
	by netplex.com.au (8.9.3/8.9.3) with ESMTP id OAA54160;
	Tue, 4 Jul 2000 14:44:09 -0700 (PDT)
	(envelope-from peter@netplex.com.au)
Message-Id: <200007042144.OAA54160@netplex.com.au>
X-Mailer: exmh version 2.1.1 10/15/1999
To: Greg Lehey <grog@lemis.com>
Cc: Alfred Perlstein <bright@wintelcom.net>,
	"Jeroen C. van Gelderen" <jeroen@vangelderen.org>,
	Daniel Eischen <eischen@vigrid.com>,
	Jason Evans <jasone@canonware.com>,
	Luoqi Chen <luoqi@watermarkgroup.com>, smp@FreeBSD.ORG
Subject: Re: SMP meeting summary 
In-Reply-To: Message from Greg Lehey <grog@lemis.com> 
   of "Tue, 04 Jul 2000 15:07:36 +0930." <20000704150736.H94351@wantadilla.lemis.com> 
Date: Tue, 04 Jul 2000 14:44:09 -0700
From: Peter Wemm <peter@netplex.com.au>
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Greg Lehey wrote:
> On Monday,  3 July 2000 at 22:08:24 -0700, Alfred Perlstein wrote:
> > What's sort of interesting is that doing it one way or the other is
> > so similar that in reality the initial implementation doesn't
> > matter; switching from one to the other will be trivial at most.
> > The importance lies in getting one implementation done.
> 
> There's a big difference in which implementation we do.  The BSD/OS
> implementation works, at least in the BSD/OS environment.  Nothing
> else has been written.  I think it's very important that we get the
> BSD/OS version up and hobbling before we start redesigning things.  By
> the time we've done that, we'll understand the material so much better
> that we'll have a double win (working code and an understanding of how
> to do it better).  I'm currently up to my elbows in dead interrupt
> code, and I'm surprised how much I'm learning [wipes mess off arms].

A general comment..  It was made very clear at the SMP meeting that things
would have taken a lot less time if they had the "safe but slower" fallback
code available right from the start.  I feel that it is imperative that we
implement a minimal-but-functional set of code that we can trust first and
*then* take a shot at the lightweight interrupt context, and do it in such
a way that when Weird Shit(TM) starts happening we can easily fall
back to the conservative code so that we can eliminate the optimized
lightweight interrupt contexts from suspicion.  Having the BSD/OS code
available as a starting point is a huge help.  We should not have to worry
about the mutex or witness code until we are up and running.

There are truckloads of optimizations that can be done afterwards, but we
must walk first, not run.  Doing things conservatively and safely now with
an eye towards later optimization will hopefully save our sanity.  Whatever
we can leverage from BSD/OS as a "known quantity" we should - it will
reduce the amount of green or untried code while we get up to speed.  If
this means that our SMP work looks a lot like BSD/OS, then so what?  It
doesn't have to stay that way forever.  Once we have something that runs
and doesn't panic in 3 seconds, then we have something to tune/optimize/
reimplement/whatever.  If we all dive in and invent our own stuff right
from the start, we will have just as much pain and suffering as the BSDI
folks had and it will take just as long (or longer).

Cheers,
-Peter
--
Peter Wemm - peter@FreeBSD.org; peter@yahoo-inc.com; peter@netplex.com.au
"All of this is for nothing if we don't go to the stars" - JMS/B5




From owner-freebsd-smp  Tue Jul  4 15:59:54 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from wantadilla.lemis.com (wantadilla.lemis.com [192.109.197.80])
	by hub.freebsd.org (Postfix) with ESMTP id 22A6837B5A9
	for <smp@FreeBSD.ORG>; Tue,  4 Jul 2000 15:59:39 -0700 (PDT)
	(envelope-from grog@wantadilla.lemis.com)
Received: (from grog@localhost)
	by wantadilla.lemis.com (8.9.3/8.9.3) id IAA97127;
	Wed, 5 Jul 2000 08:29:00 +0930 (CST)
	(envelope-from grog)
Date: Wed, 5 Jul 2000 08:29:00 +0930
From: Greg Lehey <grog@lemis.com>
To: Peter Wemm <peter@netplex.com.au>
Cc: Alfred Perlstein <bright@wintelcom.net>,
	"Jeroen C. van Gelderen" <jeroen@vangelderen.org>,
	Daniel Eischen <eischen@vigrid.com>,
	Jason Evans <jasone@canonware.com>,
	Luoqi Chen <luoqi@watermarkgroup.com>, smp@FreeBSD.ORG
Subject: Re: SMP meeting summary
Message-ID: <20000705082900.I94351@wantadilla.lemis.com>
References: <grog@lemis.com> <200007042144.OAA54160@netplex.com.au>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0pre2i
In-Reply-To: <200007042144.OAA54160@netplex.com.au>
Organization: LEMIS, PO Box 460, Echunga SA 5153, Australia
Phone: +61-8-8388-8286
Fax: +61-8-8388-8725
Mobile: +61-418-838-708
WWW-Home-Page: http://www.lemis.com/~grog
X-PGP-Fingerprint: 6B 7B C3 8C 61 CD 54 AF  13 24 52 F8 6D A4 95 EF
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Tuesday,  4 July 2000 at 14:44:09 -0700, Peter Wemm wrote:
> Greg Lehey wrote:
>> On Monday,  3 July 2000 at 22:08:24 -0700, Alfred Perlstein wrote:
>>> What's sort of interesting is that doing it one way or the other is
>>> so similar that in reality the initial implementation doesn't
>>> matter; switching from one to the other will be trivial at most.
>>> The importance lies in getting one implementation done.
>>
>> There's a big difference in which implementation we do.  The BSD/OS
>> implementation works, at least in the BSD/OS environment.  Nothing
>> else has been written.  I think it's very important that we get the
>> BSD/OS version up and hobbling before we start redesigning things.  By
>> the time we've done that, we'll understand the material so much better
>> that we'll have a double win (working code and an understanding of how
>> to do it better).  I'm currently up to my elbows in dead interrupt
>> code, and I'm surprised how much I'm learning [wipes mess off arms].
>
> A general comment..  It was made very clear at the SMP meeting that
> things would have taken a lot less time if they had the "safe but
> slower" fallback code available right from the start.  I feel that
> it is imperative that we implement a minimal-but-functional set of
> code that we can trust first and *then* take a shot at the
> lightweight interrupt context, and do it in such a way that when
> Weird Shit(TM) starts happening we can easily fall back to the
> conservative code so that we can eliminate the optimized lightweight
> interrupt contexts from suspicion.

Agreed.  That's the way I'm going.  Is there anything I have said that
gives you reason to think I'm advocating something else?

> Having the BSD/OS code available as a starting point is a huge help.
> We should not have to worry about the mutex or witness code until we
> are up and running.

For some definition of "worry".

> There are truckloads of optimizations that can be done afterwards,
> but we must walk first, not run.  Doing things conservatively and
> safely now with an eye towards later optimization will hopefully
> save our sanity.  Whatever we can leverage from BSD/OS as a "known
> quantity" we should - it will reduce the amount of green or untried
> code while we get up to speed.  If this means that our SMP work
> looks a lot like BSD/OS, then so what?  It doesn't have to stay that
> way forever. 

I'm also not advocating change for change's sake.  If it turns out
that the BSD/OS code is the way to go, then I wouldn't want to change.

Greg
--
Finger grog@lemis.com for PGP public key
See complete headers for address and phone numbers




From owner-freebsd-smp  Tue Jul  4 16:38:23 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from netplex.com.au (adsl-63-207-30-186.dsl.snfc21.pacbell.net [63.207.30.186])
	by hub.freebsd.org (Postfix) with ESMTP id 8F57737BA82
	for <smp@FreeBSD.ORG>; Tue,  4 Jul 2000 16:38:12 -0700 (PDT)
	(envelope-from peter@netplex.com.au)
Received: from netplex.com.au (peter@localhost [127.0.0.1])
	by netplex.com.au (8.9.3/8.9.3) with ESMTP id QAA54794;
	Tue, 4 Jul 2000 16:06:34 -0700 (PDT)
	(envelope-from peter@netplex.com.au)
Message-Id: <200007042306.QAA54794@netplex.com.au>
X-Mailer: exmh version 2.1.1 10/15/1999
To: Greg Lehey <grog@lemis.com>
Cc: Alfred Perlstein <bright@wintelcom.net>,
	"Jeroen C. van Gelderen" <jeroen@vangelderen.org>,
	Daniel Eischen <eischen@vigrid.com>,
	Jason Evans <jasone@canonware.com>,
	Luoqi Chen <luoqi@watermarkgroup.com>, smp@FreeBSD.ORG
Subject: Re: SMP meeting summary 
In-Reply-To: Message from Greg Lehey <grog@lemis.com> 
   of "Wed, 05 Jul 2000 08:29:00 +0930." <20000705082900.I94351@wantadilla.lemis.com> 
Date: Tue, 04 Jul 2000 16:06:34 -0700
From: Peter Wemm <peter@netplex.com.au>
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Greg Lehey wrote:
> On Tuesday,  4 July 2000 at 14:44:09 -0700, Peter Wemm wrote:
[..]
> Agreed.  That's the way I'm going.  Is there anything I have said that
> gives you reason to think I'm advocating something else?

No, I was backing you up, not aiming it at you.  I was reiterating a point
that seemed to have been lost (or forgotten) in the noise.

Cheers,
-Peter




From owner-freebsd-smp  Wed Jul  5  2:24:28 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from io.yi.org (24.67.218.186.bc.wave.home.com [24.67.218.186])
	by hub.freebsd.org (Postfix) with ESMTP id 9746137B510
	for <freebsd-smp@FreeBSD.ORG>; Wed,  5 Jul 2000 02:24:21 -0700 (PDT)
	(envelope-from jburkhol@home.com)
Received: from io.yi.org (localhost.gvcl1.bc.wave.home.com [127.0.0.1])
	by io.yi.org (Postfix) with ESMTP
	id A6B47BA4E; Wed,  5 Jul 2000 02:24:20 -0700 (PDT)
X-Mailer: exmh version 2.1.1 10/15/1999
To: Jake Burkholder <jburkhol@home.com>
Cc: Jason Evans <jasone@canonware.com>, Greg Lehey <grog@lemis.com>,
	Matthew Dillon <dillon@apollo.backplane.com>, freebsd-smp@FreeBSD.ORG
Subject: Re: SMP progress (was: Stepping on Toes) 
In-Reply-To: Message from Jake Burkholder <jburkhol@home.com> 
   of "Mon, 03 Jul 2000 13:15:16 PDT." <20000703201516.28BA3BA4E@io.yi.org> 
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Wed, 05 Jul 2000 02:24:20 -0700
From: Jake Burkholder <jburkhol@home.com>
Message-Id: <20000705092420.A6B47BA4E@io.yi.org>
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

> > patch set on the web page (http://people.freebsd.org/~jasone/smp/) shortly.
> > Meanwhile, you can get Jake's patch set at:
> > 
> > http://people.freebsd.org/~jake/smpng2.tar
> 

dfr filled me in on how to include new files in a diff, so this is
now available as one:  http://people.freebsd.org/~jake/smpng.diff




From owner-freebsd-smp  Wed Jul  5  9:32:25 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2])
	by hub.freebsd.org (Postfix) with ESMTP id 9AC1737B5B5
	for <freebsd-smp@freebsd.org>; Wed,  5 Jul 2000 09:32:23 -0700 (PDT)
	(envelope-from dillon@apollo.backplane.com)
Received: (from dillon@localhost)
	by apollo.backplane.com (8.9.3/8.9.1) id JAA87977;
	Wed, 5 Jul 2000 09:32:23 -0700 (PDT)
	(envelope-from dillon)
Date: Wed, 5 Jul 2000 09:32:23 -0700 (PDT)
From: Matthew Dillon <dillon@apollo.backplane.com>
Message-Id: <200007051632.JAA87977@apollo.backplane.com>
To: freebsd-smp@freebsd.org
Subject: Personal URL has moved (also: SMP patchset URL)
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

    www.backplane.com -> apollo.backplane.com

    http://apollo.backplane.com/FreeBSDSmp/

    I moved backplane.com and www.backplane.com to point to Backplane Inc's
    web server, so my stuff is not available via those hostnames any more.
    Please use 'apollo.backplane.com'.

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>




From owner-freebsd-smp  Wed Jul  5  9:44:10 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2])
	by hub.freebsd.org (Postfix) with ESMTP id 473BB37C1B4
	for <freebsd-smp@FreeBSD.ORG>; Wed,  5 Jul 2000 09:44:04 -0700 (PDT)
	(envelope-from dillon@apollo.backplane.com)
Received: (from dillon@localhost)
	by apollo.backplane.com (8.9.3/8.9.1) id JAA88121;
	Wed, 5 Jul 2000 09:43:57 -0700 (PDT)
	(envelope-from dillon)
Date: Wed, 5 Jul 2000 09:43:57 -0700 (PDT)
From: Matthew Dillon <dillon@apollo.backplane.com>
Message-Id: <200007051643.JAA88121@apollo.backplane.com>
To: Greg Lehey <grog@lemis.com>
Cc: Chuck Paterson <cp@bsdi.com>, David Greenman <dg@root.com>,
	freebsd-smp@FreeBSD.ORG
Subject: Re: SMP progress (was: Stepping on Toes)
References: <200006261650.KAA17801@berserker.bsdi.com> <200006271742.KAA35851@apollo.backplane.com> <20000703112203.B61851@wantadilla.lemis.com>
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

:...
:>     At this moment, without interrupt threads, interrupts can share Giant
:>     with the curproc they interrupted.  This is how our existing MP stuff
:>     worked already.
:>
:>     When Greg moves interrupts to their own threads, and obtains Giant to
:>     run those interrupts, no more sharing will occur and just the fact
:>     that the interrupt is holding Giant guarantees that nobody else will
:>     be messing with SPLs, thus the SPLs can be removed entirely.
:
:Agreed.  I'm in the process of implementing the heavy-weight interrupt
:processes now.  I've just taken a look at your web page and note that
:the URL no longer exists; in conjunction with the discussion above,
:I'm no longer sure how far you are.  Are you importing the BSD/OS code
:now?
:
:We should probably take the rest of this offline, but I wanted to
:discuss how we do things.  My idea is:
:
:1.  You import the BSD/OS mutexes.
:2.  I import/implement the heavy-weight interrupt code, which I will
:    endeavour to get working relatively reliably.  This should be a
:    fallback while I break^H^H^H^H^Himplement light-weight interrupt
:    threads.
:3.  You and I test our stuff together until it can stay up for an hour
:    or so (exact time to be determined by Jason, who'll be carrying
:    the can).
:4.  We commit the marginally stable stuff.
:5.  I carry on working on the light-weight threads.
:
:Any comments?
:
:Greg

    Jake Burkholder is porting the BSD/OS mutexes.  I don't expect there
    to be much of a difference in regards to your heavy-weight interrupt
    work.  I'm going to take a look at Jake's patchset tonight.  I think
    the only operational item we need to research is the sti/cli stuff in
    the BSDI mutexes... we should be able to remove them at some point
    (my interrupt code is already using the ipending mechanism to deal
    with the scheduler mutex being active on the current cpu).  

    If Jake's removed that, then we'll want to put it back in at some point
    since it saves a significant amount of overhead ('sti' and 'cli' are
    expensive instructions).

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>




From owner-freebsd-smp  Wed Jul  5  9:53:20 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from berserker.bsdi.com (berserker.twistedbit.com [199.79.183.1])
	by hub.freebsd.org (Postfix) with ESMTP id A3C6937B509
	for <freebsd-smp@freebsd.org>; Wed,  5 Jul 2000 09:53:17 -0700 (PDT)
	(envelope-from cp@berserker.bsdi.com)
Received: from berserker.bsdi.com (cp@localhost [127.0.0.1])
	by berserker.bsdi.com (8.9.3/8.9.3) with ESMTP id KAA14768;
	Wed, 5 Jul 2000 10:52:24 -0600 (MDT)
Message-Id: <200007051652.KAA14768@berserker.bsdi.com>
To: Matthew Dillon <dillon@apollo.backplane.com>
Cc: Greg Lehey <grog@lemis.com>, David Greenman <dg@root.com>,
	freebsd-smp@freebsd.org
Subject: Re: SMP progress (was: Stepping on Toes) 
From: Chuck Paterson <cp@bsdi.com>
Date: Wed, 05 Jul 2000 10:52:23 -0600
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

}
}    Jake Burkholder is porting the BSD/OS mutexes.  I don't expect there
}    to be much of a difference in regards to your heavy-weight interrupt
}    work.  I'm going to take a look at Jake's patchset tonight.  I think
}    the only operational item we need to research is the sti/cli stuff in
}    the BSDI mutexes... we should be able to remove them at some point
}    (my interrupt code is already using the ipending mechanism to deal
}    with the scheduler mutex being active on the current cpu).  
}
}    If Jake's removed that, then we'll want to put it back in at some point
}    since it saves a significant amount of overhead ('sti' and 'cli' are
}    expensive instructions).
}
}					-Matt
}					Matthew Dillon 
}					<dillon@backplane.com>


	I believe ipending wants to go away totally. It really
isn't meaningful in the thread environment and the locked operations
needed to support it once multiple processors are running
in the kernel are more expensive than the sti/cli.

Chuck




From owner-freebsd-smp  Wed Jul  5  9:58:18 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from berserker.bsdi.com (berserker.twistedbit.com [199.79.183.1])
	by hub.freebsd.org (Postfix) with ESMTP id 4ADF037B638
	for <freebsd-smp@freebsd.org>; Wed,  5 Jul 2000 09:58:15 -0700 (PDT)
	(envelope-from cp@berserker.bsdi.com)
Received: from berserker.bsdi.com (cp@localhost [127.0.0.1])
	by berserker.bsdi.com (8.9.3/8.9.3) with ESMTP id KAA14847;
	Wed, 5 Jul 2000 10:57:59 -0600 (MDT)
Message-Id: <200007051657.KAA14847@berserker.bsdi.com>
Cc: Matthew Dillon <dillon@apollo.backplane.com>,
	Greg Lehey <grog@lemis.com>, David Greenman <dg@root.com>,
	freebsd-smp@freebsd.org
Subject: Re: SMP progress (was: Stepping on Toes) 
From: Chuck Paterson <cp@bsdi.com>
Date: Wed, 05 Jul 2000 10:57:58 -0600
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

}
}
}	I believe ipending wants to go away totally. It really
}isn't meaningful in the thread environment and the locked operations
}needed to support it once multiple processors are running
}in the kernel are more expensive than the sti/cli.
}
}Chuck
}

	I should have said that it is the locked operations
needed to support the masks that are the really expensive part, not
ipending itself. Also, with the spin locks we want to mask interrupts
to a particular processor, not all processors.


Chuck




From owner-freebsd-smp  Wed Jul  5 10:31:45 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from lor.watermarkgroup.com (lor.watermarkgroup.com [207.202.73.33])
	by hub.freebsd.org (Postfix) with ESMTP id C248E37BC9B
	for <freebsd-smp@FreeBSD.ORG>; Wed,  5 Jul 2000 10:31:42 -0700 (PDT)
	(envelope-from luoqi@watermarkgroup.com)
Received: (from luoqi@localhost)
	by lor.watermarkgroup.com (8.10.1/8.10.1) id e65HUGJ11739;
	Wed, 5 Jul 2000 13:30:17 -0400 (EDT)
Date: Wed, 5 Jul 2000 13:30:17 -0400 (EDT)
From: Luoqi Chen <luoqi@watermarkgroup.com>
Message-Id: <200007051730.e65HUGJ11739@lor.watermarkgroup.com>
To: cp@bsdi.com
Subject: Re: SMP progress (was: Stepping on Toes)
Cc: dg@root.com, dillon@apollo.backplane.com,
	freebsd-smp@FreeBSD.ORG, grog@lemis.com
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

> }
> }	I believe ipending wants to go away totally. It really
> }isn't meaningful in the thread environment and the locked operations
> }needed to support it once multiple processors are running
> }in the kernel are more expensive than the sti/cli.
> }
> }Chuck
> }
> 
> 	I should have said that it is the locked operations
> needed to support the masks that are the really expensive part, not
> ipending itself. Also, with the spin locks we want to mask interrupts
> to a particular processor, not all processors.
> 
> 
> Chuck
> 
There's a third way out: we could keep a per-cpu spin lock count, and
disallow kernel preemption when the count is non-zero. Of course, this
doesn't apply to sched_lock, we'll still have to use sti/cli there.
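
A minimal sketch of the per-cpu count idea, in user-space C rather than
kernel code. Every name here (cpu_state, spin_enter, spin_exit,
preempt_request, MAXCPU) is hypothetical and chosen for illustration; none
comes from the FreeBSD or BSD/OS sources. The assumption is that each CPU
touches only its own counter, so no locked bus operation is needed to
update it, and that a preemption request arriving while the count is
non-zero is merely recorded and honoured when the last spin lock is
released.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXCPU 4                 /* hypothetical CPU limit for this sketch */

/* Hypothetical per-CPU state; each CPU only ever touches its own entry. */
struct cpu_state {
    int  spin_count;             /* spin locks currently held on this CPU */
    bool preempt_pending;        /* a preemption was requested but deferred */
};

static struct cpu_state cpus[MAXCPU];

/* Call just before acquiring a spin lock on CPU `cpu`. */
static void spin_enter(int cpu)
{
    cpus[cpu].spin_count++;
}

/*
 * Call just after releasing a spin lock on CPU `cpu`.
 * Returns true when the count dropped to zero and a deferred
 * preemption should be carried out now.
 */
static bool spin_exit(int cpu)
{
    assert(cpus[cpu].spin_count > 0);
    if (--cpus[cpu].spin_count == 0 && cpus[cpu].preempt_pending) {
        cpus[cpu].preempt_pending = false;
        return true;
    }
    return false;
}

/*
 * Called when something wants to preempt the thread running on `cpu`.
 * Returns true if preemption may happen immediately; otherwise the
 * request is recorded and false is returned.
 */
static bool preempt_request(int cpu)
{
    if (cpus[cpu].spin_count != 0) {
        cpus[cpu].preempt_pending = true;
        return false;
    }
    return true;
}
```

In this model a request made between spin_enter() and the matching
spin_exit() is refused, and the final spin_exit() reports that the deferred
preemption is now due. As the message notes, this would not cover
sched_lock, which still needs interrupts disabled.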

-lq




From owner-freebsd-smp  Wed Jul  5 11:28:49 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2])
	by hub.freebsd.org (Postfix) with ESMTP id 837F737C03A
	for <freebsd-smp@freebsd.org>; Wed,  5 Jul 2000 11:28:45 -0700 (PDT)
	(envelope-from dillon@apollo.backplane.com)
Received: (from dillon@localhost)
	by apollo.backplane.com (8.9.3/8.9.1) id LAA88574;
	Wed, 5 Jul 2000 11:28:35 -0700 (PDT)
	(envelope-from dillon)
Date: Wed, 5 Jul 2000 11:28:35 -0700 (PDT)
From: Matthew Dillon <dillon@apollo.backplane.com>
Message-Id: <200007051828.LAA88574@apollo.backplane.com>
To: Chuck Paterson <cp@bsdi.com>
Cc: Greg Lehey <grog@lemis.com>, David Greenman <dg@root.com>,
	freebsd-smp@freebsd.org
Subject: Re: SMP progress (was: Stepping on Toes) 
References:  <200007051652.KAA14768@berserker.bsdi.com>
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org


:}    Jake Burkholder is porting the BSD/OS mutexes.  I don't expect there
:}    to be much of a difference in regards to your heavy-weight interrupt
:}    work.  I'm going to take a look at Jake's patchset tonight.  I think
:}    the only operational item we need to research is the sti/cli stuff in
:}    the BSDI mutexes... we should be able to remove them at some point
:}    (my interrupt code is already using the ipending mechanism to deal
:}    with the scheduler mutex being active on the current cpu).  
:}
:}    If Jake's removed that, then we'll want to put it back in at some point
:}    since it saves a significant amount of overhead ('sti' and 'cli' are
:}    expensive instructions).
:}
:}					-Matt
:}					Matthew Dillon 
:}					<dillon@backplane.com>
:
:
:	I believe ipending wants to go away totally. It really
:isn't meaningful in the thread environment and the locked operations
:needed to support it once multiple processors are running
:in the kernel are more expensive than the sti/cli.
:
:Chuck

    They're less expensive overall.  Think about it... how many times
    do you get and release the scheduler lock versus how many interrupts
    you take in a second.   In a loaded system we might be doing 
    10,000 scheduler lock operations a sec, or more, but still only be
    doing 800 interrupts/sec.  It's a matter of streamlining the 
    critical path.  This is why cli/sti was removed from the spl*() code
    in the first place.
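
The arithmetic behind this argument can be written down as a tiny cost
model. This is only a sketch: the rates (10,000 lock operations and 800
interrupts per second) are the figures from the paragraph above, but the
struct, the function names, and any cycle counts fed into them are
hypothetical placeholders, not measurements.

```c
#include <assert.h>

/* Rates for one CPU over one second. */
struct workload {
    long lock_ops_per_sec;       /* scheduler lock acquire/release pairs */
    long interrupts_per_sec;     /* hardware interrupts taken */
};

/* Cycles per second spent if every lock operation pays for a cli/sti pair. */
static long cycles_with_cli_sti(struct workload w, long cli_sti_cycles)
{
    return w.lock_ops_per_sec * cli_sti_cycles;
}

/*
 * Cycles per second spent if the lock path carries no cli/sti and the
 * extra bookkeeping is paid once per interrupt instead.
 */
static long cycles_with_deferral(struct workload w, long per_intr_cycles)
{
    return w.interrupts_per_sec * per_intr_cycles;
}

/*
 * How many times more expensive the per-interrupt bookkeeping may be
 * than one cli/sti pair before the two schemes cost the same.
 */
static double break_even_ratio(struct workload w)
{
    assert(w.interrupts_per_sec > 0);
    return (double)w.lock_ops_per_sec / (double)w.interrupts_per_sec;
}
```

With the figures quoted above, break_even_ratio() gives 12.5: under this
model the per-interrupt work could cost up to 12.5 times a cli/sti pair
before moving it off the lock path stops paying. The model deliberately
leaves out any extra locked operations on the lock path itself, which is
the cost the quoted objection is about.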

    If we are going to be using mutexes heavily, being able to remove the
    cli/sti will cut the mutex overhead by around 35%, and more if there is
    no contention for the mutex.

    I think ipending should stay in.  Having the flexibility may prove
    useful.  For example, if one cpu can't take an interrupt due to holding
    the scheduler lock another idle cpu can take it and at least get 
    most of the state pushed before spinning on the scheduler mutex.

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>




From owner-freebsd-smp  Wed Jul  5 15:39:50 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from smtp05.primenet.com (smtp05.primenet.com [206.165.6.135])
	by hub.freebsd.org (Postfix) with ESMTP id B730037B72A
	for <smp@FreeBSD.ORG>; Wed,  5 Jul 2000 15:39:43 -0700 (PDT)
	(envelope-from tlambert@usr05.primenet.com)
Received: (from daemon@localhost)
	by smtp05.primenet.com (8.9.3/8.9.3) id PAA25249;
	Wed, 5 Jul 2000 15:40:01 -0700 (MST)
Received: from usr05.primenet.com(206.165.6.205)
 via SMTP by smtp05.primenet.com, id smtpdAAA_raifX; Wed Jul  5 15:39:48 2000
Received: (from tlambert@localhost)
	by usr05.primenet.com (8.8.5/8.8.5) id PAA26386;
	Wed, 5 Jul 2000 15:39:18 -0700 (MST)
From: Terry Lambert <tlambert@primenet.com>
Message-Id: <200007052239.PAA26386@usr05.primenet.com>
Subject: Re: SMP meeting summary
To: cp@bsdi.com (Chuck Paterson)
Date: Wed, 5 Jul 2000 22:39:18 +0000 (GMT)
Cc: eischen@vigrid.com (Daniel Eischen), grog@lemis.com (Greg Lehey),
	jasone@canonware.com (Jason Evans),
	luoqi@watermarkgroup.com (Luoqi Chen), smp@FreeBSD.ORG
In-Reply-To: <200007031528.JAA26798@berserker.bsdi.com> from "Chuck Paterson" at Jul 03, 2000 09:28:34 AM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

> 	In general there ought not to be multiple processes piling
> up on a mutex. If there are and for some reason they can't be
> fixed then these particular mutexes are going to dictate how this
> area is handled. Once we have these cases in hand we can make
> some decisions as to how to proceed.

One example is the atime mutex on directories in which parallel
compiles are being attempted, when one uses data protection rather than
critical sectioning as the reason for the mutex.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.




From owner-freebsd-smp  Wed Jul  5 16: 1: 2 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from smtp05.primenet.com (smtp05.primenet.com [206.165.6.135])
	by hub.freebsd.org (Postfix) with ESMTP id 1A22B37B8B6
	for <smp@FreeBSD.ORG>; Wed,  5 Jul 2000 16:00:58 -0700 (PDT)
	(envelope-from tlambert@usr05.primenet.com)
Received: (from daemon@localhost)
	by smtp05.primenet.com (8.9.3/8.9.3) id QAA02345;
	Wed, 5 Jul 2000 16:01:17 -0700 (MST)
Received: from usr05.primenet.com(206.165.6.205)
 via SMTP by smtp05.primenet.com, id smtpdAAAO.aOHe; Wed Jul  5 16:01:13 2000
Received: (from tlambert@localhost)
	by usr05.primenet.com (8.8.5/8.8.5) id QAA26840;
	Wed, 5 Jul 2000 16:00:49 -0700 (MST)
From: Terry Lambert <tlambert@primenet.com>
Message-Id: <200007052300.QAA26840@usr05.primenet.com>
Subject: Re: SMP meeting summary
To: cp@bsdi.com (Chuck Paterson)
Date: Wed, 5 Jul 2000 23:00:49 +0000 (GMT)
Cc: grog@lemis.com (Greg Lehey), eischen@vigrid.com (Daniel Eischen),
	jasone@canonware.com (Jason Evans),
	luoqi@watermarkgroup.com (Luoqi Chen), smp@FreeBSD.ORG
In-Reply-To: <200007040218.UAA01169@berserker.bsdi.com> from "Chuck Paterson" at Jul 03, 2000 08:18:00 PM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

> }I'm not sure we're talking about the same thing, but if so I must be
> }missing something.  If I'm waiting on a mutex, I still need to
> }reacquire it on wakeup, don't I?  In that case, only the first process
> }to be scheduled will actually get the mutex, and the others will block
> }again.
> 
> 	Yes, you need to acquire the mutex on wakeup, but likely
> one process will run acquiring and releasing the mutex in an
> uncontested fashion before other processes run and do the same
> thing.

You can assume in SVR4, in the wake_one case, that you will be
the only process awake, and so your acquisition will not be
contested, and will not result in a sleep.

Logically, you can consider that there is one waiter and N-1
sleepers for every N processes trying to acquire a mutex.

This is normally handled [in the literature] by using a hybrid
lock in a hierarchy.

That is, you attempt a fast lock, and if that fails, then you
attempt a slow ("sleeping") lock.  You are guaranteed a wakeup
on release of a fast lock, and on release of a sleeping lock,
so it's sixes.
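
A minimal sketch of that fast-then-slow idea, assuming POSIX threads
and C11 atomics rather than any SVR4 or BSD/OS primitive.  Every
name in it (hybrid_lock, HYBRID_SPIN_TRIES, and so on) is
hypothetical, and the spin count is an arbitrary illustration value:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hybrid lock: a spinning fast path over a sleeping slow path. */
struct hybrid_lock {
    atomic_bool     held;       /* true while some thread owns the lock */
    pthread_mutex_t sleep_mtx;  /* serializes the slow path */
    pthread_cond_t  wakeup;     /* signalled on every release */
};

#define HYBRID_SPIN_TRIES 100   /* assumption: tune per workload */

static void hybrid_init(struct hybrid_lock *l)
{
    atomic_init(&l->held, false);
    pthread_mutex_init(&l->sleep_mtx, NULL);
    pthread_cond_init(&l->wakeup, NULL);
}

/* One fast attempt: succeeds only if the lock was free. */
static bool hybrid_try(struct hybrid_lock *l)
{
    bool expected = false;
    return atomic_compare_exchange_strong(&l->held, &expected, true);
}

static void hybrid_acquire(struct hybrid_lock *l)
{
    /* Fast path: a bounded number of spinning attempts. */
    for (int i = 0; i < HYBRID_SPIN_TRIES; i++)
        if (hybrid_try(l))
            return;

    /* Slow path: sleep until a release wakes us, then retry. */
    pthread_mutex_lock(&l->sleep_mtx);
    while (!hybrid_try(l))
        pthread_cond_wait(&l->wakeup, &l->sleep_mtx);
    pthread_mutex_unlock(&l->sleep_mtx);
}

static void hybrid_release(struct hybrid_lock *l)
{
    assert(atomic_load(&l->held));      /* releasing a free lock is a bug */
    pthread_mutex_lock(&l->sleep_mtx);
    atomic_store(&l->held, false);
    pthread_cond_signal(&l->wakeup);    /* wake_one: at most one sleeper */
    pthread_mutex_unlock(&l->sleep_mtx);
}
```

The release clears the flag while holding sleep_mtx, so a sleeper
either sees the lock free on its retry or is already waiting when
the signal arrives; that is the "guaranteed wakeup" in this sketch.
A spinner can still take the lock ahead of the woken sleeper, which
then simply goes back to sleep until the next release.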

Of course, it's a lot easier to just critical section.


> }In my experience, I've seen mutexes used for long-term waits, and I
> }don't see any a priori reason not to do so.  Of course, if we make
> }design decisions based on the assumption that all waits will be short,
> }then we will have a reason, but it won't be a good one.
> }
> }Before you say that long-term waits are evil, note that we're probably
> }talking about different kinds of waits.  Obviously anything that
> }threatens to keep the system idle while it waits is bad, but a
> }replacement for tsleep(), say, can justifiably wait for a long time.
> 
> 	A replacement for tsleep is not a mutex, but in Solaris
> parlance a conditional variable. The uses are different, one is
> for locking a resource, the other is waiting on a synch event. A
> conditional variable, like the sleep queues has a mutex associated
> with it. This mutex is not held except while processing the event,
> both by the process waiting and the process doing the activation.
> I don't think it is a good idea to assume that the heuristics for
> waking up tsleep / conditional variables  is going to be
> anything like those seen with mutexs.

Effectively, condition variables are critical sectioned in their
manipulation through the use of a mutex.
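
As a rough illustration of that point, here is a sketch using POSIX
condition variables as a stand-in for the Solaris ones.  The struct
and function names are made up for the example; the only real API
used is pthreads:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event object: 'ready' is only touched with mtx held. */
struct event {
    pthread_mutex_t mtx;
    pthread_cond_t  cv;
    bool            ready;
};

static void event_init(struct event *e)
{
    pthread_mutex_init(&e->mtx, NULL);
    pthread_cond_init(&e->cv, NULL);
    e->ready = false;
}

/* Waiter: the mutex is held only while testing and consuming the event. */
static void event_wait(struct event *e)
{
    pthread_mutex_lock(&e->mtx);
    while (!e->ready) {
        /* Atomically drops mtx while asleep, retakes it on wakeup. */
        int rc = pthread_cond_wait(&e->cv, &e->mtx);
        assert(rc == 0);
        (void)rc;
    }
    e->ready = false;                   /* consume the event */
    pthread_mutex_unlock(&e->mtx);
}

/* Activator: the same mutex brackets the state change and the wakeup. */
static void event_post(struct event *e)
{
    pthread_mutex_lock(&e->mtx);
    e->ready = true;
    pthread_cond_signal(&e->cv);
    pthread_mutex_unlock(&e->mtx);
}

/* Thread entry point so the activation can come from another thread. */
static void *event_poster(void *arg)
{
    event_post(arg);
    return NULL;
}
```

The mutex is never held across the sleep itself, only around the
test of 'ready' and the update, which is the critical sectioning
referred to above.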

In practice, there are some ugly areas in the Solaris SMP
reentrant VFS code that necessitate treating the cond variable
as if it were a mutex on a larger structure.  This reduces
concurrency considerably.

The main point about wake_one that's problematic is the deadly
embrace deadlock, not the priority inversion deadlock, which
can always be "opted out of" by lending (or making the wake_one
more choosy about who it wakes, above and beyond the head of
the wait queue).

The thing that makes a thundering herd expensive is less the
herd than it is the traversal of the list; think about it: if
I have the cycles to burn in the scheduler to pick someone to
run, then I wasn't doing important other work anyway, and I
might as well burn them in the herd, as opposed to other places
I could burn them.

A spinlock fixes this by implementing back-off + retry, at
least for sets of two locks.  Sets of more locks are really
problematic.
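
For the two-lock case, back-off + retry can be sketched roughly as
follows.  This assumes POSIX mutexes with trylock standing in for
kernel spinlocks; try_lock_pair, lock_pair and unlock_pair are
hypothetical names, not anything from SVR4:

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

/*
 * One attempt at taking both locks.  On failure the first lock is
 * dropped again, so the caller never sits on one lock while waiting
 * for the other (no deadly embrace).
 */
static bool try_lock_pair(pthread_mutex_t *first, pthread_mutex_t *second)
{
    assert(first != second);            /* a pair means two distinct locks */
    pthread_mutex_lock(first);
    if (pthread_mutex_trylock(second) == 0)
        return true;                    /* holding both */
    pthread_mutex_unlock(first);        /* back off */
    return false;
}

/* Retry until both locks are held; returns how often we backed off. */
static unsigned lock_pair(pthread_mutex_t *first, pthread_mutex_t *second)
{
    unsigned backoffs = 0;

    while (!try_lock_pair(first, second)) {
        backoffs++;
        sched_yield();                  /* give the other holder a chance */
    }
    return backoffs;
}

static void unlock_pair(pthread_mutex_t *first, pthread_mutex_t *second)
{
    pthread_mutex_unlock(second);
    pthread_mutex_unlock(first);
}
```

With more than two locks the back-out has to unwind a variable
number of acquisitions, and two threads can keep backing off against
each other, which is part of why larger sets get ugly.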

A lot of work was done in SVR4 ES/MP to, effectively, resolve
the problem using Dijkstra's "Banker's Algorithm" (that is, all
the resources for sets of greater than two members, and in some
cases, one member -- usually parent directory in a descending
path lookup -- are allocated "up front", which is to say "at
the same stack depth/in the same function" to permit state to
be backed out easily in the case of a deadlock detection).

This stuff is really unsatisfying from the point of view of
someone trying to write a reentrant ("kernel thread safe" or
"kernel preemption safe") VFS provider of some kind, since
it's really hard to know when the semantics applied by an upper
level function might result in a problem.  Other subsystems
have similar issues, but most of my experience was with VFS
providers, so I can't give you the PCMCIA device attach issues
in SVR4 (maybe we can track down Kurt Mahon, though).


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.




From owner-freebsd-smp  Wed Jul  5 16: 7: 5 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from smtp02.primenet.com (smtp02.primenet.com [206.165.6.132])
	by hub.freebsd.org (Postfix) with ESMTP id EDCDA37BB9F
	for <smp@FreeBSD.ORG>; Wed,  5 Jul 2000 16:06:52 -0700 (PDT)
	(envelope-from tlambert@usr05.primenet.com)
Received: (from daemon@localhost)
	by smtp02.primenet.com (8.9.3/8.9.3) id QAA07200;
	Wed, 5 Jul 2000 16:05:49 -0700 (MST)
Received: from usr05.primenet.com(206.165.6.205)
 via SMTP by smtp02.primenet.com, id smtpdAAA3Fa42n; Wed Jul  5 16:05:37 2000
Received: (from tlambert@localhost)
	by usr05.primenet.com (8.8.5/8.8.5) id QAA27004;
	Wed, 5 Jul 2000 16:06:32 -0700 (MST)
From: Terry Lambert <tlambert@primenet.com>
Message-Id: <200007052306.QAA27004@usr05.primenet.com>
Subject: Re: SMP meeting summary
To: grog@lemis.com (Greg Lehey)
Date: Wed, 5 Jul 2000 23:06:32 +0000 (GMT)
Cc: cp@bsdi.com (Chuck Paterson),
	eischen@vigrid.com (Daniel Eischen),
	jasone@canonware.com (Jason Evans),
	luoqi@watermarkgroup.com (Luoqi Chen), smp@FreeBSD.ORG
In-Reply-To: <20000704120930.G94351@wantadilla.lemis.com> from "Greg Lehey" at Jul 04, 2000 12:09:30 PM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

> I've never been able to understand the advantages of
> conditional variables, which may be my viewpoint, or it may be some
> basic lack of understanding.

You can use the address of the mutex in a condition variable
in the same set of sleep/contention spaces that you use any
other mutex.

This means that you can do deadlock detection, without having
to consider multiple name spaces (e.g. one for mutexes, and
another for event flags).

The other neat thing is that you can treat them opaquely in
the manipulation routines, so long as the address of the
mutex is always what's used, and you don't know if it is a
mutex protecting a structure, or an event flag, or something
else.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.




From owner-freebsd-smp  Wed Jul  5 16:10:59 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from smtp02.primenet.com (smtp02.primenet.com [206.165.6.132])
	by hub.freebsd.org (Postfix) with ESMTP id C2FE037BA29
	for <smp@FreeBSD.ORG>; Wed,  5 Jul 2000 16:10:55 -0700 (PDT)
	(envelope-from tlambert@usr05.primenet.com)
Received: (from daemon@localhost)
	by smtp02.primenet.com (8.9.3/8.9.3) id QAA08551;
	Wed, 5 Jul 2000 16:09:51 -0700 (MST)
Received: from usr05.primenet.com(206.165.6.205)
 via SMTP by smtp02.primenet.com, id smtpdAAAzeaiNq; Wed Jul  5 16:09:41 2000
Received: (from tlambert@localhost)
	by usr05.primenet.com (8.8.5/8.8.5) id QAA27112;
	Wed, 5 Jul 2000 16:10:39 -0700 (MST)
From: Terry Lambert <tlambert@primenet.com>
Message-Id: <200007052310.QAA27112@usr05.primenet.com>
Subject: Re: SMP meeting summary
To: bright@wintelcom.net (Alfred Perlstein)
Date: Wed, 5 Jul 2000 23:10:39 +0000 (GMT)
Cc: grog@lemis.com (Greg Lehey),
	jeroen@vangelderen.org (Jeroen C. van Gelderen),
	eischen@vigrid.com (Daniel Eischen),
	jasone@canonware.com (Jason Evans),
	luoqi@watermarkgroup.com (Luoqi Chen), smp@FreeBSD.ORG
In-Reply-To: <20000703220823.Z25571@fw.wintelcom.net> from "Alfred Perlstein" at Jul 03, 2000 10:08:24 PM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

> The idea is that for spin or spin-then-sleep mutexes (very short
> hold time) is that since you won't have as many processes as cpus
> contending (and when you do it's ok) that the mutual exclusion is
> so short lived that by the time the next 'thundering' process is
> actually given the CPU, the likelyhood is that other processes have
> already aquired _and_ released the spinlock making it more than
> likely that the reasource is free.
> 
> The idea is that the a quantum is actually so great that there's
> little chance of one of the wake_all processes colliding on the
> lock.

This is a bogus idea, both in the case of a large number of
processors, and in quantum ownership case.

The quantum ownership case is "so long as I have work to do, if
the scheduler gave me a quantum, it's my damn quantum!".  In other
words, the idea of voluntary preemption or semivoluntary preemption,
such as one might get when the system makes a process block merely
because it has made a system call that can't be immediately
satisfied.  A multithreaded of FSA process doesn't care about a
single blocking context: it wants to use the remainder of its
quantum.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.




From owner-freebsd-smp  Wed Jul  5 16:21:36 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from wantadilla.lemis.com (wantadilla.lemis.com [192.109.197.80])
	by hub.freebsd.org (Postfix) with ESMTP id 5FB1E37BD3C
	for <freebsd-smp@freebsd.org>; Wed,  5 Jul 2000 16:21:31 -0700 (PDT)
	(envelope-from grog@wantadilla.lemis.com)
Received: (from grog@localhost)
	by wantadilla.lemis.com (8.9.3/8.9.3) id IAA00499;
	Thu, 6 Jul 2000 08:51:03 +0930 (CST)
	(envelope-from grog)
Date: Thu, 6 Jul 2000 08:51:03 +0930
From: Greg Lehey <grog@lemis.com>
To: Chuck Paterson <cp@bsdi.com>
Cc: Matthew Dillon <dillon@apollo.backplane.com>,
	David Greenman <dg@root.com>, freebsd-smp@freebsd.org
Subject: Re: SMP progress (was: Stepping on Toes)
Message-ID: <20000706085103.P97425@wantadilla.lemis.com>
References: <200007051652.KAA14768@berserker.bsdi.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0pre2i
In-Reply-To: <200007051652.KAA14768@berserker.bsdi.com>
Organization: LEMIS, PO Box 460, Echunga SA 5153, Australia
Phone: +61-8-8388-8286
Fax: +61-8-8388-8725
Mobile: +61-418-838-708
WWW-Home-Page: http://www.lemis.com/~grog
X-PGP-Fingerprint: 6B 7B C3 8C 61 CD 54 AF  13 24 52 F8 6D A4 95 EF
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Wednesday,  5 July 2000 at 10:52:23 -0600, Chuck Paterson wrote:
>>
>>    Jake Burkholder is porting the BSD/OS mutexes.  I don't expect there
>>    to be much of a difference in regards to your heavy-weight interrupt
>>    work.  I'm going to take a look at Jake's patchset tonight.  I think
>>    the only operational item we need to research is the sti/cli stuff in
>>    the BSDI mutexes... we should be able to remove them at some point
>>    (my interrupt code is already using the ipending mechanism to deal
>>    with the scheduler mutex being active on the current cpu).
>>
>>    If Jake's removed that, then we'll want to put it back in at some point
>>    since it saves a significant amount of overhead ('sti' and 'cli' are
>>    expensive instructions).
>>
>>					-Matt
>>					Matthew Dillon
>>					<dillon@backplane.com>
>
>
> 	I believe ipending wants to go away totally. It really isn't
> meaningful in the thread environment and the locked operations
> needed to support it once multiple processor are running in the
> kernel are more expensive the sti, cli.

Agreed.  I can't see any meaning in it, either.

Greg
--
Finger grog@lemis.com for PGP public key
See complete headers for address and phone numbers




From owner-freebsd-smp  Wed Jul  5 16:30:42 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20])
	by hub.freebsd.org (Postfix) with ESMTP id A26C937B77A
	for <smp@FreeBSD.ORG>; Wed,  5 Jul 2000 16:30:38 -0700 (PDT)
	(envelope-from bright@fw.wintelcom.net)
Received: (from bright@localhost)
	by fw.wintelcom.net (8.10.0/8.10.0) id e65NTQQ18535;
	Wed, 5 Jul 2000 16:29:26 -0700 (PDT)
Date: Wed, 5 Jul 2000 16:29:26 -0700
From: Alfred Perlstein <bright@wintelcom.net>
To: Terry Lambert <tlambert@primenet.com>
Cc: Greg Lehey <grog@lemis.com>,
	"Jeroen C. van Gelderen" <jeroen@vangelderen.org>,
	Daniel Eischen <eischen@vigrid.com>,
	Jason Evans <jasone@canonware.com>,
	Luoqi Chen <luoqi@watermarkgroup.com>, smp@FreeBSD.ORG
Subject: Re: SMP meeting summary
Message-ID: <20000705162925.V25571@fw.wintelcom.net>
References: <20000703220823.Z25571@fw.wintelcom.net> <200007052310.QAA27112@usr05.primenet.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2i
In-Reply-To: <200007052310.QAA27112@usr05.primenet.com>; from tlambert@primenet.com on Wed, Jul 05, 2000 at 11:10:39PM +0000
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

* Terry Lambert <tlambert@primenet.com> [000705 16:11] wrote:
> > The idea is that for spin or spin-then-sleep mutexes (very short
> > hold time) is that since you won't have as many processes as cpus
> > contending (and when you do it's ok) that the mutual exclusion is
> > so short lived that by the time the next 'thundering' process is
> > actually given the CPU, the likelyhood is that other processes have
> > already aquired _and_ released the spinlock making it more than
> > likely that the reasource is free.
> > 
> > The idea is that the a quantum is actually so great that there's
> > little chance of one of the wake_all processes colliding on the
> > lock.
> 
> This is a bogus idea, both in the case of a large number of
> processors, and in quantum ownership case.

You are correct in that it's bogus for _large_ number of processors,
but for small numbers it makes a lot of sense.  It would work nicely
if one could attempt to schedule the processes in the order that they
were unblocked.

Even on a system with a significant number of CPUs, assuming the
system is busy, then between actually scheduling these thundering
processes on other CPUs and having them run there will be enough
time to avoid another pileup on the mutex.

If the CPUs are not busy and collisions occur, well then the
collisions are free because we have cycles to burn. :)

> The quantum ownership case is "so long as I have work to do, if
> the scheduler gave me a quantum, it's my damn quantum!".  In other
> words, the idea of voluntary preemption or semivoluntary preenption,
> such as one might get when the system makes a process blocke merely
> because it has made a system call that can't be immediately
> satisfied.  A multithreaded of FSA process doesn't care about a
> single blocking context: it wants to use the remainder of its
> quantum.

I'm not sure I understand this nor how it applies.

-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."




From owner-freebsd-smp  Thu Jul  6  1:23:25 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from anchor-post-32.mail.demon.net (anchor-post-32.mail.demon.net [194.217.242.90])
	by hub.freebsd.org (Postfix) with ESMTP id 4C10337B7F0
	for <freebsd-smp@freebsd.org>; Thu,  6 Jul 2000 01:23:14 -0700 (PDT)
	(envelope-from dfr@nlsystems.com)
Received: from nlsys.demon.co.uk ([158.152.125.33] helo=herring.nlsystems.com)
	by anchor-post-32.mail.demon.net with esmtp (Exim 2.12 #1)
	id 13A6wB-000NFy-0W; Thu, 6 Jul 2000 09:23:12 +0100
Received: from salmon.nlsystems.com (salmon.nlsystems.com [10.0.0.3])
	by herring.nlsystems.com (8.9.3/8.8.8) with ESMTP id JAA71694;
	Thu, 6 Jul 2000 09:23:26 +0100 (BST)
	(envelope-from dfr@nlsystems.com)
Date: Thu, 6 Jul 2000 09:26:51 +0100 (BST)
From: Doug Rabson <dfr@nlsystems.com>
To: Matthew Dillon <dillon@apollo.backplane.com>
Cc: Greg Lehey <grog@lemis.com>, Chuck Paterson <cp@bsdi.com>,
	David Greenman <dg@root.com>, freebsd-smp@freebsd.org
Subject: Re: SMP progress (was: Stepping on Toes)
In-Reply-To: <200007051643.JAA88121@apollo.backplane.com>
Message-ID: <Pine.BSF.4.21.0007060925340.6058-100000@salmon.nlsystems.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Wed, 5 Jul 2000, Matthew Dillon wrote:

> :...
> :>     At this moment, without interrupt threads, interrupts can share Giant
> :>     with the curproc they interrupted.  This is how our existing MP stuff
> :>     worked already.
> :>
> :>     When Greg moves interrupts to their own threads, and obtains Giant to
> :>     run those interrupts, no more sharing will occur and just the fact
> :>     that the interrupt is holding Giant guarentees that nobody else will
> :>     be messing with SPLs, thus the SPLs can be removed entirely.
> :
> :Agreed.  I'm in the process of implementing the heavy-weight interrupt
> :processes now.  I've just taken a look at your web page and note that
> :the URL no longer exists; in conjunction with the discussion above,
> :I'm no longer sure how far you are.  Are you importing the BSD/OS code
> :now?
> :
> :We should probably take the rest of this offline, but I wanted to
> :discuss how we do things.  My idea is:
> :
> :1.  You import the BSD/OS mutexes.
> :2.  I import/implement the heavy-weight interrupt code, which I will
> :    endeavour to get working relatively reliably.  This should be a
> :    fallback while I break^H^H^H^H^Himplement light-wait interrupt
> :    threads.
> :3.  You and I test our stuff together until it can stay up for an hour
> :    or so (exact time to be determined by Jason, who'll be carrying
> :    the can).
> :4.  We commit the marginally stable stuff.
> :5.  I carry on working on the light-weight threads.
> :
> :Any comments?
> :
> :Greg
> 
>     Jake Burkholder is porting the BSD/OS mutexes.  I don't expect there
>     to be much of a difference in regards to your heavy-weight interrupt
>     work.  I'm going to take a look at Jake's patchset tonight.  I think
>     the only operational item we need to research is the sti/cli stuff in
>     the BSDI mutexes... we should be able to remove them at some point
>     (my interrupt code is already using the ipending mechanism to deal
>     with the scheduler mutex being active on the current cpu).  
> 
>     If Jake's removed that, then we'll want to put it back in at some point
>     since it saves a significant amount of overhead ('sti' and 'cli' are
>     expensive instructions).

A spin lock which is used from both top and bottom halves *must* disable
interrupts, surely. Since we will only really end up with approximately
one of these (sched_lock) I don't think there is a real problem.

-- 
Doug Rabson				Mail:  dfr@nlsystems.com
Nonlinear Systems Ltd.			Phone: +44 20 8442 9037




From owner-freebsd-smp  Thu Jul  6 10:55:36 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2])
	by hub.freebsd.org (Postfix) with ESMTP id 4D5E237C033
	for <freebsd-smp@FreeBSD.ORG>; Thu,  6 Jul 2000 10:55:31 -0700 (PDT)
	(envelope-from dillon@apollo.backplane.com)
Received: (from dillon@localhost)
	by apollo.backplane.com (8.9.3/8.9.1) id KAA94534;
	Thu, 6 Jul 2000 10:43:23 -0700 (PDT)
	(envelope-from dillon)
Date: Thu, 6 Jul 2000 10:43:23 -0700 (PDT)
From: Matthew Dillon <dillon@apollo.backplane.com>
Message-Id: <200007061743.KAA94534@apollo.backplane.com>
To: Doug Rabson <dfr@nlsystems.com>
Cc: Greg Lehey <grog@lemis.com>, Chuck Paterson <cp@bsdi.com>,
	David Greenman <dg@root.com>, freebsd-smp@FreeBSD.ORG
Subject: Re: SMP progress (was: Stepping on Toes)
References:  <Pine.BSF.4.21.0007060925340.6058-100000@salmon.nlsystems.com>
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

:
:A spin lock which is used from both top and bottom halves *must* disable
:interrupts, surely. Since we will only really end up with approximately
:one of these (sched_lock) I don't think there is a real problem.
:
:-- 
:Doug Rabson				Mail:  dfr@nlsystems.com

    Read the document I wrote a month ago that's sitting on 
    apollo.backplane.com/FreeBSDSmp/

    There are two ways to do this:

    (1) (The way I implemented it)

	For the scheduler mutex, which is the only spin mutex in the
	system, interrupts are left enabled.

	If an interrupt occurs the interrupt vector code will test to
	see if the current cpu holds the scheduler mutex.  If it does,
	the code will set the appropriate ipending bit and return
	immediately.

	Pending interrupts are tested and run when the scheduler mutex
	is released.

	Advantages:

	    * Removes sti/cli from the critical mutex path, which is
	      executed far more often than interrupts are.

	    * Allows interrupts to be forwarded if we wanted to forward
	      them, rather than blocking (at some point in the future)

	    * Allows passive pickups of interrupts by idle processors
	      (at some point in the future)

	    * Allows us to implement a separate interrupt scheduler, if
	      we want to (at some point in the future).

	    * Allows us to implement a parallel spl mechanism (though
	      apparently nobody is interested in doing that).

	Disadvantages:

	     None.


    (2) (The way BSDI implemented it)

	When the scheduler mutex is obtained interrupts are disabled,
	blocking interrupts from occurring.   When the scheduler mutex
	is released, interrupts are reenabled.

	An interrupt thus cannot occur while the scheduler mutex is being
	held by the current process.

	Advantages:

	    * Slightly less complex code.

	Disadvantages:

	    * Does not allow any manipulation of pending interrupts to
	      occur while the scheduler lock is held.

	    * Causes sti/cli to be run quite often, which slows down the
	      critical path for spin mutexes.
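
    The deferral in (1) can be modelled in a few lines of plain C.
    This is only a single-CPU, user-space sketch: there is no real
    spinning and no real interrupt, the struct and function names
    are hypothetical rather than the actual FreeBSD or BSD/OS ones,
    and a real kernel would need atomic updates of the pending mask
    once several processors are involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CPU state for the model. */
struct cpu_state {
    bool     holds_sched_lock;  /* this cpu owns the scheduler mutex */
    uint32_t ipending;          /* one bit per deferred interrupt */
    uint32_t handled;           /* one bit per handler that has run */
};

/* Stand-in for the real interrupt handler: just records that it ran. */
static void run_handler(struct cpu_state *cpu, int irq)
{
    cpu->handled |= UINT32_C(1) << irq;
}

/* Interrupt vector entry: defer if the scheduler mutex is held here. */
static void intr_vector(struct cpu_state *cpu, int irq)
{
    assert(irq >= 0 && irq < 32);
    if (cpu->holds_sched_lock) {
        cpu->ipending |= UINT32_C(1) << irq;    /* mark and return */
        return;
    }
    run_handler(cpu, irq);
}

static void sched_lock_acquire(struct cpu_state *cpu)
{
    /* Method (2) would execute 'cli' here; method (1) does not. */
    cpu->holds_sched_lock = true;
}

/* Releasing the mutex replays whatever was deferred meanwhile. */
static void sched_lock_release(struct cpu_state *cpu)
{
    cpu->holds_sched_lock = false;
    for (int irq = 0; irq < 32; irq++) {
        uint32_t bit = UINT32_C(1) << irq;
        if (cpu->ipending & bit) {
            cpu->ipending &= ~bit;
            run_handler(cpu, irq);
        }
    }
}
```

    The acquire path touches only a flag; the cost moves to the
    interrupt path (one extra test) and the release path (one scan
    of the pending mask).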

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>
   



From owner-freebsd-smp  Thu Jul  6 11:55:26 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from web1403.mail.yahoo.com (web1403.mail.yahoo.com [128.11.23.167])
	by hub.freebsd.org (Postfix) with SMTP id B859B37B89E
	for <freebsd-smp@freebsd.org>; Thu,  6 Jul 2000 11:55:23 -0700 (PDT)
	(envelope-from fgozzo@yahoo.com)
Received: (qmail 20778 invoked by uid 60001); 6 Jul 2000 18:55:20 -0000
Message-ID: <20000706185520.20776.qmail@web1403.mail.yahoo.com>
Received: from [128.210.251.12] by web1403.mail.yahoo.com; Thu, 06 Jul 2000 11:55:20 PDT
Date: Thu, 6 Jul 2000 11:55:20 -0700 (PDT)
From: Fabio Gozzo <fgozzo@yahoo.com>
Subject: i820/i815 Chipset ?
To: freebsd-smp@freebsd.org
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Hello,

I'm about to buy a new computer and I'm just wondering if FreeBSD runs
well on those new chipsets. I remember someone having problems with i820
some time ago. Is that still true?
Additionally, if someone could give me good recommendations for dual PIII
motherboards, I would be very grateful.
Thank you,

Fabio





From owner-freebsd-smp  Thu Jul  6 14:58:48 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from io.yi.org (24.67.218.186.bc.wave.home.com [24.67.218.186])
	by hub.freebsd.org (Postfix) with ESMTP
	id E452637B84E; Thu,  6 Jul 2000 14:58:46 -0700 (PDT)
	(envelope-from jburkhol@home.com)
Received: from io.yi.org (localhost.gvcl1.bc.wave.home.com [127.0.0.1])
	by io.yi.org (Postfix) with ESMTP
	id 68B90BA4E; Thu,  6 Jul 2000 14:58:57 -0700 (PDT)
X-Mailer: exmh version 2.1.1 10/15/1999
To: jasone@freebsd.org
Cc: smp@freebsd.org
Subject: mutex(9)
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Thu, 06 Jul 2000 14:58:57 -0700
From: Jake Burkholder <jburkhol@home.com>
Message-Id: <20000706215857.68B90BA4E@io.yi.org>
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org


I asked Sheldon Hearn to have a look at the BSD/OS mutex man page and
get it ready for inclusion into FreeBSD.

http://people.freebsd.org/~jake/mutex.9

Thanks Sheldon!
-- 
It may be that the asymptotic advantage doesn't set in until well after the
limits of human interest.




From owner-freebsd-smp  Thu Jul  6 16:51:47 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from smtp03.primenet.com (smtp03.primenet.com [206.165.6.133])
	by hub.freebsd.org (Postfix) with ESMTP id BE22037B78C
	for <smp@FreeBSD.ORG>; Thu,  6 Jul 2000 16:51:41 -0700 (PDT)
	(envelope-from tlambert@usr02.primenet.com)
Received: (from daemon@localhost)
	by smtp03.primenet.com (8.9.3/8.9.3) id QAA08119;
	Thu, 6 Jul 2000 16:51:00 -0700 (MST)
Received: from usr02.primenet.com(206.165.6.202)
 via SMTP by smtp03.primenet.com, id smtpdAAAtOaaWp; Thu Jul  6 16:50:54 2000
Received: (from tlambert@localhost)
	by usr02.primenet.com (8.8.5/8.8.5) id QAA08404;
	Thu, 6 Jul 2000 16:51:09 -0700 (MST)
From: Terry Lambert <tlambert@primenet.com>
Message-Id: <200007062351.QAA08404@usr02.primenet.com>
Subject: Re: SMP meeting summary
To: bright@wintelcom.net (Alfred Perlstein)
Date: Thu, 6 Jul 2000 23:51:08 +0000 (GMT)
Cc: tlambert@primenet.com (Terry Lambert),
	grog@lemis.com (Greg Lehey),
	jeroen@vangelderen.org (Jeroen C. van Gelderen),
	eischen@vigrid.com (Daniel Eischen),
	jasone@canonware.com (Jason Evans),
	luoqi@watermarkgroup.com (Luoqi Chen), smp@FreeBSD.ORG
In-Reply-To: <20000705162925.V25571@fw.wintelcom.net> from "Alfred Perlstein" at Jul 05, 2000 04:29:26 PM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

> > > The idea is that the a quantum is actually so great that there's
> > > little chance of one of the wake_all processes colliding on the
> > > lock.
> > 
> > This is a bogus idea, both in the case of a large number of
> > processors, and in quantum ownership case.
> 
> You are correct in that it's bogus for _large_ number of processors,
> but for small numbers it makes a lot of sense.  It would work nicely
> if one could attempt to schedule the processes in order that they
> were unblocked.

What's small?


> Even on system with a signifigant amount of CPUs, sssuming the
> system is busy, then between actually scheduling these thundering
> processes on other CPUs and having them run there will be enough
> time to avoid another pileup on the mutex.
> 
> If the CPUs are not busy and collisions occur, well then the
> collisions are free because we have cycles to burn. :)

This is not a safe assumption; consider the case of a distributed
cluster which supported migration.  High communications latencies
between processors are destructive.

Equally, on low latency communications links, such as in a 4-way
Xeon system, the amount of interprocessor contention on cache
line invalidation is significant.

If there isn't such contention, why lock it?

If there is such contention, why take the invalidation hit for
all processors, instead of (on average) 1.5?


From a research perspective, consider the case of goal-based
computing, goal-based computing where the participants have
incomplete information, and cooperative robotics.

These are fields of research where your simplification will make
use of BSD impossible.


> > The quantum ownership case is "so long as I have work to do, if
> > the scheduler gave me a quantum, it's my damn quantum!".  In other
> > words, the idea of voluntary preemption or semivoluntary preenption,
> > such as one might get when the system makes a process block merely
> > because it has made a system call that can't be immediately
> > satisfied.  A multithreaded of FSA process doesn't care about a
> > single blocking context: it wants to use the remainder of its
> > quantum.
> 
> I'm not sure I understand this nor how it applies.

A multithreaded file system architecture could be used to
implement a concurrent "team" type program, which transferred
data from one region in a contention domain to another.  If
that were implemented using your approach, there would be
lockstep read/write/read/write, rather than concurrent operation
with a single latency (e.g. read/write+read/write+read/.../write).

This is the moral equivalent of a sliding window for data copies
on contention domains, involving two processors.
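
To make the sliding-window picture concrete, here is a toy
"team"-style copy, assuming POSIX threads and using in-memory buffers
in place of the two files.  It is not the real team program; the
names, the window size (SLOTS) and the chunk size (CHUNK) are all
invented for the illustration:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SLOTS 4   /* window size in chunks (arbitrary) */
#define CHUNK 8   /* bytes per chunk (arbitrary) */

struct team {
    pthread_mutex_t mtx;
    pthread_cond_t  not_full, not_empty;
    char        slot[SLOTS][CHUNK];
    size_t      len[SLOTS];
    size_t      head, tail, count;      /* count = filled slots */
    bool        eof;
    const char *src;  size_t src_len;   /* stands in for the input */
    char       *dst;  size_t dst_len;   /* stands in for the output */
};

/* "read" side: may run up to SLOTS chunks ahead of the writer. */
static void *team_reader(void *arg)
{
    struct team *t = arg;

    for (size_t off = 0; off < t->src_len; off += CHUNK) {
        size_t n = t->src_len - off < CHUNK ? t->src_len - off : CHUNK;

        pthread_mutex_lock(&t->mtx);
        while (t->count == SLOTS)
            pthread_cond_wait(&t->not_full, &t->mtx);
        pthread_mutex_unlock(&t->mtx);

        /* slot[head] is free, so fill it without holding the lock. */
        memcpy(t->slot[t->head], t->src + off, n);
        t->len[t->head] = n;

        pthread_mutex_lock(&t->mtx);
        t->head = (t->head + 1) % SLOTS;
        t->count++;
        pthread_cond_signal(&t->not_empty);
        pthread_mutex_unlock(&t->mtx);
    }
    pthread_mutex_lock(&t->mtx);
    t->eof = true;
    pthread_cond_signal(&t->not_empty);
    pthread_mutex_unlock(&t->mtx);
    return NULL;
}

/* "write" side: drains filled slots while the reader keeps going. */
static void *team_writer(void *arg)
{
    struct team *t = arg;

    for (;;) {
        pthread_mutex_lock(&t->mtx);
        while (t->count == 0 && !t->eof)
            pthread_cond_wait(&t->not_empty, &t->mtx);
        bool drained = (t->count == 0);     /* empty and at eof */
        pthread_mutex_unlock(&t->mtx);
        if (drained)
            break;

        /* slot[tail] is ours until we publish the new tail. */
        memcpy(t->dst + t->dst_len, t->slot[t->tail], t->len[t->tail]);
        t->dst_len += t->len[t->tail];

        pthread_mutex_lock(&t->mtx);
        t->tail = (t->tail + 1) % SLOTS;
        t->count--;
        pthread_cond_signal(&t->not_full);
        pthread_mutex_unlock(&t->mtx);
    }
    return NULL;
}

/* Copy len bytes from src to dst with one reader and one writer. */
static size_t team_copy(const char *src, size_t len, char *dst)
{
    struct team t = { .src = src, .src_len = len, .dst = dst };
    pthread_t rd, wr;
    int rc;

    pthread_mutex_init(&t.mtx, NULL);
    pthread_cond_init(&t.not_full, NULL);
    pthread_cond_init(&t.not_empty, NULL);
    rc = pthread_create(&rd, NULL, team_reader, &t);
    assert(rc == 0);
    rc = pthread_create(&wr, NULL, team_writer, &t);
    assert(rc == 0);
    (void)rc;
    pthread_join(rd, NULL);
    pthread_join(wr, NULL);
    return t.dst_len;
}
```

The copies happen outside the mutex, so with exactly one reader and
one writer the "read" of chunk k+1 can overlap the "write" of chunk
k; the lock only guards the slot bookkeeping.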

One of the basic flaws of SVR4 based Dynix for a very long time
(and perhaps still; I haven't been inside a Sequent box for 2
years now) was that the FSA was not multithreaded.

If you want a more prosaic example, take any web server you
currently operate, and replace the GIF images with CGI's that
invoke the "team" program to stream the data out to the
requester, instead of delivering the image directly.

Note the almost 170% improvement in download time for the
pages (this is the same reason that "sendfile" is a stupid idea,
given that it can never achieve this improvement because it can
never achieve this concurrency, even if you could send all the
data in the UNIX disk format).


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.




From owner-freebsd-smp  Thu Jul  6 20: 7:53 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from peorth.iteration.net (peorth.iteration.net [208.190.180.178])
	by hub.freebsd.org (Postfix) with ESMTP id 4736437B698
	for <smp@freebsd.org>; Thu,  6 Jul 2000 20:07:50 -0700 (PDT)
	(envelope-from keichii@peorth.iteration.net)
Received: by peorth.iteration.net (Postfix, from userid 1000)
	id 80ACF64C0E; Thu,  6 Jul 2000 22:07:53 -0500 (CDT)
Date: Thu, 6 Jul 2000 22:07:53 -0500
From: "Michael C. Wu" <keichii@peorth.iteration.net>
To: smp@freebsd.org
Subject: Re: i820/i815 Chipset ?
Message-ID: <20000706220753.D21156@peorth.iteration.net>
Mail-Followup-To: "Michael C. Wu" <keichii@peorth.iteration.net>,
	smp@freebsd.org
References: <20000706185520.20776.qmail@web1403.mail.yahoo.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2i
In-Reply-To: <20000706185520.20776.qmail@web1403.mail.yahoo.com>; from fgozzo@yahoo.com on Thu, Jul 06, 2000 at 11:55:20AM -0700
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Thu, Jul 06, 2000 at 11:55:20AM -0700, Fabio Gozzo scribbled:
| I'm about to buy a new computer and I'm just wondering if FreeBSD runs
| well on those new chipsets. I remember someone having problems with i820
| some time ago. Is that still true ?

Do not know. Sorry.

| Additionally, if someone could give me good recommendations for dual PIII
| motherboards, I would be very grateful.
| Thank you,

Tyan dual boards have worked well for me.
---end quoted text---

P.S. I don't know if asking this question in -smp is a very good idea.

Regards,

--
+------------------------------------------------------------------+
| keichii@peorth.iteration.net         | keichii@bsdconspiracy.net |
| http://peorth.iteration.net/~keichii | Yes, BSD is a conspiracy. |
+------------------------------------------------------------------+




From owner-freebsd-smp  Fri Jul  7  8:17:37 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from berserker.bsdi.com (berserker.twistedbit.com [199.79.183.1])
	by hub.freebsd.org (Postfix) with ESMTP id 989B237C1AB
	for <freebsd-smp@freebsd.org>; Fri,  7 Jul 2000 08:17:29 -0700 (PDT)
	(envelope-from cp@berserker.bsdi.com)
Received: from berserker.bsdi.com (cp@localhost [127.0.0.1])
	by berserker.bsdi.com (8.9.3/8.9.3) with ESMTP id JAA12975;
	Fri, 7 Jul 2000 09:16:20 -0600 (MDT)
Message-Id: <200007071516.JAA12975@berserker.bsdi.com>
To: Matthew Dillon <dillon@apollo.backplane.com>
Cc: Doug Rabson <dfr@nlsystems.com>, Greg Lehey <grog@lemis.com>,
	David Greenman <dg@root.com>, freebsd-smp@freebsd.org
Date: Fri, 07 Jul 2000 09:16:20 -0600
From: Chuck Paterson <cp@berserker.bsdi.com>
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Subject: Spin Locks, blocking interrupts, and ipending
From: Chuck Paterson <cp@bsdi.com>
Fcc: outbound
--------

Matthew Dillon wrote on: Thu, 06 Jul 2000 10:43:23 PDT

}
}    There are two ways to do this:
}
}    (1) (The way I implemented it)

Removed stuff

}       Disadvantages:
}
}            None.
}


One disadvantage is that the actual mechanism to implement
the lazy interrupts is going to be a bitch to actually make work
right/well on multiple processors in parallel, further complicated
by the lightweight context switches.  Also, I'm pretty sure interrupts
should not get pended at the PIC level, but rather on individual
APIC pins.

A second disadvantage is that interrupts will get randomly delivered to
a processor holding a spin mutex rather than run immediately on a
processor not holding a spin mutex.  This is a problem with the
current scheme also, and not fixed with masks and ipending.


Having said this, it may be that Matt is correct and in the long
term we will want to ditch the cli/sti's in mutices. I really hope
we don't have to go down the lazy path, my mind goes numb just
thinking about the corner cases.

Some random data points.

    The assumption that the scheduler lock will be the only
    spin mutex is wrong. BSD/OS currently has about 5 which
    are pretty much architecture independent. Most/all of these
    are at one time or another acquired while the scheduler
    lock is held.

    On Sparc BSD/OS already has the notion of a priority
    level associated with a spin lock. Interrupts are blocked
    by changing the interrupting block level, which on Sparc
    is very cheap.

    When a processor is cli'd, IPIs are not delivered. In general
    this made life lots easier in BSD/OS. It is possible that
    a condition will arise where it is required to deliver
    an IPI while another processor holds a spin lock. This
    did occur on Sparc with some low cache code (stuff done
    in hardware on Intel). This particular IPI is delivered
    at a level higher than any that just acquiring a mutex will block.


    It is likely that some devices will want a lower level half
    that always runs in a borrowed context and is protected
    by spin locks. Currently in BSD/OS the Sparc zs driver
    falls into this category. It sure seems likely that
    the X86 com driver will also fall into this category. Using
    sti/cli does not preclude this, but it does preclude
    a driver of this type interrupting another driver of this
    type. 

    In order to avoid a deadly embrace, spin mutices must block all
    interrupts of mutices which are logically before them in
    the locking order. This lends itself to a priority scheme for
    blocking rather than an unordered bit mask. This is not to
    say that a bit mask couldn't be used to implement a priority
    scheme.
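
    The priority scheme above can be sketched roughly as follows. This
    is a user-space simulation of a single CPU, not BSD/OS or FreeBSD
    code: cur_ipl, struct spin_mtx, spin_enter, spin_exit and
    intr_deliverable are all hypothetical names, and a real spin mutex
    would also spin on an atomic lock word. The assumption is that a
    mutex later in the locking order carries a higher block level.

```c
#include <assert.h>

/* Hypothetical simulation: the current interrupt block level of one CPU.
 * Interrupts at or below this level are considered blocked. */
static int cur_ipl = 0;

struct spin_mtx {
    int level;      /* block level required while the mutex is held */
    int saved_ipl;  /* block level to restore on release */
    int held;       /* nonzero while held (stand-in for the lock word) */
};

/* Acquire: remember the old level and raise it, never lower it. */
static void spin_enter(struct spin_mtx *m)
{
    assert(!m->held);
    m->saved_ipl = cur_ipl;
    if (m->level > cur_ipl)
        cur_ipl = m->level;
    m->held = 1;    /* real code would spin on an atomic here */
}

/* Release: drop the lock and restore the level saved at acquire time. */
static void spin_exit(struct spin_mtx *m)
{
    assert(m->held);
    m->held = 0;
    cur_ipl = m->saved_ipl;
}

/* An interrupt is deliverable only if it is above the block level. */
static int intr_deliverable(int irq_level)
{
    return irq_level > cur_ipl;
}
```

    With levels ordered this way, holding one spin mutex still lets
    an interrupt of a higher level through, which a plain cli/sti
    cannot do.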

    The task priority register could be used to block interrupts on
    Pentium. The code could then work much like Sparc. This has
    the advantage of allowing some spin mutices to be interrupted
    without the added complexity of ipending. The really good thing
    is that the interrupt will get dispatched to a processor able
    to handle it, rather than just being pended on a processor
    which has it blocked. I don't yet know how expensive writes to
    the task priority register are.

    The cost of spin mutices is not that big of a deal. They are the
    more expensive kind of mutex, but generally when they are acquired
    something very expensive, like a task switch, is already
    occurring. The cost for use with drivers like the com driver is
    not an issue. Just taking the interrupt and then talking to the
    hardware totally swamps the cost of the mutex.
    
Chuck




From owner-freebsd-smp  Fri Jul  7  8:57:58 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2])
	by hub.freebsd.org (Postfix) with ESMTP id 15F9737BF9A
	for <freebsd-smp@FreeBSD.ORG>; Fri,  7 Jul 2000 08:57:26 -0700 (PDT)
	(envelope-from dillon@apollo.backplane.com)
Received: (from dillon@localhost)
	by apollo.backplane.com (8.9.3/8.9.1) id IAA00648;
	Fri, 7 Jul 2000 08:54:37 -0700 (PDT)
	(envelope-from dillon)
Date: Fri, 7 Jul 2000 08:54:37 -0700 (PDT)
From: Matthew Dillon <dillon@apollo.backplane.com>
Message-Id: <200007071554.IAA00648@apollo.backplane.com>
To: Chuck Paterson <cp@berserker.bsdi.com>
Cc: Doug Rabson <dfr@nlsystems.com>, Greg Lehey <grog@lemis.com>,
	David Greenman <dg@root.com>, freebsd-smp@FreeBSD.ORG
Subject: 
References:  <200007071516.JAA12975@berserker.bsdi.com>
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

:One disadvantage is that the actual mechanism to implement
:the lazy interrupts is going to be a bitch to actually make work
:right/well on multiple processors in parallel, further complicated
:by the light weight context switches.  Also I'm pretty sure interrupts
:don't want to get pended to by the pic level, but rather on individual
:apic pins.
:
:A second disadvantage is that interrupts will get randomly delivered to
:a processor holding a spin mutex rather than run immediately on a
:processor not holding a spin mutex.  This is a problem with the
:current scheme also, and not fixed with masks and ipending.

    Well, the first isn't a disadvantage, since I implemented it a few
    weeks ago.  It was trivial.

    The second one doesn't apply to cli/sti versus ipending.  I don't
    see any relationship.  Interrupt delivery is not controlled by
    cli/sti, it's controlled by the APIC.

:Having said this, it may be that Matt is correct and in the long
:term we will want to ditch the cli/sti's in mutices. I really hope
:we don't have to go down the lazy path, my mind goes numb just
:thinking about the corner cases.
:
:Some random data points.
:
:    The assumption that the scheduler lock will be the only
:    spin mutex is wrong. BSD/OS currently has about 5 which
:    are pretty much architecture independent. Most/all of these
:    are at one time or another acquired while the scheduler
:    lock is held.

    A per-process spin-held counter would address this both for
    the scheduler mutex and any other spin mutex.

:    On Sparc BSD/OS already has the notion of a priority
:    level associated with a spin lock. Interrupts are blocked
:    by changing the interrupting block level, which on Sparc
:    is very cheap.
:
:    When a processor is cli'd IPIs are not delivered. In general
:    this made life lots easier in BSD/OS. It is possible that
:    a condition will arise where is it required to deliver
:    an IPI while another processor holds a spin lock. This
:    did occur on Sparc with some low cache code (stuff done
:    in hardware on Intel). This particular IPI is delivered
:    at a level higher than just acquiring a mutex will not block.

    IPI's could be an issue since they don't have equivalent
    interrupt bits in irunning or ipending, and so can't be deferred.
    I don't know of any FreeBSD IPIs that can't 'just run', with or
    without the scheduler mutex.  In this case IPIs would not have to
    be deferred even if the scheduler mutex were held by the current
    cpu.

    The only thing we use IPIs for seriously are VM page operations,
    to invalidate pte's on other cpu's, and for interrupt forwarding
    (which never worked quite right anyway).  The former occurs only while
    Giant is held.

:...  lots of good stuff removed

:Chuck
:

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>




From owner-freebsd-smp  Fri Jul  7 10:49:39 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from berserker.bsdi.com (berserker.twistedbit.com [199.79.183.1])
	by hub.freebsd.org (Postfix) with ESMTP id A13F037BF22
	for <freebsd-smp@freebsd.org>; Fri,  7 Jul 2000 10:49:34 -0700 (PDT)
	(envelope-from cp@berserker.bsdi.com)
Received: from berserker.bsdi.com (cp@localhost [127.0.0.1])
	by berserker.bsdi.com (8.9.3/8.9.3) with ESMTP id LAA14029;
	Fri, 7 Jul 2000 11:48:34 -0600 (MDT)
Message-Id: <200007071748.LAA14029@berserker.bsdi.com>
To: Matthew Dillon <dillon@apollo.backplane.com>
Cc: Doug Rabson <dfr@nlsystems.com>, Greg Lehey <grog@lemis.com>,
	David Greenman <dg@root.com>, freebsd-smp@freebsd.org
Subject: Re: Spin Locks, blocking interrupts, and ipending
From: Chuck Paterson <cp@bsdi.com>
Date: Fri, 07 Jul 2000 11:48:34 -0600
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Finally get a subject line.

}
}    Well, the first isn't a disadvantage, since I implemented it a few
}    weeks ago.  It was trivial.

	I maintain that an implementation that works on a single
running processor is not in the same category of difficulty as one
that has to work on multiple processors.

}
}    The second one doesn't apply to cli/sti versus ipending.  I don't
}    see any relationship.  Interrupt delivery is not controlled by
}    cli/sti, it's controlled by the APIC.
}

	However, if the APIC is programmed to deliver interrupts
to the right place, or not deliver them as the case may be, then
there is no need for the ipending stuff. The point was that there is
no way to get the "right" job done using ipending alone.


}:
}:Some random data points.
}:
}:    The assumption that the scheduler lock will be the only
}:    spin mutex is wrong. BSD/OS currently has about 5 which
}:    are pretty much architecture independent. Most/all of these
}:    are at one time or another acquired while the scheduler
}:    lock is held.
}
}    A per-process spin-held counter would address this both for
}    the scheduler mutex and any other spin mutex.
}

	The comment I originally made here wasn't aimed at anything
Matt said, but rather the comments others had made to the effect
that the scheduler mutex was likely to be the only spin mutex.

}
}    The only thing we use IPIs for seriously are VM page operations,
}    to invalidate pte's on other cpu's, and for interrupt forwarding
}    (which never worked quite right anyway).  The former occurs only while
}    Giant is held.

Hopefully it will eventually be the case that Giant isn't held when
IPI's are sent. BSD/OS already dispatches pcpu clock IPIs without
holding any locks.  It certainly is the case with BSD/OS that the
goal is to make Giant go away totally as soon as possible. However,
I don't think this fundamentally changes anything.

Chuck




From owner-freebsd-smp  Fri Jul  7 13: 2:51 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2])
	by hub.freebsd.org (Postfix) with ESMTP id 0332437B743
	for <freebsd-smp@FreeBSD.ORG>; Fri,  7 Jul 2000 13:02:47 -0700 (PDT)
	(envelope-from dillon@apollo.backplane.com)
Received: (from dillon@localhost)
	by apollo.backplane.com (8.9.3/8.9.1) id NAA01310;
	Fri, 7 Jul 2000 13:02:41 -0700 (PDT)
	(envelope-from dillon)
Date: Fri, 7 Jul 2000 13:02:41 -0700 (PDT)
From: Matthew Dillon <dillon@apollo.backplane.com>
Message-Id: <200007072002.NAA01310@apollo.backplane.com>
To: Chuck Paterson <cp@bsdi.com>
Cc: Doug Rabson <dfr@nlsystems.com>, Greg Lehey <grog@lemis.com>,
	David Greenman <dg@root.com>, freebsd-smp@FreeBSD.ORG
Subject: Re: Spin Locks, blocking interrupts, and ipending
References:  <200007071748.LAA14029@berserker.bsdi.com>
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org


:Finally get a subject line.
:
:}
:}    Well, the first isn't a disadvantage, since I implemented it a few
:}    weeks ago.  It was trivial.
:
:	I maintain the the implementation to work on single
:running processor is not in the same category of difficulty as that
:needed to work on multiple processors.

    For ipending it IS the same category.  At least for i386.  If you look at
    the patchset you will see that the ipending code I wrote
    is 100% MP safe.

:}    The second one doesn't apply to cli/sti versus ipending.  I don't
:}    see any relationship.  Interrupt delivery is not controlled by
:}    cli/sti, it's controlled by the APIC.
:}
:
:	However, if the APIC is programmed to deliver interrupts
:to the right place, or not deliver them as the case may be, then
:there is no need for the ipending stuff. There is no way just
:using ipending to get the "right" job done was the point.

    You are assuming that you can program the APIC at every state change that
    would otherwise affect interrupt performance.  That sounds rather
    messy to me.  With the ipending mechanism (which gives you the ability
    to enter into an interrupt even with spin locks held), you only need
    to decide what to do with the interrupt in *one* place in the code.

    We had (and have) code strewn all over FreeBSD to deal with the
    APIC.  It's a mess, and a mistake.  A few strategic places, sure,
    but everywhere?  No.

    For example, take the case where you want to tie a NIC interrupt to
    a particular cpu.  You could tie the interrupt and simply leave it
    that way, and if the interrupt occurs at an inopportune time the 
    interrupt vector code can choose what to do:

	* set ipending and return (the interrupt will be run the moment
	  the scheduler lock is released)

	* forward the interrupt to another cpu

	* do something else...
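
    As a rough illustration of the first choice (set ipending and
    return), here is a single-CPU, user-space sketch. It is not the
    patchset: ipending here is a plain variable, and spin_held,
    intr_vector, spin_acquire, spin_release and run_handler are
    hypothetical names. It assumes one pending bit per IRQ and a
    counter of spin locks currently held.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t ipending;   /* one bit per IRQ that has been deferred */
static int spin_held;       /* number of spin locks currently held */
static int handled[32];     /* how many times each handler has run */

/* Stand-in for the real interrupt handler. */
static void run_handler(int irq)
{
    handled[irq]++;
}

/* Interrupt vector: the one place that decides what to do. */
static void intr_vector(int irq)
{
    assert(irq >= 0 && irq < 32);
    if (spin_held > 0) {
        ipending |= (uint32_t)1 << irq;   /* defer and return */
        return;
    }
    run_handler(irq);
}

static void spin_acquire(void)
{
    spin_held++;
}

/* On the last release, run everything that was deferred. */
static void spin_release(void)
{
    assert(spin_held > 0);
    if (--spin_held > 0)
        return;
    while (ipending != 0) {
        int irq = 0;
        while (!(ipending & ((uint32_t)1 << irq)))
            irq++;
        ipending &= ~((uint32_t)1 << irq);
        run_handler(irq);
    }
}
```

    Note that a single bit records only that an IRQ fired, not how
    many times, so repeated interrupts while the bit is set collapse
    into one handler run in this sketch.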

:}    A per-process spin-held counter would address this both for
:}    the scheduler mutex and any other spin mutex.
:}
:
:	The comment I originally made here wasn't aimed at anything
:Matt said, but rather the comments others had made to the effect
:that the scheduler mutex was likely to be the only spin mutex.
:
:}
:}    The only thing we use IPIs for seriously are VM page operations,
:}    to invalidate pte's on other cpu's, and for interrupt forwarding
:}    (which never worked quite right anyway).  The former occurs only while
:}    Giant is held.
:
:Hopefully it will eventually be the case that Giant isn't held when
:IPI's are sent. BSD/OS already dispatches pcpu clock IPIs without
:holding any locks.  It certainly is the case with BSD/OS that the
:goal is to make Giant go away totally as soon as possible. However,
:I don't think this fundamentally changes anything.
:
:Chuck

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>




From owner-freebsd-smp  Fri Jul  7 14: 4:23 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from berserker.bsdi.com (berserker.twistedbit.com [199.79.183.1])
	by hub.freebsd.org (Postfix) with ESMTP id 730C837BFE6
	for <freebsd-smp@freebsd.org>; Fri,  7 Jul 2000 14:04:11 -0700 (PDT)
	(envelope-from cp@berserker.bsdi.com)
Received: from berserker.bsdi.com (cp@localhost [127.0.0.1])
	by berserker.bsdi.com (8.9.3/8.9.3) with ESMTP id PAA16442;
	Fri, 7 Jul 2000 15:03:43 -0600 (MDT)
Message-Id: <200007072103.PAA16442@berserker.bsdi.com>
To: Matthew Dillon <dillon@apollo.backplane.com>
Cc: Doug Rabson <dfr@nlsystems.com>, Greg Lehey <grog@lemis.com>,
	David Greenman <dg@root.com>, freebsd-smp@freebsd.org
Subject: Re: Spin Locks, blocking interrupts, and ipending 
From: Chuck Paterson <cp@bsdi.com>
Date: Fri, 07 Jul 2000 15:03:43 -0600
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org


}
}    For ipending it IS the same category.  At least for i386.  If you look at
}    the patchset you will see that the ipending code I wrote
}    is 100% MP safe.

	It may be true that ipending itself is safe, but 

	1) Wrapping all the code in Giant isn't the same as being MP safe.
	    Especially when the code to acquire Giant just can work
	    at the start of an interrupt stub.

	2) If you ever have a second processor you will soon get to
                mtp->mtx_lock = newv;
                panic("blocking");			<----
                mtx_enter(&SchedMutex, MTX_SPIN);

	3) The code which deals with masking interrupts in the
	   hardware has to be MP aware. Functions like Xresume
	   may happen on multiple processors at the same time,
	   at least once stuff moves out from under Giant.

}
}    We had (and have) code strewn all over the FreeBSD to deal with the
}    APIC.  It's a mess, and a mistake.  A few strategic places, sure,
}    but everywhere?  No.
}

You definitely don't need the code strewn all over the place. There
is no reason it can't be encapsulated.


}    For example, take the case where you want to tie a NIC interrupt to
}    a particular cpu.  You could tie the interrupt and simply leave it
}    that way, and if the interrupt occurs at an inopportune time the 
}    interrupt vector code can choose what to do:
}

	What exactly do you mean by tying an interrupt to a particular
	CPU? Route in hardware? Is that really possible on x86?
	Route in software?


}	* set ipending and return (the interrupt will be run the moment
}	  the scheduler lock is released)

	For an edge-triggered device you need to do more than set ipending.
	You need to muck with the hardware. You will need per-CPU ipending
	if you want to cause it to run on a particular processor.
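
	A minimal sketch of what per-CPU ipending might look like,
	assuming C11 atomics and one 32-bit pending word per processor.
	NCPU, pcpu_ipending, intr_pend_on and intr_take_pending are
	hypothetical names, and the hardware side (masking or
	acknowledging an edge-triggered source) is not modelled at all.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NCPU 4  /* assumed processor count, for illustration only */

/* One pending word per CPU; static storage starts out as zero. */
static _Atomic uint32_t pcpu_ipending[NCPU];

/* Mark an IRQ pending on a chosen CPU. Any processor may call this. */
static void intr_pend_on(int cpu, int irq)
{
    assert(cpu >= 0 && cpu < NCPU);
    assert(irq >= 0 && irq < 32);
    atomic_fetch_or(&pcpu_ipending[cpu], (uint32_t)1 << irq);
}

/* Called by the target CPU: atomically claim and clear its pending set. */
static uint32_t intr_take_pending(int cpu)
{
    assert(cpu >= 0 && cpu < NCPU);
    return atomic_exchange(&pcpu_ipending[cpu], 0);
}
```

	The atomic exchange means a bit set concurrently by another
	processor is either returned by this call or left for the next
	one, never lost.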

}
}	* forward the interrupt to another cpu
}
}	* do something else...
}

Chuck

