From owner-freebsd-smp Sun Jul 2 18:52:41 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from wantadilla.lemis.com (wantadilla.lemis.com [192.109.197.80])
    by hub.freebsd.org (Postfix) with ESMTP id 8232237BEB3
    for ; Sun, 2 Jul 2000 18:52:28 -0700 (PDT)
    (envelope-from grog@wantadilla.lemis.com)
Received: (from grog@localhost)
    by wantadilla.lemis.com (8.9.3/8.9.3) id LAA62040;
    Mon, 3 Jul 2000 11:22:03 +0930 (CST)
    (envelope-from grog)
Date: Mon, 3 Jul 2000 11:22:03 +0930
From: Greg Lehey
To: Matthew Dillon
Cc: Chuck Paterson, David Greenman, freebsd-smp@freebsd.org
Subject: SMP progress (was: Stepping on Toes)
Message-ID: <20000703112203.B61851@wantadilla.lemis.com>
References: <200006261650.KAA17801@berserker.bsdi.com> <200006271742.KAA35851@apollo.backplane.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0pre2i
In-Reply-To: <200006271742.KAA35851@apollo.backplane.com>
Organization: LEMIS, PO Box 460, Echunga SA 5153, Australia
Phone: +61-8-8388-8286
Fax: +61-8-8388-8725
Mobile: +61-418-838-708
WWW-Home-Page: http://www.lemis.com/~grog
X-PGP-Fingerprint: 6B 7B C3 8C 61 CD 54 AF 13 24 52 F8 6D A4 95 EF
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Tuesday, 27 June 2000 at 10:42:20 -0700, Matthew Dillon wrote:
>
>>> Even with interrupt threads we have the GiantMutex issue... the same
>>> issue that we have with our current MP implementation. We cannot remove
>>> SPL's until we remove the GiantMutex, and we cannot remove GiantMutex
>>> without major modifications to just about every single source file in sys/
>>
>> In general this isn't true. If you get to the point where
>>
>> 1) All entrance to unsafe code is protected by Giant.
>> 2) tsleep and friends, if any, release Giant when
>>    a process is suspended and re-acquire it on exit.
>> 3) Interrupts have a context to run in.
>> 4) You have one or more scheduling locks.
>>
>> Then you can just turn spls into a nop.
>> There is lots of hand waving
>> in regards to details at this point. BSD/OS SMPng is the existence
>> proof.
>>
>> It seems like one of the major problems in retaining spls during
>> the changeover period is that they don't do much that is useful, and
>> effectively push everything under Giant.
>>
>> Grabbing an spl will only block interrupts; it will not give
>> any protection against an interrupt thread which has already
>> started.
>>
>> This means that any device which might be blocked by splbio() can
>> not be brought out from under the Giant lock until all instances
>> of splbio have been removed.
>>
>> Chuck
>
> Yes, I see it. I agree. You don't even need to hold a scheduling
> lock... all you need to hold is Giant.
>
> #1 - done
> #2 - done
> #3 - (Greg)
> #4 - not required

I don't understand #4. I thought you had done this.

> Right this moment the requirement is that only someone holding Giant
> is allowed to mess with spl*()'s (the cpl variable can only be messed
> with by people holding Giant).
>
> At this moment, without interrupt threads, interrupts can share Giant
> with the curproc they interrupted. This is how our existing MP stuff
> worked already.
>
> When Greg moves interrupts to their own threads, and obtains Giant to
> run those interrupts, no more sharing will occur and just the fact
> that the interrupt is holding Giant guarantees that nobody else will
> be messing with SPLs, thus the SPLs can be removed entirely.

Agreed. I'm in the process of implementing the heavy-weight interrupt
processes now. I've just taken a look at your web page and note that
the URL no longer exists; in conjunction with the discussion above,
I'm no longer sure how far you are. Are you importing the BSD/OS code
now?

We should probably take the rest of this offline, but I wanted to
discuss how we do things. My idea is:

1. You import the BSD/OS mutexes.
2. I import/implement the heavy-weight interrupt code, which I will
   endeavour to get working relatively reliably.
   This should be a
   fallback while I break^H^H^H^H^Himplement light-weight interrupt
   threads.
3. You and I test our stuff together until it can stay up for an hour
   or so (exact time to be determined by Jason, who'll be carrying
   the can).
4. We commit the marginally stable stuff.
5. I carry on working on the light-weight threads.

Any comments?

Greg
--
Finger grog@lemis.com for PGP public key
See complete headers for address and phone numbers

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-smp" in the body of the message

From owner-freebsd-smp Sun Jul 2 19:16:10 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from wantadilla.lemis.com (wantadilla.lemis.com [192.109.197.80])
    by hub.freebsd.org (Postfix) with ESMTP id F09B637BECD
    for ; Sun, 2 Jul 2000 19:16:02 -0700 (PDT)
    (envelope-from grog@wantadilla.lemis.com)
Received: (from grog@localhost)
    by wantadilla.lemis.com (8.9.3/8.9.3) id LAA62161;
    Mon, 3 Jul 2000 11:45:35 +0930 (CST)
    (envelope-from grog)
Date: Mon, 3 Jul 2000 11:45:35 +0930
From: Greg Lehey
To: Daniel Eischen
Cc: Jason Evans, Luoqi Chen, smp@FreeBSD.ORG
Subject: Re: SMP meeting summary
Message-ID: <20000703114535.T39024@wantadilla.lemis.com>
References: <20000626151441.L8965@blitz.canonware.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0pre2i
In-Reply-To:
Organization: LEMIS, PO Box 460, Echunga SA 5153, Australia
Phone: +61-8-8388-8286
Fax: +61-8-8388-8725
Mobile: +61-418-838-708
WWW-Home-Page: http://www.lemis.com/~grog
X-PGP-Fingerprint: 6B 7B C3 8C 61 CD 54 AF 13 24 52 F8 6D A4 95 EF
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Monday, 26 June 2000 at 20:00:09 -0400, Daniel Eischen wrote:
> On 26 Jun 2000, Jason Evans wrote:
>
>> On Mon, Jun 26, 2000 at 02:49:57PM -0700, Jason Evans wrote:
>>> On Mon, Jun 26, 2000 at 04:13:24PM -0400, Luoqi Chen wrote:
>>>>> Processes that block on a mutex are granted the lock in FIFO order, rather
>>>>> than priority
>>>>> order. In order to avoid priority inversion, the mutex wait
>>>>> queue implements priority lending.
>>>>>
>>>> Ok. I remember I have read somewhere that Solaris 7 has given up the behavior
>>>> of waking up only one thread after a mutex is released; now it wakes up all
>>>> the blocking threads. It seems that the "thundering herd" problem is not
>>>> serious after all if the lock granularity is high enough.
>>>
>>> I don't think this is the case.
>>
>> Whoops. The article is broken into two web pages, and the second page
>> states exactly what you said: as of Solaris 7, all waiting threads are
>> woken up.
>
> Yes, this confirms what Jim Mauro said in the Solaris Internals course
> at USENIX. Since mutexes are held only for very small amounts of time
> and the kernel is sufficiently fine-grained, there was no advantage
> to calling wake_one() as opposed to wake_all(). Obviously with these
> semantics, the waiter with the highest priority should obtain the
> mutex. At least that was my recollection...

I find this rather strange. There can be many reasons to take a
mutex, and not all of them have to be fast. Even in the case where
they are, it doesn't seem to be of any value to wake more processes
than can take the mutex. From
http://www.sunworld.com/sunworldonline/swol-08-1999/swol-08-insidesolaris-2.html:

  Sun engineering coded the turnstile_wakeup() in Solaris 7 in a
  generic enough way so that a single thread wakeup could be
  executed, instead of all threads inevitably waking up
  together. Exhaustive testing under a variety of different loads has
  shown that, in practice, we very rarely end up with a large
  blocking chain of threads, and thus almost never run into the
  thundering herd problem. The wakeup-all implementation also solves
  some bit synchronization issues that make a wakeup-one scenario
  tricky.

This seems like a less honest way of saying "We couldn't figure out
how to avoid race conditions on wakeup, and so far nobody has been
able to point to a thundering herd". I'd need some convincing.

Greg
--
Finger grog@lemis.com for PGP public key
See complete headers for address and phone numbers

From owner-freebsd-smp Mon Jul 3 3:24:18 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from pcnet1.pcnet.com (pcnet1.pcnet.com [204.213.232.3])
    by hub.freebsd.org (Postfix) with ESMTP id 9607F37C048
    for ; Mon, 3 Jul 2000 03:24:10 -0700 (PDT)
    (envelope-from eischen@vigrid.com)
Received: (from eischen@localhost)
    by pcnet1.pcnet.com (8.8.7/PCNet) id GAA06569;
    Mon, 3 Jul 2000 06:23:28 -0400 (EDT)
Date: Mon, 3 Jul 2000 06:23:28 -0400 (EDT)
From: Daniel Eischen
To: Greg Lehey
Cc: Jason Evans, Luoqi Chen, smp@FreeBSD.ORG
Subject: Re: SMP meeting summary
In-Reply-To: <20000703114535.T39024@wantadilla.lemis.com>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Mon, 3 Jul 2000, Greg Lehey wrote:
> On Monday, 26 June 2000 at 20:00:09 -0400, Daniel Eischen wrote:
> > Yes, this confirms what Jim Mauro said in the Solaris Internals course
> > at USENIX. Since mutexes are held only for very small amounts of time
> > and the kernel is sufficiently fine-grained, there was no advantage
> > to calling wake_one() as opposed to wake_all(). Obviously with these
> > semantics, the waiter with the highest priority should obtain the
> > mutex. At least that was my recollection...
>
> I find this rather strange. There can be many reasons to take a
> mutex, and not all of them have to be fast. Even in the case where
> they are, it doesn't seem to be of any value to wake more processes
> than can take the mutex.
> From
> http://www.sunworld.com/sunworldonline/swol-08-1999/swol-08-insidesolaris-2.html:
>
>   Sun engineering coded the turnstile_wakeup() in Solaris 7 in a
>   generic enough way so that a single thread wakeup could be
>   executed, instead of all threads inevitably waking up
>   together. Exhaustive testing under a variety of different loads has
>   shown that, in practice, we very rarely end up with a large
>   blocking chain of threads, and thus almost never run into the
>   thundering herd problem. The wakeup-all implementation also solves
>   some bit synchronization issues that make a wakeup-one scenario
>   tricky.
>
> This seems like a less honest way of saying "We couldn't figure out
> how to avoid race conditions on wakeup, and so far nobody has been
> able to point to a thundering herd". I'd need some convincing.

Well if you are considering spinning for a bit of time on a held
mutex (which you seem to advocate?), then why not wake everyone?
If mutexes are held for very short periods of time and you don't
often have a thundering herd problem, then waking everyone is
an optimization since you only have to take the scheduling lock
once. If mutexes can be held for long periods of time, then you
probably wouldn't want to wake everyone.

--
Dan Eischen

From owner-freebsd-smp Mon Jul 3 3:31:20 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from wantadilla.lemis.com (wantadilla.lemis.com [192.109.197.80])
    by hub.freebsd.org (Postfix) with ESMTP id F23EB37C05A
    for ; Mon, 3 Jul 2000 03:31:07 -0700 (PDT)
    (envelope-from grog@wantadilla.lemis.com)
Received: (from grog@localhost)
    by wantadilla.lemis.com (8.9.3/8.9.3) id UAA63751;
    Mon, 3 Jul 2000 20:00:39 +0930 (CST)
    (envelope-from grog)
Date: Mon, 3 Jul 2000 20:00:39 +0930
From: Greg Lehey
To: Daniel Eischen
Cc: Jason Evans, Luoqi Chen, smp@FreeBSD.ORG
Subject: Re: SMP meeting summary
Message-ID: <20000703200039.H62680@wantadilla.lemis.com>
References: <20000703114535.T39024@wantadilla.lemis.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0pre2i
In-Reply-To:
Organization: LEMIS, PO Box 460, Echunga SA 5153, Australia
Phone: +61-8-8388-8286
Fax: +61-8-8388-8725
Mobile: +61-418-838-708
WWW-Home-Page: http://www.lemis.com/~grog
X-PGP-Fingerprint: 6B 7B C3 8C 61 CD 54 AF 13 24 52 F8 6D A4 95 EF
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Monday, 3 July 2000 at 6:23:28 -0400, Daniel Eischen wrote:
> On Mon, 3 Jul 2000, Greg Lehey wrote:
>> On Monday, 26 June 2000 at 20:00:09 -0400, Daniel Eischen wrote:
>>> Yes, this confirms what Jim Mauro said in the Solaris Internals course
>>> at USENIX. Since mutexes are held only for very small amounts of time
>>> and the kernel is sufficiently fine-grained, there was no advantage
>>> to calling wake_one() as opposed to wake_all(). Obviously with these
>>> semantics, the waiter with the highest priority should obtain the
>>> mutex. At least that was my recollection...
>>
>> I find this rather strange. There can be many reasons to take a
>> mutex, and not all of them have to be fast.
>> Even in the case where
>> they are, it doesn't seem to be of any value to wake more processes
>> than can take the mutex. From
>> http://www.sunworld.com/sunworldonline/swol-08-1999/swol-08-insidesolaris-2.html:
>>
>>   Sun engineering coded the turnstile_wakeup() in Solaris 7 in a
>>   generic enough way so that a single thread wakeup could be
>>   executed, instead of all threads inevitably waking up
>>   together. Exhaustive testing under a variety of different loads has
>>   shown that, in practice, we very rarely end up with a large
>>   blocking chain of threads, and thus almost never run into the
>>   thundering herd problem. The wakeup-all implementation also solves
>>   some bit synchronization issues that make a wakeup-one scenario
>>   tricky.
>>
>> This seems like a less honest way of saying "We couldn't figure out
>> how to avoid race conditions on wakeup, and so far nobody has been
>> able to point to a thundering herd". I'd need some convincing.
>
> Well if you are considering spinning for a bit of time on a held
> mutex (which you seem to advocate?), then why not wake everyone?

Because it doesn't buy us anything.

> If mutexes are held for very short periods of time and you don't
> often have a thundering herd problem,

That's an assumption. So far we have *never* had a thundering herd,
because the code doesn't work yet.

> then waking everyone is an optimization since you only have to take
> the scheduling lock once.

No. If I understand things correctly, each process would need to get
the schedlock, and only one process can get the mutex. Why wake the
rest? What do you want them to do? This applies even in the case of
a counting semaphore (of which our "mutex" is a special case), since
if any slots are available, the process wouldn't be sleeping.
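
The cost being argued about here can be put into a toy model. The sketch below is only an illustration, not kernel code: it assumes that every woken thread immediately races for the mutex, that exactly one of them wins, and that the losers block again. The name total_wakeups and the whole cost model are hypothetical and do not come from FreeBSD, BSD/OS or Solaris.

```python
from collections import deque


def total_wakeups(waiters, wake_all):
    """Count thread wakeups until every initial waiter has held the mutex.

    Toy model (an assumption, not kernel code): all woken threads race for
    the mutex at once, exactly one wins, and the losers block again.
    """
    blocked = deque(range(waiters))  # thread ids sleeping on the mutex
    wakeups = 0
    while blocked:
        if wake_all:
            woken = list(blocked)
            blocked.clear()
        else:
            woken = [blocked.popleft()]
        wakeups += len(woken)
        # woken[0] takes the mutex; everyone else goes back to sleep.
        blocked.extend(woken[1:])
    return wakeups


if __name__ == "__main__":
    for n in (1, 2, 8):
        print(n, total_wakeups(n, wake_all=False), total_wakeups(n, wake_all=True))
```

Under these assumptions, eight waiters cost 8 wakeups with wake-one and 36 with wake-all, while a single waiter costs one wakeup either way. The model says nothing about how often more than one thread is actually waiting, which is the open question in this thread.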

Greg
--
Finger grog@lemis.com for PGP public key
See complete headers for address and phone numbers

From owner-freebsd-smp Mon Jul 3 7:55:49 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from cypherpunks.ai (cypherpunks.ai [209.88.68.47])
    by hub.freebsd.org (Postfix) with ESMTP id 4367037BA07
    for ; Mon, 3 Jul 2000 07:55:46 -0700 (PDT)
    (envelope-from jeroen@vangelderen.org)
Received: from vangelderen.org (grolsch.ai [209.88.68.214])
    by cypherpunks.ai (Postfix) with ESMTP id 4EA4E6D;
    Mon, 3 Jul 2000 10:55:45 -0400 (AST)
Message-ID: <3960A971.982DDF07@vangelderen.org>
Date: Mon, 03 Jul 2000 10:55:45 -0400
From: "Jeroen C. van Gelderen"
X-Mailer: Mozilla 4.72 [en] (X11; I; Linux 2.2.12 i386)
X-Accept-Language: en
MIME-Version: 1.0
To: Greg Lehey
Cc: Daniel Eischen, Jason Evans, Luoqi Chen, smp@FreeBSD.ORG
Subject: Re: SMP meeting summary
References: <20000703114535.T39024@wantadilla.lemis.com> <20000703200039.H62680@wantadilla.lemis.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Greg Lehey wrote:
[...]
> That's an assumption. So far we have *never* had a thundering herd,
> because the code doesn't work yet.

Your position is an assumption too. The difference is that
one usually doesn't optimize until one has profiling
information available.

Am I correct in assuming that you haven't done any profiling
yet? Am I correct in assuming that wake_one is an optimization?

> > then waking everyone is an optimization since you only have to take
> > the scheduling lock once.
>
> No. If I understand things correctly, each process would need to get
> the schedlock, and only one process can get the mutex. Why wake the
> rest? What do you want them to do?

If -on average- there is only one process waiting you don't
want to go through the trouble of implementing a more complex
wake_one. It would only complicate the code with negligible
gain.

That's my reading of Sun's claims in Solaris and given that
they have a little more experience with this kind of thing
I'm inclined to believe them until I see facts stating the
contrary.

Cheers,
Jeroen
--
Jeroen C. van Gelderen
jeroen@vangelderen.org

From owner-freebsd-smp Mon Jul 3 8:29:36 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from berserker.bsdi.com (berserker.twistedbit.com [199.79.183.1])
    by hub.freebsd.org (Postfix) with ESMTP id 89C8737B7CD
    for ; Mon, 3 Jul 2000 08:29:31 -0700 (PDT)
    (envelope-from cp@berserker.bsdi.com)
Received: from berserker.bsdi.com (cp@localhost [127.0.0.1])
    by berserker.bsdi.com (8.9.3/8.9.3) with ESMTP id JAA26798;
    Mon, 3 Jul 2000 09:28:34 -0600 (MDT)
Message-Id: <200007031528.JAA26798@berserker.bsdi.com>
To: Daniel Eischen
Cc: Greg Lehey, Jason Evans, Luoqi Chen, smp@freebsd.org
Subject: Re: SMP meeting summary
From: Chuck Paterson
Date: Mon, 03 Jul 2000 09:28:34 -0600
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

As someone else already pointed out, once the OS starts to run
there is a whole lot of tuning that needs to go on to choose a mix of
compromises that works reasonably well with the "general" work load.
There have been lots of good suggestions made, which need to be
considered once we have something up and running and have accumulated
some data.

}Well if you are considering spinning for a bit of time on a held
}mutex (which you seem to advocate?), then why not wake everyone?
}If mutexes are held for very short periods of time and you don't
}often have a thundering herd problem, then waking everyone is
}an optimization since you only have to take the scheduling lock
}once. If mutexes can be held for long periods of time, then you
}probably wouldn't want to wake everyone.
}
}--
}Dan Eischen

If all processes are made runnable at once then both future
releases and acquisitions of the mutex may be uncontested, resulting
in not having to acquire the scheduling lock. If the system is busy
and there are no idle CPUs then there won't be a thundering herd,
because there is no herd to thunder. The probability of threads
blocking on the mutex before it is released is a function of the
ratio of mutex hold time to the time it takes a processor to call
switch with the thread which wants to run being the highest priority.
In general mutex hold time is small compared to the time a process
runs.

Only a single free processor is required to cause a problem
when priority inversion has occurred and multiple threads are waiting
on the mutex. Both the processor doing the release and the free
processor will be picking off the run queue and potentially picking
threads which want the same mutex.

If someone wanted/needed to build a system where prioritization
is so important that processes are preempted even if they are on
another processor then making processes, which are just going to
block immediately, runnable is going to be very bad. This too can
most likely be solved with only a single added test in the path of
a contested release.

In general there ought not to be multiple processes piling
up on a mutex. If there are and for some reason they can't be fixed
then these particular mutexes are going to dictate how this area is
handled. Once we have these cases in hand we can make some decisions
as to how to proceed.
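
The first point above, that a wake-all release can leave later releases uncontested, can be sketched with another toy model. Everything here is an assumption made for illustration: the system is busy with no idle CPUs, so woken threads run strictly one after another; no new waiters arrive; and only a release that still finds queued waiters has to take the scheduling lock. The function name contested_releases is hypothetical.

```python
def contested_releases(waiters, wake_all):
    """Count releases that find waiters queued (the slow path).

    Toy model (an assumption): no idle CPUs, so woken threads run one at a
    time, each taking and releasing the mutex before the next one runs.
    """
    queued = waiters   # threads still recorded as waiting on the mutex
    runnable = 0       # woken threads that have not run yet
    contested = 0
    while True:
        # The current holder releases the mutex.
        if queued:
            contested += 1              # slow path: scheduling lock needed
            woken = queued if wake_all else 1
            queued -= woken
            runnable += woken
        if runnable == 0:
            return contested
        runnable -= 1                   # next woken thread runs, takes the mutex


if __name__ == "__main__":
    for n in (1, 4, 16):
        print(n, contested_releases(n, wake_all=False),
              contested_releases(n, wake_all=True))
```

With four waiters this model counts four contested releases for wake-one and one for wake-all. The advantage rests entirely on the serial-execution assumption; with even one idle CPU the woken threads can collide again, which is the case described in the second paragraph above.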

Chuck

From owner-freebsd-smp Mon Jul 3 8:35:59 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from ipamzlx.physik.uni-mainz.de (ipamzlx.Physik.Uni-Mainz.DE [134.93.180.54])
    by hub.freebsd.org (Postfix) with ESMTP id B49F537B732
    for ; Mon, 3 Jul 2000 08:35:51 -0700 (PDT)
    (envelope-from ohartman@ipamzlx.physik.uni-mainz.de)
Received: from ipamzlx.Physik.Uni-Mainz.DE (ipamzlx.Physik.Uni-Mainz.DE [134.93.180.54])
    by ipamzlx.physik.uni-mainz.de (8.9.3/8.9.3) with ESMTP id RAA08401
    for ; Mon, 3 Jul 2000 17:37:38 +0200 (CEST)
    (envelope-from ohartman@ipamzlx.physik.uni-mainz.de)
Date: Mon, 3 Jul 2000 17:37:38 +0200 (CEST)
From: "O. Hartmann"
To: smp@freebsd.org
Subject: SMP Problems on ALR QSMP
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Dear Sirs.

I previously posted some questions concerning problems with an ALR QSMP
aka Siemens PCE-5Smp system. This machine has 256 MB RAM and three
P100 CPU boards, but I cannot activate all three CPUs because the
system crashes while trying to start the slave CPUs.

This crash seems very curious. When booting in SMP mode, the kernel
tries to start slave CPU #1 but reports that something failed and asks
whether to continue or halt. When I choose to continue, the same thing
happens for the next CPU. When I choose to continue again, the kernel
tries to boot the normal way, but when initializing EISA/ISA interrupts
on the EISA bus, it freezes completely.

It seems that there is something wrong with the interrupt handler or
something else. It is curious to me that the kernel is capable of
trying to boot each CPU, failing and trying the next one, and then
crashing by freezing. I have been working on this problem for the past
5 months, but have no clue why it occurs or how it could be fixed.

With only one CPU the machine runs fine, but slowly.

Here is what mptable says:

===============================================================================

MPTable, version 2.0.15

-------------------------------------------------------------------------------

MP Floating Pointer Structure:

  location:                     EBDA
  physical address:             0x0009f0a0
  signature:                    '_MP_'
  length:                       16 bytes
  version:                      1.1
  checksum:                     0x75
  mode:                         PIC

-------------------------------------------------------------------------------

MP Config Table Header:

  physical address:             0x0009f0b5
  signature:                    'PCMP'
  base table length:            256
  version:                      1.1
  checksum:                     0x60
  OEM ID:                       'ALR '
  Product ID:                   'Revolut QSMP'
  OEM table pointer:            0x00000000
  OEM table size:               0
  entry count:                  22
  local APIC address:           0xfee00000
  extended table length:        0
  extended table checksum:      0

-------------------------------------------------------------------------------

MP Config Base Table Entries:

--
Processors:     APIC ID Version State           Family  Model   Step    Flags
                 0       0x 1    BSP, usable     5       2       5       0x0181
                 1       0x 1    AP, usable      5       2       5       0x0181
                 2       0x 1    AP, usable      5       2       5       0x0181
--
Bus:            Bus ID  Type
                 0       EISA
                 1       PCI
--
I/O APICs:      APIC ID Version State           Address
                 3       0x01    usable          0xfec00000
--
I/O Ints:       Type    Polarity    Trigger     Bus ID   IRQ    APIC ID PIN#
                ExtINT  conforms    conforms         0     0          3    0
                INT     conforms    conforms         0     1          3    1
                INT     conforms    conforms         0     3          3    3
                INT     conforms    conforms         0     4          3    4
                INT     conforms    conforms         0     5          3    5
                INT     conforms    conforms         0     6          3    6
                INT     conforms    conforms         0     7          3    7
                INT     conforms    conforms         0     8          3    8
                INT     conforms    conforms         0     9          3    9
                INT     conforms    conforms         0    10          3   10
                INT     conforms    conforms         0    11          3   11
                INT     conforms    conforms         0    12          3   12
                INT     conforms    conforms         0    14          3   14
                INT     conforms    conforms         0    15          3   15
--
Local Ints:     Type    Polarity    Trigger     Bus ID   IRQ    APIC ID PIN#
                ExtINT  conforms    conforms         0     0        255    0
                NMI     conforms    conforms         0     0        255    1

-------------------------------------------------------------------------------

# SMP kernel config file options:

# Required:
options         SMP                     # Symmetric MultiProcessor Kernel
options         APIC_IO                 # Symmetric (APIC) I/O

# Optional (built-in defaults will work in most cases):
#options        NCPU=3                  # number of CPUs
#options        NBUS=2                  # number of busses
#options        NAPIC=1                 # number of IO APICs
#options        NINTR=24                # number of INTs

This is what dmesg shows (while running in single-CPU mode):

Copyright (c) 1992-2000 The FreeBSD Project.
Copyright (c) 1982, 1986, 1989, 1991, 1993
    The Regents of the University of California. All rights reserved.
FreeBSD 4.0-STABLE #7: Mon Jul 3 13:37:12 CEST 2000
    root@wotan.brainstorm-online.net:/usr/src/sys/compile/WOTAN
Timecounter "i8254" frequency 1193150 Hz
Timecounter "TSC" frequency 99998757 Hz
CPU: Pentium/P54C (100.00-MHz 586-class CPU)
  Origin = "GenuineIntel"  Id = 0x525  Stepping = 5
  Features=0x1bf
real memory  = 268435456 (262144K bytes)
avail memory = 257667072 (251628K bytes)
Preloaded elf kernel "kernel" at 0xc0364000.
Preloaded userconfig_script "/boot/kernel.conf" at 0xc036409c.
Intel Pentium detected, installing workaround for F00F bug
ccd0-3: Concatenated disk drivers
npx0: on motherboard
npx0: INT 16 interface
pcib0: on motherboard
pci0: on pcib0
mlx0: port 0x7000-0x707f mem 0x82000000-0x8200007f irq 15 at device 17.0 on pci0
mlx0: DAC960P/PD, 3 channels, firmware 3.51-0-12, 4MB RAM
mlxd0: on mlx0
mlxd0: 12288MB (25165824 sectors) RAID 5 (online)
de0: port 0x7080-0x70ff mem 0x82000080-0x820000ff irq 14 at device 18.0 on pci0
de0: 21142 [10-100Mb/s] pass 1.1
de0: address 00:80:ad:b6:0f:f6
de1: port 0x7400-0x747f mem 0x82000100-0x8200017f irq 9 at device 19.0 on pci0
de1: 21142 [10-100Mb/s] pass 1.1
de1: address 00:80:ad:b6:0f:ea
pci0: at 20.0 irq 15
eisa0: on motherboard
mainboard0: on eisa0 slot 0
isa0: on motherboard
atkbdc0: at port 0x60,0x64 on isa0
atkbd0: irq 1 on atkbdc0
kbd0 at atkbd0
vga0: at port 0x3b0-0x3df iomem 0xa0000-0xbffff on isa0
fb0 at vga0
sc0: on isa0
sc0: VGA <16 virtual consoles, flags=0x200>
aic0: at port 0x340-0x35f irq 11 on isa0
aic0: aic6360, dma, disconnection, parity check
fdc0: at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 16550A
sio1 at port 0x2f8-0x2ff irq 3 flags 0x10 on isa0
sio1: type 16550A
ppc0: at port 0x378-0x37f irq 7 drq 1 on isa0
ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
lpt0: on ppbus0
lpt0: Interrupt-driven port
IP packet filtering initialized, divert enabled, rule-based forwarding enabled, default to deny, unlimited logging
DUMMYNET initialized (000608)
BRIDGE 990810, have 3 interfaces
-- index 1 type 6 phy 0 addrl 6 addr 00.80.ad.b6.0f.f6
-- index 2 type 6 phy 0 addrl 6 addr 00.80.ad.b6.0f.ea
IPsec: Initialized Security Association Processing.
Waiting 5 seconds for SCSI devices to settle
de0: enabling 10baseT port
de1: enabling 10baseT port
sa0 at aic0 bus 0 target 3 lun 0
sa0: Removable Sequential Access SCSI-2 device
sa0: 5.000MB/s transfers (5.000MHz, offset 8)
no devsw (majdev=0 bootdev=0xa0200000)
Mounting root from ufs:/dev/mlxd0s1a
cd0 at aic0 bus 0 target 5 lun 0
cd0: Removable CD-ROM SCSI-2 device
cd0: 4.237MB/s transfers (4.237MHz, offset 8)
cd0: Attempt to query device size failed: NOT READY, Medium not present
cd1 at aic0 bus 0 target 6 lun 0
cd1: Removable CD-ROM SCSI-2 device
cd1: 4.237MB/s transfers (4.237MHz, offset 8)
cd1: Attempt to query device size failed: NOT READY, Medium not present

I hope this is useful ... many thanks in advance,

Regards, O.
Hartmann
-------------------------------------------------------------------
ohartman@ipamzlx.physik.uni-mainz.de

Climate data server of the IPA, Universitaet Mainz
Network and systems support

From owner-freebsd-smp Mon Jul 3 8:42:10 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from magnesium.net (toxic.magnesium.net [207.154.84.15])
    by hub.freebsd.org (Postfix) with SMTP id 482D237B885
    for ; Mon, 3 Jul 2000 08:42:04 -0700 (PDT)
    (envelope-from jasone@magnesium.net)
Received: (qmail 1539 invoked by uid 1142); 3 Jul 2000 15:42:01 -0000
Date: 3 Jul 2000 08:42:01 -0700
Date: Mon, 3 Jul 2000 08:41:31 -0700
From: Jason Evans
To: Greg Lehey
Cc: Matthew Dillon, Jake Burkholder, freebsd-smp@freebsd.org
Subject: Re: SMP progress (was: Stepping on Toes)
Message-ID: <20000703084130.D826@blitz.canonware.com>
References: <200006261650.KAA17801@berserker.bsdi.com> <200006271742.KAA35851@apollo.backplane.com> <20000703112203.B61851@wantadilla.lemis.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0.1i
In-Reply-To: <20000703112203.B61851@wantadilla.lemis.com>; from grog@lemis.com on Mon, Jul 03, 2000 at 11:22:03AM +0930
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Mon, Jul 03, 2000 at 11:22:03AM +0930, Greg Lehey wrote:
> Agreed. I'm in the process of implementing the heavy-weight interrupt
> processes now. I've just taken a look at your web page and note that
> the URL no longer exists; in conjunction with the discussion above,
> I'm no longer sure how far you are. Are you importing the BSD/OS code
> now?
>
> We should probably take the rest of this offline, but I wanted to
> discuss how we do things. My idea is:
>
> 1. You import the BSD/OS mutexes.

Jake Burkholder offered to port the mutex code, and I discussed this with
Matt last week, who had no problems with Jake doing it.
As of last night,
it sounds like Jake essentially has this done for i386, and Doug Rabson
will be following soon with the alpha bits. Jake's patch set also includes
the pertinent parts of Matt's work (per-CPU idle processes, some of the
schedlock changes, etc.).

I'll be adding a link to Jake's most recent patch set on the web page
(http://people.freebsd.org/~jasone/smp/) shortly. Meanwhile, you can get
Jake's patch set at:

http://people.freebsd.org/~jake/smpng2.tar

> 2. I import/implement the heavy-weight interrupt code, which I will
>    endeavour to get working relatively reliably. This should be a
>    fallback while I break^H^H^H^H^Himplement light-weight interrupt
>    threads.

Yep, the stage is set for this work to begin now.

> 3. You and I test our stuff together until it can stay up for an hour
>    or so (exact time to be determined by Jason, who'll be carrying
>    the can).
> 4. We commit the marginally stable stuff.

A successful buildworld would be a satisfactory test of stability in my
eyes. Hopefully we can do that well. =)

Jason

From owner-freebsd-smp Mon Jul 3 8:49: 3 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from berserker.bsdi.com (berserker.twistedbit.com [199.79.183.1])
    by hub.freebsd.org (Postfix) with ESMTP id 9AFCD37B88C
    for ; Mon, 3 Jul 2000 08:48:57 -0700 (PDT)
    (envelope-from cp@berserker.bsdi.com)
Received: from berserker.bsdi.com (cp@localhost [127.0.0.1])
    by berserker.bsdi.com (8.9.3/8.9.3) with ESMTP id JAA26972;
    Mon, 3 Jul 2000 09:47:55 -0600 (MDT)
Message-Id: <200007031547.JAA26972@berserker.bsdi.com>
To: "Jeroen C.
    van Gelderen"
Cc: Greg Lehey, Daniel Eischen, Jason Evans, Luoqi Chen, smp@freebsd.org
Subject: Re: SMP meeting summary
From: Chuck Paterson
Date: Mon, 03 Jul 2000 09:47:54 -0600
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

}That's my reading of Sun's claims in Solaris and given that
}they have a little more experience with this kind of thing
}I'm inclined to believe them until I see facts stating the
}contrary.

I would caution against using Solaris to draw too detailed
conclusions. The locking in Solaris is finer grained than we are
likely to achieve for some time. Also, having per-processor run queues
and all the associated machinery to support this makes Solaris
behave quite differently from what we have today.

As time goes on we will have to make decisions on the number
of processors we want to support most efficiently. The answer for
our problem set may be quite different from what Sun arrived at for
their problem set.

While I have no specific knowledge that this is true, I would
not be surprised if the Solaris machine-dependent implementation
differs between Sparc and X86, with only Sparc being reported.
Chuck To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Mon Jul 3 9: 1:11 2000 Delivered-To: freebsd-smp@freebsd.org Received: from tinker.exit.com (exit-gw.power.net [207.151.46.196]) by hub.freebsd.org (Postfix) with ESMTP id 5065737B876 for ; Mon, 3 Jul 2000 09:01:08 -0700 (PDT) (envelope-from frank@exit.com) Received: from realtime.exit.com (realtime.exit.com [206.223.0.5]) by tinker.exit.com (8.9.3/8.9.3) with ESMTP id JAA53786 for ; Mon, 3 Jul 2000 09:01:07 -0700 (PDT) (envelope-from frank@exit.com) Received: (from frank@localhost) by realtime.exit.com (8.9.3/8.9.3) id JAA10893 for smp@freebsd.org; Mon, 3 Jul 2000 09:01:06 -0700 (PDT) (envelope-from frank) From: Frank Mayhar Message-Id: <200007031601.JAA10893@realtime.exit.com> Subject: Re: SMP meeting summary In-Reply-To: from Daniel Eischen at "Jul 3, 2000 06:23:28 am" To: Daniel Eischen Date: Mon, 3 Jul 2000 08:49:43 -0700 (PDT) Cc: Greg Lehey , Jason Evans , Luoqi Chen , smp@FreeBSD.ORG Reply-To: frank@exit.com Organization: Exit Consulting X-Copyright0: Copyright 2000 Frank Mayhar. All Rights Reserved. X-Copyright1: Permission granted for electronic reproduction as Usenet News or email only. X-Mailer: ELM [version 2.4ME+ PL68 (25)] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org Daniel Eischen wrote: > Well if you are considering spinning for a bit of time on a held > mutex (which you seem to advocate?), then why not wake everyone? > If mutexes are held for very short periods of time and you don't > often have a thundering herd problem, then waking everyone is > an optimization since you only have to take the scheduling lock > once. If mutexes can be held for long periods of time, then you > probably wouldn't want to wake everyone.
I'm going to use Daniel's email as a springboard for a strongly-held opinion. I really, really wish that you guys would get away from using the term "mutex" for both short- and long-term locks. All a mutex is is a lock; it says nothing about what kind of lock, whether it sleeps or spins or even whether it's a reader/writer lock. It's the most generic term available. How about "spinlock" for short-term spin locks, "sleeplock" for long-term blocking locks and "rwlock" for reader/writer locks (modified by whether it's a spinlock or a sleeplock)? That would add clarity, I think. Maybe it's my SVR4/Unixware experience, but I always have to figure out whether someone is talking about a spinlock or a sleeplock from context. To address Dan's real question, it depends on whether, after spinning for a short time, the spinlock decides to become a sleeplock (in other words, it's a hybrid). If it sleeps, depending on the implementation there could be a thundering herd problem, since you have no idea how long it's going to sleep or how many threads/processes will be on the chain. Basically, all a hybrid lock is is an optimization of a sleeplock that speeds up the (possibly common) case of a sleeplock being held for a short time relative to the waiting process. (Any errors here are my own and will undoubtedly be promptly corrected by the audience. 
:-) -- Frank Mayhar frank@exit.com http://www.exit.com/ Exit Consulting http://store.exit.com/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Mon Jul 3 13:15:18 2000 Delivered-To: freebsd-smp@freebsd.org Received: from io.yi.org (24.67.218.186.bc.wave.home.com [24.67.218.186]) by hub.freebsd.org (Postfix) with ESMTP id 3A2DB37C0A0 for ; Mon, 3 Jul 2000 13:15:05 -0700 (PDT) (envelope-from jburkhol@home.com) Received: from io.yi.org (localhost.gvcl1.bc.wave.home.com [127.0.0.1]) by io.yi.org (Postfix) with ESMTP id 28BA3BA4E; Mon, 3 Jul 2000 13:15:16 -0700 (PDT) X-Mailer: exmh version 2.1.1 10/15/1999 To: Jason Evans Cc: Greg Lehey , Matthew Dillon , freebsd-smp@freebsd.org Subject: Re: SMP progress (was: Stepping on Toes) In-Reply-To: Message from Jason Evans of "Mon, 03 Jul 2000 08:41:31 PDT." <20000703084130.D826@blitz.canonware.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Mon, 03 Jul 2000 13:15:16 -0700 From: Jake Burkholder Message-Id: <20000703201516.28BA3BA4E@io.yi.org> Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > On Mon, Jul 03, 2000 at 11:22:03AM +0930, Greg Lehey wrote: > > Agreed. I'm in the process of implementing the heavy-weight interrupt > > processes now. I've just taken a look at your web page and note that > > the URL no longer exists; in conjunction with the discussion above, > > I'm no longer sure how far you are. Are you importing the BSD/OS code > > now? > > > > We should probably take the rest of this offline, but I wanted to > > discuss how we do things. My idea is: > > > > 1. You import the BSD/OS mutexes. > > Jake Burkholder offered to port the mutex code, and I discussed this with > Matt last week, who had no problems with Jake doing it. As of last night, > it sounds like Jake essentially has this done for i386, and Doug Rabson > will be following soon with the alpha bits. 
Jake's patch set also includes > the pertinent parts of Matt's work (per-CPU idle processes, some of the > schedlock changes, etc.). I'll be adding a link to Jake's most recent > patch set on the web page (http://people.freebsd.org/~jasone/smp/) shortly. > Meanwhile, you can get Jake's patch set at: > > http://people.freebsd.org/~jake/smpng2.tar I've just updated this to this morning's -current and added my kernel config. The biggest change is that cpu_switch() no longer disables or enables interrupts directly. It's taken care of by sched_lock, since BSD/OS spin mutexes disable interrupts on the first acquire and re-enable them on the last release. Protecting the run queues is not really necessary right now, but it's a step in the right direction. I haven't dealt with the mp_lock, but I've had this patch running on my UP box for a while, building kernels etc. I think we'll want to make INVARIANTS, INVARIANT_SUPPORT, DIAGNOSTIC, and SMP_DEBUG on by default in -current for a while at least. Every assertion helps. WITNESS currently isn't doing anything because the sched_lock is ignored, but we'll likely want that by default too. Jake > > > 2. I import/implement the heavy-weight interrupt code, which I will > > endeavour to get working relatively reliably. This should be a > > fallback while I break^H^H^H^H^Himplement light-wait interrupt > > threads. > > Yep, the stage is set for this work to begin now. > > > 3. You and I test our stuff together until it can stay up for an hour > > or so (exact time to be determined by Jason, who'll be carrying > > the can). > > 4. We commit the marginally stable stuff. > > A successful buildworld would be a satisfactory test of stability in my > eyes. Hopefully we can do that well.
=) > > Jason To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Mon Jul 3 16: 9:44 2000 Delivered-To: freebsd-smp@freebsd.org Received: from wantadilla.lemis.com (wantadilla.lemis.com [192.109.197.80]) by hub.freebsd.org (Postfix) with ESMTP id C716737C0E3 for ; Mon, 3 Jul 2000 16:09:36 -0700 (PDT) (envelope-from grog@wantadilla.lemis.com) Received: (from grog@localhost) by wantadilla.lemis.com (8.9.3/8.9.3) id IAA66554; Tue, 4 Jul 2000 08:38:23 +0930 (CST) (envelope-from grog) Date: Tue, 4 Jul 2000 08:38:22 +0930 From: Greg Lehey To: "Jeroen C. van Gelderen" Cc: Daniel Eischen , Jason Evans , Luoqi Chen , smp@FreeBSD.ORG Subject: Re: SMP meeting summary Message-ID: <20000704083822.A65029@wantadilla.lemis.com> References: <20000703114535.T39024@wantadilla.lemis.com> <20000703200039.H62680@wantadilla.lemis.com> <3960A971.982DDF07@vangelderen.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0pre2i In-Reply-To: <3960A971.982DDF07@vangelderen.org> Organization: LEMIS, PO Box 460, Echunga SA 5153, Australia Phone: +61-8-8388-8286 Fax: +61-8-8388-8725 Mobile: +61-418-838-708 WWW-Home-Page: http://www.lemis.com/~grog X-PGP-Fingerprint: 6B 7B C3 8C 61 CD 54 AF 13 24 52 F8 6D A4 95 EF Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Monday, 3 July 2000 at 10:55:45 -0400, Jeroen C. van Gelderen wrote: > Greg Lehey wrote: > [...] >> That's an assumption. So far we have *never* had a thundering herd, >> because the code don't work yet. > > Your position is an assumption too. The difference is that > one usually doesn't optimize until one has profiling > information available. Am I correct in assuming that you > haven't done any profiling yet? Am I correct in assuming > that wake_one is an optimization? You're not correct in your implied assumption that we can see any potential problems with wake_one. 
>>> then waking everyone is an optimization since you only have to take >>> the scheduling lock once. >> >> No. If I understand things correctly, each process would need to get >> the schedlock, and only one process can get the mutex. Why wake the >> rest? What do you want them to do? > > If -on average- there is only one process waiting you don't > want to go trough the trouble of implementing a more complex > wake_one. It would only complicate the code with negligible > gain. There's nothing to say that wake_one is more complex. wake_one takes the first process on the mutex's sleep list and wakes it. wake_all (or whatever) would make a loop out of that wake function and wake all the processes on the list. All would then be scheduled, try to take the mutex, and all except one would fail and be put back on the sleep list. Does this make sense? > That's my reading of Sun's claims in Solaris and given that they > have a little more experience with this kind of thing I'm inclined > to believe them until I see facts stating the contrary. Sun's problem with Solaris is non-obvious, and may not bite us. I think we should hold off with this kind of discussion for the while. Everything I can see suggests that it's crazy to wake all processes. If we find that we run into race conditions which we can only solve with wake_all, though, we'll compare the effort in fixing them with the (undoubted) performance degradation caused by waking them all. 
Greg -- Finger grog@lemis.com for PGP public key See complete headers for address and phone numbers To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Mon Jul 3 16:36:21 2000 Delivered-To: freebsd-smp@freebsd.org Received: from mail.enteract.com (mail.enteract.com [207.229.143.33]) by hub.freebsd.org (Postfix) with ESMTP id 2C43E37BA40 for ; Mon, 3 Jul 2000 16:36:18 -0700 (PDT) (envelope-from dscheidt@enteract.com) Received: from shell-2.enteract.com (dscheidt@shell-2.enteract.com [207.229.143.41]) by mail.enteract.com (8.9.3/8.9.3) with SMTP id SAA49983; Mon, 3 Jul 2000 18:35:55 -0500 (CDT) (envelope-from dscheidt@enteract.com) Date: Mon, 3 Jul 2000 18:35:55 -0500 (CDT) From: David Scheidt To: Greg Lehey Cc: "Jeroen C. van Gelderen" , Daniel Eischen , Jason Evans , Luoqi Chen , smp@FreeBSD.ORG Subject: Re: SMP meeting summary In-Reply-To: <20000704083822.A65029@wantadilla.lemis.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Tue, 4 Jul 2000, Greg Lehey wrote: : :There's nothing to say that wake_one is more complex. wake_one takes :the first process on the mutex's sleep list and wakes it. wake_all :(or whatever) would make a loop out of that wake function and wake all :the processes on the list. All would then be scheduled, try to take :the mutex, and all except one would fail and be put back on the sleep :list. Does this make sense? With a wake_one function, you need to be much more careful to avoid priority inversion, and all sorts of other potential races. Solaris's locks are very fine-grained, and it wouldn't surprise me at all if their average case was to wake only one process. In that case, you get a wake_all that performs very much like wake_one, and you get to avoid the overhead of having to sort a sleep queue, or the like.
There may be a slight performance penalty, but a whole class of deadlock is removed, which makes it easier to produce correct code. In many cases, I'll take a performance hit for availability. It's quite likely, perhaps even certain, that FreeBSD isn't going to be in a position where the sleep queue's average length is close enough to one for this to be a viable approach in the medium term. (I haven't had a chance to thoroughly understand what the SMP road map looks like.) If that is the case, then wake_one would be a win for FreeBSD. David Scheidt To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Mon Jul 3 16:48: 5 2000 Delivered-To: freebsd-smp@freebsd.org Received: from mailhost.iprg.nokia.com (mailhost.iprg.nokia.com [205.226.5.12]) by hub.freebsd.org (Postfix) with ESMTP id 4F90337BA9D for ; Mon, 3 Jul 2000 16:48:01 -0700 (PDT) (envelope-from jre@iprg.nokia.com) Received: from darkstar.iprg.nokia.com (darkstar.iprg.nokia.com [205.226.5.69]) by mailhost.iprg.nokia.com (8.9.3/8.9.3-GLGS) with ESMTP id QAA17821; Mon, 3 Jul 2000 16:47:59 -0700 (PDT) Received: (from mail@localhost) by darkstar.iprg.nokia.com (8.9.3/8.9.3-VIRSCAN) id QAA05600; Mon, 3 Jul 2000 16:47:57 -0700 X-Virus-Scanned: Mon, 3 Jul 2000 16:47:57 -0700 Nokia Silicon Valley Email Exploit Scanner Received: from UNKNOWN (205.226.1.150, claiming to be "iprg.nokia.com") by darkstar with SMTP id smtpdoKR9wT; Mon, 03 Jul 2000 16:47:52 PDT Message-ID: <39612626.3E3AE2C4@iprg.nokia.com> Date: Mon, 03 Jul 2000 16:47:51 -0700 From: Joe Eykholt Organization: Nokia IPRG X-Mailer: Mozilla 4.7 [en] (X11; I; FreeBSD 2.2.6-RELEASE i386) X-Accept-Language: en MIME-Version: 1.0 To: Greg Lehey Cc: "Jeroen C.
van Gelderen" , Daniel Eischen , Jason Evans , Luoqi Chen , smp@FreeBSD.ORG Subject: Re: SMP meeting summary References: <20000703114535.T39024@wantadilla.lemis.com> <20000703200039.H62680@wantadilla.lemis.com> <3960A971.982DDF07@vangelderen.org> <20000704083822.A65029@wantadilla.lemis.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org Greg Lehey wrote: > There's nothing to say that wake_one is more complex. wake_one takes > the first process on the mutex's sleep list and wakes it. wake_all > (or whatever) would make a loop out of that wake function and wake all > the processes on the list. All would then be scheduled, try to take > the mutex, and all except one would fail and be put back on the sleep > list. Does this make sense? With adaptive mutexes, the threads which are woken will either run one at a time on one CPU, or some will run at the same time on multiple CPUs. In the latter case, one gets the lock right away, and the rest SPIN on it (as long as the new owner doesn't get suspended on something else). They don't necessarily go back to sleep on that same lock. I agree it's too early to talk about this degree of optimization, though.
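[Editorial sketch, not part of the original message.] The adaptive behaviour Joe describes can be written down as a small decision function. This is only an illustration under stated assumptions: the function name on_contention and its two parameters are hypothetical, and nothing here is taken from the BSD/OS, Solaris, or FreeBSD sources.

```python
def on_contention(owner, running):
    """Decide what a thread does when it wants an adaptive mutex.

    owner   -- thread currently holding the mutex, or None if it is free
    running -- set of threads currently executing on some CPU
    (both parameters are hypothetical, for illustration only)
    """
    if owner is None:
        return "acquire"   # mutex is free: take it right away
    if owner in running:
        return "spin"      # owner is on a CPU, so it may release soon
    return "sleep"         # owner is suspended: spinning would waste the CPU


# Three woken threads race for a free mutex: the first one to try wins...
print(on_contention(None, {"t1", "t2", "t3"}))
# ...and the others spin for as long as the new owner t1 stays on a CPU.
print(on_contention("t1", {"t1", "t2", "t3"}))
# If t1 gets suspended on something else, the contenders sleep instead.
print(on_contention("t1", {"t2", "t3"}))
```

The point of the sketch is only that the losers of the race need not go straight back onto the sleep list; whether they spin depends on the state of the current owner.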
Joe Eykholt To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Mon Jul 3 16:55:59 2000 Delivered-To: freebsd-smp@freebsd.org Received: from wantadilla.lemis.com (wantadilla.lemis.com [192.109.197.80]) by hub.freebsd.org (Postfix) with ESMTP id 8E7BA37BA9D for ; Mon, 3 Jul 2000 16:55:47 -0700 (PDT) (envelope-from grog@wantadilla.lemis.com) Received: (from grog@localhost) by wantadilla.lemis.com (8.9.3/8.9.3) id JAA93985; Tue, 4 Jul 2000 09:22:45 +0930 (CST) (envelope-from grog) Date: Tue, 4 Jul 2000 09:22:45 +0930 From: Greg Lehey To: Chuck Paterson Cc: Daniel Eischen , Jason Evans , Luoqi Chen , smp@freebsd.org Subject: Re: SMP meeting summary Message-ID: <20000704092245.B65029@wantadilla.lemis.com> References: <200007031528.JAA26798@berserker.bsdi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0pre2i In-Reply-To: <200007031528.JAA26798@berserker.bsdi.com> Organization: LEMIS, PO Box 460, Echunga SA 5153, Australia Phone: +61-8-8388-8286 Fax: +61-8-8388-8725 Mobile: +61-418-838-708 WWW-Home-Page: http://www.lemis.com/~grog X-PGP-Fingerprint: 6B 7B C3 8C 61 CD 54 AF 13 24 52 F8 6D A4 95 EF Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Monday, 3 July 2000 at 9:28:34 -0600, Chuck Paterson wrote: >> Well if you are considering spinning for a bit of time on a held >> mutex (which you seem to advocate?), then why not wake everyone? >> If mutexes are held for very short periods of time and you don't >> often have a thundering herd problem, then waking everyone is >> an optimization since you only have to take the scheduling lock >> once. If mutexes can be held for long periods of time, then you >> probably wouldn't want to wake everyone. 
>> >> Dan Eischen > > If all processes are made runnable at once then both future > releases and acquisitions of the mutex may be uncontested, resulting > in not having to acquire the scheduling lock. I'm not sure we're talking about the same thing, but if so I must be missing something. If I'm waiting on a mutex, I still need to reacquire it on wakeup, don't I? In that case, only the first process to be scheduled will actually get the mutex, and the others will block again. > If the system is busy and there are not idle CPUs then there won't > be a thundering herd, because there is no herd to thunder. So we get a creeping herd? If we wake more processes than we can handle, we're just going to spend time putting the rest to sleep again. > The probability of threads blocking on the mutex before it is > released is a function of mutex hold time to the time it takes a > processor to calling switch with the thread which wants to run being > the highest priority. In general mutex hold time is small compared > to the time a process runs. Fine, but there are exceptions. Obviously if we only ever have one thread waiting on the mutex, we don't have any basis for discussion. > In general there ought not to be multiple processes piling > up on a mutex. If there are and for some reason they can't be > fixed then these particular mutexs are going to dictate how this > area is handled. Once we have these cases in hand we can make > some decisions as to how to proceed. In my experience, I've seen mutexes used for long-term waits, and I don't see any a priori reason not to do so. Of course, if we make design decisions based on the assumption that all waits will be short, then we will have a reason, but it won't be a good one. Before you say that long-term waits are evil, note that we're probably talking about different kinds of waits. 
Obviously anything that threatens to keep the system idle while it waits is bad, but a replacement for tsleep(), say, can justifiably wait for a long time. Greg -- Finger grog@lemis.com for PGP public key See complete headers for address and phone numbers To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Mon Jul 3 16:59:22 2000 Delivered-To: freebsd-smp@freebsd.org Received: from mailhost.iprg.nokia.com (mailhost.iprg.nokia.com [205.226.5.12]) by hub.freebsd.org (Postfix) with ESMTP id 120E337BA9D for ; Mon, 3 Jul 2000 16:59:16 -0700 (PDT) (envelope-from jre@iprg.nokia.com) Received: from darkstar.iprg.nokia.com (darkstar.iprg.nokia.com [205.226.5.69]) by mailhost.iprg.nokia.com (8.9.3/8.9.3-GLGS) with ESMTP id QAA18361; Mon, 3 Jul 2000 16:59:15 -0700 (PDT) Received: (from mail@localhost) by darkstar.iprg.nokia.com (8.9.3/8.9.3-VIRSCAN) id QAA11098; Mon, 3 Jul 2000 16:59:12 -0700 X-Virus-Scanned: Mon, 3 Jul 2000 16:59:12 -0700 Nokia Silicon Valley Email Exploit Scanner Received: from UNKNOWN (205.226.1.150, claiming to be "iprg.nokia.com") by darkstar with SMTP id smtpdsWDuJ1; Mon, 03 Jul 2000 16:59:06 PDT Message-ID: <396128CD.DC304CAE@iprg.nokia.com> Date: Mon, 03 Jul 2000 16:59:09 -0700 From: Joe Eykholt Organization: Nokia IPRG X-Mailer: Mozilla 4.7 [en] (X11; I; FreeBSD 2.2.6-RELEASE i386) X-Accept-Language: en MIME-Version: 1.0 To: Greg Lehey , "Jeroen C. 
van Gelderen" , Daniel Eischen , Jason Evans , Luoqi Chen , smp@FreeBSD.ORG Subject: Re: SMP meeting summary References: <20000703114535.T39024@wantadilla.lemis.com> <20000703200039.H62680@wantadilla.lemis.com> <3960A971.982DDF07@vangelderen.org> <20000704083822.A65029@wantadilla.lemis.com> <39612626.3E Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org Joe Eykholt wrote: > > Greg Lehey wrote: > > > There's nothing to say that wake_one is more complex. wake_one takes > > the first process on the mutex's sleep list and wakes it. wake_all > > (or whatever) would make a loop out of that wake function and wake all > > the processes on the list. All would then be scheduled, try to take > > the mutex, and all except one would fail and be put back on the sleep > > list. Does this make sense? > > With adaptive mutexes, the threads which are woken will either run one > serially on one CPU, or some run at the same time on multiple CPUs. > In that case, one gets the lock right away, and the rest SPIN on it > (as long as the new owner doesn't get suspended on something else). > They don't necessarily go back to sleep on that same lock. Another thing that would happen if you don't wake all the waiters, and you have a lot of CPUs, is that a thread that comes along and wants the lock while the new owner is running (or before it's started running) either SPINs or acquires the lock in front of all of the threads that are still asleep. In other words it cuts in line. I guess that could be bad for certain locks and for a very large number of CPUs. If you do wake_all, then they all get the same chance at the lock ... and likely they'll just spin for a short time (at worst) if locks are held briefly.
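[Editorial sketch, not part of the original message.] The cost difference between the two wakeup policies quoted above can be counted with a toy model. It deliberately takes Greg's worst case as its assumption: every woken process retries while the winner still holds the mutex, so every loser is put back on the sleep list. Spinning, as in an adaptive mutex, is not modelled, and the function name drain is hypothetical.

```python
from collections import deque


def drain(waiters, wake_all):
    """Release the mutex repeatedly until every waiter has held it once.

    Returns (wakeups, failed_attempts). Assumption: on each release the
    first woken process gets the mutex and all other woken processes
    fail to take it and go back to sleep.
    """
    sleep_list = deque(waiters)
    wakeups = failed = 0
    while sleep_list:
        if wake_all:
            woken = list(sleep_list)       # wake everything on the list
            sleep_list.clear()
        else:
            woken = [sleep_list.popleft()]  # wake only the first sleeper
        wakeups += len(woken)
        losers = woken[1:]                  # everyone but the first loses
        failed += len(losers)
        sleep_list.extend(losers)           # losers sleep again, in order
    return wakeups, failed


for policy, flag in (("wake_one", False), ("wake_all", True)):
    print(policy, drain(range(4), flag))
```

Under this model wake_one performs one wakeup per waiter, while the wakeups under wake_all grow quadratically with the length of the sleep list. If the list rarely holds more than one process, which is the situation several posters expect, the two policies cost the same.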
Joe To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Mon Jul 3 17:27:49 2000 Delivered-To: freebsd-smp@freebsd.org Received: from wantadilla.lemis.com (wantadilla.lemis.com [192.109.197.80]) by hub.freebsd.org (Postfix) with ESMTP id 8385B37BCC5 for ; Mon, 3 Jul 2000 17:27:42 -0700 (PDT) (envelope-from grog@wantadilla.lemis.com) Received: (from grog@localhost) by wantadilla.lemis.com (8.9.3/8.9.3) id JAA94126; Tue, 4 Jul 2000 09:57:05 +0930 (CST) (envelope-from grog) Date: Tue, 4 Jul 2000 09:57:05 +0930 From: Greg Lehey To: Joe Eykholt Cc: "Jeroen C. van Gelderen" , Daniel Eischen , Jason Evans , Luoqi Chen , smp@FreeBSD.ORG Subject: Re: SMP meeting summary Message-ID: <20000704095705.D65029@wantadilla.lemis.com> References: <20000703114535.T39024@wantadilla.lemis.com> <20000703200039.H62680@wantadilla.lemis.com> <3960A971.982DDF07@vangelderen.org> <20000704083822.A65029@wantadilla.lemis.com> <39612626.3E <396128CD.DC304CAE@iprg.nokia.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0pre2i In-Reply-To: <396128CD.DC304CAE@iprg.nokia.com> Organization: LEMIS, PO Box 460, Echunga SA 5153, Australia Phone: +61-8-8388-8286 Fax: +61-8-8388-8725 Mobile: +61-418-838-708 WWW-Home-Page: http://www.lemis.com/~grog X-PGP-Fingerprint: 6B 7B C3 8C 61 CD 54 AF 13 24 52 F8 6D A4 95 EF Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Monday, 3 July 2000 at 16:59:09 -0700, Joe Eykholt wrote: > Joe Eykholt wrote: >> >> Greg Lehey wrote: >> >>> There's nothing to say that wake_one is more complex. wake_one takes >>> the first process on the mutex's sleep list and wakes it. wake_all >>> (or whatever) would make a loop out of that wake function and wake all >>> the processes on the list. All would then be scheduled, try to take >>> the mutex, and all except one would fail and be put back on the sleep >>> list. Does this make sense? 
>> >> With adaptive mutexes, the threads which are woken will either run one >> serially on one CPU, or some run at the same time on multiple CPUs. >> In that case, one gets the lock right away, and the rest SPIN on it >> (as long as the new owner doesn't get suspended on something else). >> They don't necessarily go back to sleep on that same lock. > > Another thing that would happen if you don't wake all the waiters, > and you have a lot of CPUs, is that a thread that comes along and wants > the lock while the new owner is running (or before it's started running) > either SPINs or acquires the lock in front of all of the threads > that are still asleep. In other words it cuts in line. I need to look at the implementation more carefully. My understanding is that the mutexes allow exactly one process to run, independent of the number of processors (or any other resource on which they're waiting). If we can allow multiple (but a finite number of) processes to run concurrently, then we should be using counting semaphores (which are, in fact, very similar to sleeping mutexes). > I guess that could be bad for certain locks and for a very large > number of CPUs. You'll always find good and bad examples. It's difficult to generalize. > If you do wake_all, then they all get the same chance at the lock ... and > likely they'll just spin for a short time (at worst) if locks are held > briefly. If we have a certain number of slots available, then we wake that many. Normally, though, if you're sleeping at all, you've used up all your slots. Consider a counting semaphore which allows four concurrent processes to enter. Initially the counter is set to 4. - process 1 takes semaphore. Counter goes to 3. - process 2 takes semaphore. Counter goes to 2. - process 3 takes semaphore. Counter goes to 1. - process 4 takes semaphore. Counter goes to 0. - process 5 tries to take semaphore. Counter goes to -1, so process 5 sleeps. - process 6 tries to take semaphore. 
Counter is -1, so process 6 sleeps. - process 2 releases semaphore. Counter goes to 0. It's still not > 0, so nothing further happens. - process 1 releases semaphore. Counter goes to 1. Process 5 gets scheduled, decreasing counter to 0. I'm making assumptions about the exact counter implementation and that process 5 gets scheduled, and not process 6, but they're not relevant to the discussion. Clearly at this point, it wouldn't make any sense to try to schedule process 6, since the counter won't allow it. Things might be slightly different if: - process 1 releases semaphore. Counter goes to 1. *interrupt* - process 3 releases semaphore. Counter goes to 2. Process 5 gets scheduled, decreasing counter to 1. *return from interrupt* - Counter is 1, so process 6 gets scheduled, decreasing the counter to 0. Clearly at this point, if we had a process 7 waiting on the queue, it wouldn't make any sense to have it scheduled too. Greg -- Finger grog@lemis.com for PGP public key See complete headers for address and phone numbers To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Mon Jul 3 17:42:13 2000 Delivered-To: freebsd-smp@freebsd.org Received: from mailhost.iprg.nokia.com (mailhost.iprg.nokia.com [205.226.5.12]) by hub.freebsd.org (Postfix) with ESMTP id 48EBA37B8A0 for ; Mon, 3 Jul 2000 17:42:10 -0700 (PDT) (envelope-from jre@iprg.nokia.com) Received: from darkstar.iprg.nokia.com (darkstar.iprg.nokia.com [205.226.5.69]) by mailhost.iprg.nokia.com (8.9.3/8.9.3-GLGS) with ESMTP id RAA21990; Mon, 3 Jul 2000 17:42:09 -0700 (PDT) Received: (from mail@localhost) by darkstar.iprg.nokia.com (8.9.3/8.9.3-VIRSCAN) id RAA00969; Mon, 3 Jul 2000 17:42:07 -0700 X-Virus-Scanned: Mon, 3 Jul 2000 17:42:07 -0700 Nokia Silicon Valley Email Exploit Scanner Received: from UNKNOWN (205.226.1.150, claiming to be "iprg.nokia.com") by darkstar with SMTP id smtpdRnE9JC; Mon, 03 Jul 2000 17:42:02 PDT Message-ID: 
<396132D8.460539CA@iprg.nokia.com> Date: Mon, 03 Jul 2000 17:42:00 -0700 From: Joe Eykholt Organization: Nokia IPRG X-Mailer: Mozilla 4.7 [en] (X11; I; FreeBSD 2.2.6-RELEASE i386) X-Accept-Language: en MIME-Version: 1.0 To: Greg Lehey Cc: "Jeroen C. van Gelderen" , Daniel Eischen , Jason Evans , Luoqi Chen , smp@FreeBSD.ORG Subject: Re: SMP meeting summary References: <20000703114535.T39024@wantadilla.lemis.com> <20000703200039.H62680@wantadilla.lemis.com> <3960A971.982DDF07@vangelderen.org> <20000704083822.A65029@wantadilla.lemis.com> <39612626.3E Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org Greg Lehey wrote: > > On Monday, 3 July 2000 at 16:59:09 -0700, Joe Eykholt wrote: > > Joe Eykholt wrote: > >> > >> Greg Lehey wrote: > >> > >>> There's nothing to say that wake_one is more complex. wake_one takes > >>> the first process on the mutex's sleep list and wakes it. wake_all > >>> (or whatever) would make a loop out of that wake function and wake all > >>> the processes on the list. All would then be scheduled, try to take > >>> the mutex, and all except one would fail and be put back on the sleep > >>> list. Does this make sense? > >> > >> With adaptive mutexes, the threads which are woken will either run one > >> serially on one CPU, or some run at the same time on multiple CPUs. > >> In that case, one gets the lock right away, and the rest SPIN on it > >> (as long as the new owner doesn't get suspended on something else). > >> They don't necessarily go back to sleep on that same lock. > > > > Another thing that would happen if you don't wake all the waiters, > > and you have a lot of CPUs, is that a thread that comes along and wants > > the lock while the new owner is running (or before it's started running) > > either SPINs or acquires the lock in front of all of the threads > > that are still asleep. In other words it cuts in line. 
> > I need to look at the implementation more carefully. My understanding > is that the mutexes allow exactly one process to run, independent of > the number of processors (or any other resource on which they're > waiting). If we can allow multiple (but a finite number of) processes > to run concurrently, then we should be using counting semaphores > (which are, in fact, very similar to sleeping mutexes). Mutexes only allow one thread to get the mutex, but multiple threads can be spinning for the mutex. Wake_one (or wake_all) doesn't give the mutex to anyone, the threads must to get scheduled by a CPU and then ACQUIRE the mutex ... which is hopefully still free at that time. If we wake_all, then all of the threads will try to get the mutex once they have a CPU. As long as the thread owning the mutex is on a CPU, the other contenders spin. Semaphores are tough to use adaptively, since there's no identifiable 'owner'. Joe To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Mon Jul 3 19:18:28 2000 Delivered-To: freebsd-smp@freebsd.org Received: from berserker.bsdi.com (berserker.twistedbit.com [199.79.183.1]) by hub.freebsd.org (Postfix) with ESMTP id 3F7CF37C2DF for ; Mon, 3 Jul 2000 19:18:24 -0700 (PDT) (envelope-from cp@berserker.bsdi.com) Received: from berserker.bsdi.com (cp@localhost [127.0.0.1]) by berserker.bsdi.com (8.9.3/8.9.3) with ESMTP id UAA01169; Mon, 3 Jul 2000 20:18:04 -0600 (MDT) Message-Id: <200007040218.UAA01169@berserker.bsdi.com> To: Greg Lehey Cc: Daniel Eischen , Jason Evans , Luoqi Chen , smp@freebsd.org Subject: Re: SMP meeting summary From: Chuck Paterson Date: Mon, 03 Jul 2000 20:18:00 -0600 Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org }I'm not sure we're talking about the same thing, but if so I must be }missing something. If I'm waiting on a mutex, I still need to }reacquire it on wakeup, don't I? 
In that case, only the first process }to be scheduled will actually get the mutex, and the others will block }again. Yes, you need to acquire the mutex on wakeup, but likely one process will run acquiring and releasing the mutex in an uncontested fashion before other processes run and do the same thing. } }> If the system is busy and there are not idle CPUs then there won't }> be a thundering herd, because there is no herd to thunder. } }So we get a creeping herd? If we wake more processes than we can }handle, we're just going to spend time putting the rest to sleep }again. } Perhaps, but odds are not see next graph. }> The probability of threads blocking on the mutex before it is }> released is a function of mutex hold time to the time it takes a }> processor to calling switch with the thread which wants to run being }> the highest priority. In general mutex hold time is small compared }> to the time a process runs. } }Fine, but there are exceptions. Obviously if we only ever have one }thread waiting on the mutex, we don't have any basis for discussion. } } } }> In general there ought not to be multiple processes piling }> up on a mutex. If there are and for some reason they can't be }> fixed then these particular mutexs are going to dictate how this }> area is handled. Once we have these cases in hand we can make }> some decisions as to how to proceed. } }In my experience, I've seen mutexes used for long-term waits, and I }don't see any a priori reason not to do so. Of course, if we make }design decisions based on the assumption that all waits will be short, }then we will have a reason, but it won't be a good one. } }Before you say that long-term waits are evil, note that we're probably }talking about different kinds of waits. Obviously anything that }threatens to keep the system idle while it waits is bad, but a }replacement for tsleep(), say, can justifiably wait for a long time. 
A replacement for tsleep is not a mutex, but in Solaris parlance a conditional variable. The uses are different, one is for locking a resource, the other is waiting on a synch event. A conditional variable, like the sleep queues has a mutex associated with it. This mutex is not held except while processing the event, both by the process waiting and the process doing the activation. I don't think it is a good idea to assume that the heuristics for waking up tsleep / conditional variables is going to be anything like those seen with mutexs. Since things have been cuts and pasted I'll say again I don't have a good idea what the right answer is on any of this. I do believe we need to get what we have running, instrument it, and reach some decisions. Chuck To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Mon Jul 3 19:26: 6 2000 Delivered-To: freebsd-smp@freebsd.org Received: from pcnet1.pcnet.com (pcnet1.pcnet.com [204.213.232.3]) by hub.freebsd.org (Postfix) with ESMTP id 86E9437C270 for ; Mon, 3 Jul 2000 19:26:03 -0700 (PDT) (envelope-from eischen@vigrid.com) Received: (from eischen@localhost) by pcnet1.pcnet.com (8.8.7/PCNet) id WAA03158; Mon, 3 Jul 2000 22:25:43 -0400 (EDT) Date: Mon, 3 Jul 2000 22:25:32 -0400 (EDT) From: Daniel Eischen To: Greg Lehey Cc: Jason Evans , Luoqi Chen , smp@FreeBSD.ORG Subject: Re: SMP meeting summary In-Reply-To: <20000703200039.H62680@wantadilla.lemis.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Mon, 3 Jul 2000, Greg Lehey wrote: > On Monday, 3 July 2000 at 6:23:28 -0400, Daniel Eischen wrote: > > Well if you are considering spinning for a bit of time on a held > > mutex (which you seem to advocate?), then why not wake everyone? > > Because it doesn't buy us anything. 
You only have to take the sleep queue lock once to wake all the waiting threads/processes instead of N times if you have N waiting threads. You only have to take the scheduling queue lock once to place the waiting threads on the scheduling queue. I'm not advocating doing this (right now) in FreeBSD. But this seems like a potential optimization we could possibly do in the future. But in order to do this, we need to make sure that the time mutexes are held is very short. My only suggestion was that we comment those sections of code that can hold mutexes for long periods of time or use a different naming convention for those mutexes. > > If mutexes are held for very short periods of time and you don't > > often have a thundering herd problem, > > That's an assumption. So far we have *never* had a thundering herd, > because the code don't work yet. I wouldn't call the above an assumption. I'm not assuming anything. It is predicated on an "if". If we do have a thundering herd problem then I sure wouldn't want to wake_all(). > > > then waking everyone is an optimization since you only have to take > > the scheduling lock once. > > No. If I understand things correctly, each process would need to get > the schedlock, and only one process can get the mutex. Why wake the > rest? What do you want them to do? This applies even in the case of > a counting semaphore (of which our "mutex" is a special case), since > if any slots are available, the process wouldn't be sleeping. No, each process would be placed in the run queue with wake_all() semantics. Plus you would only have to take the sleep queue lock once too. Waking processes doesn't mean that they will run immediately. The first process to run will hopefully take the mutex and release it in a timely manner so that by the time the next process runs it can take the mutex uncontested.
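The lock-count argument above can be put in a small sketch. This is not BSD/OS or FreeBSD code. It only counts how often the sleep queue lock and the scheduling queue lock would be taken to make N waiters runnable under each policy. The names wake_cost, drain_with_wake_one, drain_with_wake_all and wake_cost_check are hypothetical, and the model assumes that every later release under wake_one finds a waiter still asleep (the contested case).

```c
#include <assert.h>

/* Hypothetical bookkeeping: how many times each lock is taken. */
struct wake_cost {
    unsigned sleepq_locks;  /* sleep queue lock acquisitions */
    unsigned sched_locks;   /* scheduling queue lock acquisitions */
    unsigned woken;         /* waiters made runnable */
};

/* wake_one: every release wakes a single waiter, so making n waiters
 * runnable costs one acquisition of each lock per waiter. */
struct wake_cost drain_with_wake_one(unsigned nwaiters)
{
    struct wake_cost c = {0, 0, 0};
    for (unsigned i = 0; i < nwaiters; i++) {
        c.sleepq_locks++;
        c.sched_locks++;
        c.woken++;
    }
    return c;
}

/* wake_all: the first release takes each lock once and moves the
 * whole sleep list onto the run queue in one pass. */
struct wake_cost drain_with_wake_all(unsigned nwaiters)
{
    struct wake_cost c = {0, 0, 0};
    if (nwaiters == 0)
        return c;           /* nobody asleep: no lock traffic at all */
    c.sleepq_locks = 1;
    c.sched_locks = 1;
    for (unsigned i = 0; i < nwaiters; i++)
        c.woken++;
    return c;
}

/* Both policies wake the same number of waiters; only the lock
 * traffic differs. */
void wake_cost_check(void)
{
    struct wake_cost one = drain_with_wake_one(4);
    struct wake_cost all = drain_with_wake_all(4);
    assert(one.woken == all.woken);
    assert(one.sleepq_locks == 4 && all.sleepq_locks == 1);
    assert(one.sched_locks == 4 && all.sched_locks == 1);
}
```

The sketch says nothing about what the woken processes cost once they run, which is the other half of the disagreement in this thread.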
-- Dan Eischen To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Mon Jul 3 19:36:48 2000 Delivered-To: freebsd-smp@freebsd.org Received: from pcnet1.pcnet.com (pcnet1.pcnet.com [204.213.232.3]) by hub.freebsd.org (Postfix) with ESMTP id CA45D37C4AD for ; Mon, 3 Jul 2000 19:36:34 -0700 (PDT) (envelope-from eischen@vigrid.com) Received: (from eischen@localhost) by pcnet1.pcnet.com (8.8.7/PCNet) id WAA04290; Mon, 3 Jul 2000 22:36:16 -0400 (EDT) Date: Mon, 3 Jul 2000 22:36:14 -0400 (EDT) From: Daniel Eischen To: Joe Eykholt Cc: Greg Lehey , "Jeroen C. van Gelderen" , Jason Evans , Luoqi Chen , smp@FreeBSD.ORG Subject: Re: SMP meeting summary In-Reply-To: <39612626.3E3AE2C4@iprg.nokia.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Mon, 3 Jul 2000, Joe Eykholt wrote: > Greg Lehey wrote: > > > There's nothing to say that wake_one is more complex. wake_one takes > > the first process on the mutex's sleep list and wakes it. wake_all > > (or whatever) would make a loop out of that wake function and wake all > > the processes on the list. All would then be scheduled, try to take > > the mutex, and all except one would fail and be put back on the sleep > > list. Does this make sense? > > With adaptive mutexes, the threads which are woken will either run one > serially on one CPU, or some run at the same time on multiple CPUs. > In that case, one gets the lock right away, and the rest SPIN on it > (as long as the new owner doesn't get suspended on something else). > They don't necessarily go back to sleep on that same lock. Thanks, I forgot about this. Even if you wake multiple threads, they will not be put back to sleep until the owner of the mutex is no longer running. 
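A minimal sketch of the adaptive rule described above, assuming the mutex records whether it is held and whether its owner is currently on a CPU. The struct layout and the names adaptive_mutex, am_action and adaptive_next_action are made up for illustration and are not taken from the BSD/OS or Solaris sources.

```c
#include <assert.h>
#include <stdbool.h>

/* What a contender does when it looks at the mutex. */
enum am_action { AM_ACQUIRE, AM_SPIN, AM_SLEEP };

/* Hypothetical view of an adaptive mutex. */
struct adaptive_mutex {
    bool held;          /* somebody owns the mutex */
    bool owner_on_cpu;  /* the owner is running right now */
};

/* Adaptive rule: take a free mutex, spin while the owner is running
 * (it should release soon), and go to sleep only when the owner has
 * itself been suspended. */
enum am_action adaptive_next_action(const struct adaptive_mutex *m)
{
    assert(m != 0);
    if (!m->held)
        return AM_ACQUIRE;
    if (m->owner_on_cpu)
        return AM_SPIN;
    return AM_SLEEP;
}
```

Under this rule a thread woken by wake_all that loses the race does not go straight back onto the sleep list; it spins for as long as the new owner stays on a CPU.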
-- Dan Eischen To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Mon Jul 3 19:40:10 2000 Delivered-To: freebsd-smp@freebsd.org Received: from wantadilla.lemis.com (wantadilla.lemis.com [192.109.197.80]) by hub.freebsd.org (Postfix) with ESMTP id 3A6F737C2D5 for ; Mon, 3 Jul 2000 19:40:00 -0700 (PDT) (envelope-from grog@wantadilla.lemis.com) Received: (from grog@localhost) by wantadilla.lemis.com (8.9.3/8.9.3) id MAA94857; Tue, 4 Jul 2000 12:09:30 +0930 (CST) (envelope-from grog) Date: Tue, 4 Jul 2000 12:09:30 +0930 From: Greg Lehey To: Chuck Paterson Cc: Daniel Eischen , Jason Evans , Luoqi Chen , smp@freebsd.org Subject: Re: SMP meeting summary Message-ID: <20000704120930.G94351@wantadilla.lemis.com> References: <200007040218.UAA01169@berserker.bsdi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0pre2i In-Reply-To: <200007040218.UAA01169@berserker.bsdi.com> Organization: LEMIS, PO Box 460, Echunga SA 5153, Australia Phone: +61-8-8388-8286 Fax: +61-8-8388-8725 Mobile: +61-418-838-708 WWW-Home-Page: http://www.lemis.com/~grog X-PGP-Fingerprint: 6B 7B C3 8C 61 CD 54 AF 13 24 52 F8 6D A4 95 EF Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Monday, 3 July 2000 at 20:18:00 -0600, Chuck Paterson wrote: > >> I'm not sure we're talking about the same thing, but if so I must be >> missing something. If I'm waiting on a mutex, I still need to >> reacquire it on wakeup, don't I? In that case, only the first process >> to be scheduled will actually get the mutex, and the others will block >> again. > > Yes, you need to acquire the mutex on wakeup, but likely > one process will run acquiring and releasing the mutex in an > uncontested fashion before other processes run and do the same > thing. Hmmm. Yes, I suppose that would happen in a single processor environment. 
>>> In general there ought not to be multiple processes piling >>> up on a mutex. If there are and for some reason they can't be >>> fixed then these particular mutexs are going to dictate how this >>> area is handled. Once we have these cases in hand we can make >>> some decisions as to how to proceed. >> >> In my experience, I've seen mutexes used for long-term waits, and I >> don't see any a priori reason not to do so. Of course, if we make >> design decisions based on the assumption that all waits will be short, >> then we will have a reason, but it won't be a good one. >> >> Before you say that long-term waits are evil, note that we're probably >> talking about different kinds of waits. Obviously anything that >> threatens to keep the system idle while it waits is bad, but a >> replacement for tsleep(), say, can justifiably wait for a long time. > > A replacement for tsleep is not a mutex, but in Solaris > parlance a conditional variable. I think we have a certain problem with terminology, and it seems to be clouding the discussion. The big difference between the BSD/OS sleep mutex and the semaphores we used at Tandem (amongst other things for long-term waits) wasn't the counter (which was always set to 1) but the name. > The uses are different, one is for locking a resource, the other is > waiting on a synch event. A conditional variable, like the sleep > queues has a mutex associated with it. This mutex is not held except > while processing the event, both by the process waiting and the > process doing the activation. This is a different paradigm from the one we used. > I don't think it is a good idea to assume that the heuristics for > waking up tsleep / conditional variables is going to be anything > like those seen with mutexs. Maybe. I need to let this go through my head. Just because we found it to be the right idea at Tandem doesn't mean it's the right idea here. 
I've never been able to understand the advantages of conditional variables, which may be my viewpoint, or it may be some basic lack of understanding. > Since things have been cuts and pasted I'll say again I don't > have a good idea what the right answer is on any of this. I do > believe we need to get what we have running, instrument it, and > reach some decisions. Agreed entirely. At the moment the discussion is academic. When we've done the implementation, we'll have a much better idea about what we really want^H^H^H^Hneed. Greg -- Finger grog@lemis.com for PGP public key See complete headers for address and phone numbers To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Mon Jul 3 19:41:58 2000 Delivered-To: freebsd-smp@freebsd.org Received: from pcnet1.pcnet.com (pcnet1.pcnet.com [204.213.232.3]) by hub.freebsd.org (Postfix) with ESMTP id C530137C384 for ; Mon, 3 Jul 2000 19:41:49 -0700 (PDT) (envelope-from eischen@vigrid.com) Received: (from eischen@localhost) by pcnet1.pcnet.com (8.8.7/PCNet) id WAA04905; Mon, 3 Jul 2000 22:41:29 -0400 (EDT) Date: Mon, 3 Jul 2000 22:41:27 -0400 (EDT) From: Daniel Eischen To: Greg Lehey Cc: Chuck Paterson , Jason Evans , Luoqi Chen , smp@freebsd.org Subject: Re: SMP meeting summary In-Reply-To: <20000704092245.B65029@wantadilla.lemis.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Tue, 4 Jul 2000, Greg Lehey wrote: > On Monday, 3 July 2000 at 9:28:34 -0600, Chuck Paterson wrote: > >> Well if you are considering spinning for a bit of time on a held > >> mutex (which you seem to advocate?), then why not wake everyone? > >> If mutexes are held for very short periods of time and you don't > >> often have a thundering herd problem, then waking everyone is > >> an optimization since you only have to take the scheduling lock > >> once. 
If mutexes can be held for long periods of time, then you > >> probably wouldn't want to wake everyone. > >> > >> Dan Eischen > > > > If all processes are made runnable at once then both future > > releases and acquisitions of the mutex may be uncontested, resulting > > in not having to acquire the scheduling lock. > > I'm not sure we're talking about the same thing, but if so I must be > missing something. If I'm waiting on a mutex, I still need to > reacquire it on wakeup, don't I? In that case, only the first process > to be scheduled will actually get the mutex, and the others will block > again. > > > If the system is busy and there are not idle CPUs then there won't > > be a thundering herd, because there is no herd to thunder. > > So we get a creeping herd? If we wake more processes than we can > handle, we're just going to spend time putting the rest to sleep > again. > > > The probability of threads blocking on the mutex before it is > > released is a function of mutex hold time to the time it takes a > > processor to calling switch with the thread which wants to run being > > the highest priority. In general mutex hold time is small compared > > to the time a process runs. > > Fine, but there are exceptions. Obviously if we only ever have one > thread waiting on the mutex, we don't have any basis for discussion. > > > > > In general there ought not to be multiple processes piling > > up on a mutex. If there are and for some reason they can't be > > fixed then these particular mutexs are going to dictate how this > > area is handled. Once we have these cases in hand we can make > > some decisions as to how to proceed. > > In my experience, I've seen mutexes used for long-term waits, and I > don't see any a priori reason not to do so. Of course, if we make > design decisions based on the assumption that all waits will be short, > then we will have a reason, but it won't be a good one. 
> > Before you say that long-term waits are evil, note that we're probably > talking about different kinds of waits. Obviously anything that > threatens to keep the system idle while it waits is bad, but a > replacement for tsleep(), say, can justifiably wait for a long time. Which is why we want condition variables to replace tsleep(). If you want to wait long periods of time, then use condition variables or reader/writer locks. Mutex and condition variables can be used in a very similar way to splXXX() and tsleep(). -- Dan Eischen To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Mon Jul 3 20:41:21 2000 Delivered-To: freebsd-smp@freebsd.org Received: from berserker.bsdi.com (berserker.twistedbit.com [199.79.183.1]) by hub.freebsd.org (Postfix) with ESMTP id 4AF8E37B5E7 for ; Mon, 3 Jul 2000 20:41:17 -0700 (PDT) (envelope-from cp@berserker.bsdi.com) Received: from berserker.bsdi.com (cp@localhost [127.0.0.1]) by berserker.bsdi.com (8.9.3/8.9.3) with ESMTP id VAA01866; Mon, 3 Jul 2000 21:40:57 -0600 (MDT) Message-Id: <200007040340.VAA01866@berserker.bsdi.com> To: Greg Lehey Cc: Daniel Eischen , Jason Evans , Luoqi Chen , smp@freebsd.org Subject: Re: SMP meeting summary From: Chuck Paterson Date: Mon, 03 Jul 2000 21:40:57 -0600 Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org } }Maybe. I need to let this go through my head. Just because we found }it to be the right idea at Tandem doesn't mean it's the right idea }here. I've never been able to understand the advantages of }conditional variables, which may be my viewpoint, or it may be some }basic lack of understanding. } This is how I think of it: Mutexs are a synchronization mechanism optimized such that they have to be acquired and then released by the same process. 
Tsleep, conditional variables are a mechanism that is optimized such that it is used for one process to wait for an event posted by another process. A general purpose semaphore could be used for either one. The current use of lock manager locks with bufs is an example of this. Mutexs map well into what hardware such as Intel or Sparc can natively support. Any of the other mechanism are likely to be twice as expensive, requiring two low level locked operations per high level locked operation. This isn't always true as read/writer locks can be as cheap as a mutex for the case where there is a single reader or writer. But once again this is the case where the same process is acquiring and releasing the resource without action from another process. It may well be that Tandem hardware made general purpose semaphores as cheap as mutexs. This could be because there was support for a higher level operation or just a few (or one as in the case of Cray) synchronization registers. In either case software can use general purpose semaphores for everything and not screw with all these hybrids. Chuck To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Mon Jul 3 22: 8:40 2000 Delivered-To: freebsd-smp@freebsd.org Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20]) by hub.freebsd.org (Postfix) with ESMTP id 374C137B8C3 for ; Mon, 3 Jul 2000 22:08:37 -0700 (PDT) (envelope-from bright@fw.wintelcom.net) Received: (from bright@localhost) by fw.wintelcom.net (8.10.0/8.10.0) id e6458OX14088; Mon, 3 Jul 2000 22:08:24 -0700 (PDT) Date: Mon, 3 Jul 2000 22:08:24 -0700 From: Alfred Perlstein To: Greg Lehey Cc: "Jeroen C. 
van Gelderen" , Daniel Eischen , Jason Evans , Luoqi Chen , smp@FreeBSD.ORG Subject: Re: SMP meeting summary Message-ID: <20000703220823.Z25571@fw.wintelcom.net> References: <20000703114535.T39024@wantadilla.lemis.com> <20000703200039.H62680@wantadilla.lemis.com> <3960A971.982DDF07@vangelderen.org> <20000704083822.A65029@wantadilla.lemis.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2i In-Reply-To: <20000704083822.A65029@wantadilla.lemis.com>; from grog@lemis.com on Tue, Jul 04, 2000 at 08:38:22AM +0930 Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org * Greg Lehey [000703 16:10] wrote: > > That's my reading of Sun's claims in Solaris and given that they > > have a little more experience with this kind of thing I'm inclined > > to believe them until I see facts stating the contrary. > > Sun's problem with Solaris is non-obvious, and may not bite us. > > I think we should hold off with this kind of discussion for the while. > Everything I can see suggests that it's crazy to wake all processes. > If we find that we run into race conditions which we can only solve > with wake_all, though, we'll compare the effort in fixing them with > the (undoubted) performance degradation caused by waking them all. The idea for spin or spin-then-sleep mutexes (very short hold time) is that, since you won't have as many processes as CPUs contending (and when you do it's ok), the mutual exclusion is so short-lived that by the time the next 'thundering' process is actually given the CPU, the likelihood is that other processes have already acquired _and_ released the spinlock, making it more than likely that the resource is free. The idea is that a quantum is actually so great that there's little chance of one of the wake_all processes colliding on the lock.
Effectively you gain a whole lot because you avoid having to grab sched-mutex on each acquire/release, and you also reduce the cache cost of wakeups because it's likely that only one kernel context will wind its way down the sleep queue. What's sort of interesting is that doing it one way or the other is so similar that in reality the initial implementation doesn't matter; switching from one to the other will be trivial at most. The importance lies in getting one implementation done. -Alfred To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Mon Jul 3 22:38:16 2000 Delivered-To: freebsd-smp@freebsd.org Received: from wantadilla.lemis.com (wantadilla.lemis.com [192.109.197.80]) by hub.freebsd.org (Postfix) with ESMTP id E97C737BC70 for ; Mon, 3 Jul 2000 22:38:07 -0700 (PDT) (envelope-from grog@wantadilla.lemis.com) Received: (from grog@localhost) by wantadilla.lemis.com (8.9.3/8.9.3) id PAA95496; Tue, 4 Jul 2000 15:07:36 +0930 (CST) (envelope-from grog) Date: Tue, 4 Jul 2000 15:07:36 +0930 From: Greg Lehey To: Alfred Perlstein Cc: "Jeroen C.
van Gelderen" , Daniel Eischen , Jason Evans , Luoqi Chen , smp@FreeBSD.ORG Subject: Re: SMP meeting summary Message-ID: <20000704150736.H94351@wantadilla.lemis.com> References: <20000703114535.T39024@wantadilla.lemis.com> <20000703200039.H62680@wantadilla.lemis.com> <3960A971.982DDF07@vangelderen.org> <20000704083822.A65029@wantadilla.lemis.com> <20000703220823.Z25571@fw.wintelcom.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0pre2i In-Reply-To: <20000703220823.Z25571@fw.wintelcom.net> Organization: LEMIS, PO Box 460, Echunga SA 5153, Australia Phone: +61-8-8388-8286 Fax: +61-8-8388-8725 Mobile: +61-418-838-708 WWW-Home-Page: http://www.lemis.com/~grog X-PGP-Fingerprint: 6B 7B C3 8C 61 CD 54 AF 13 24 52 F8 6D A4 95 EF Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Monday, 3 July 2000 at 22:08:24 -0700, Alfred Perlstein wrote: > What sort of interesting is that doing it one way or the other is > so similar that in reality the initial implementation doesn't > matter, switching from one to the other will be trivial at most, > the importance lies in getting one implementation done. There's a big difference in which implementation we do. The BSD/OS implementation works, at least in the BSD/OS environment. Nothing else has been written. I think it's very important that we get the BSD/OS version up and hobbling before we start redesigning things. By the time we've done that, we'll understand the material so much better that we'll have a double win (working code and an understanding of how to do it better). I'm currently up to my elbows in dead interrupt code, and I'm surprised how much I'm learning [wipes mess off arms]. 
Greg -- Finger grog@lemis.com for PGP public key See complete headers for address and phone numbers To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Tue Jul 4 7:41:42 2000 Delivered-To: freebsd-smp@freebsd.org Received: from barney.ife.no (barney.ife.no [128.39.229.49]) by hub.freebsd.org (Postfix) with ESMTP id 3F2CD37B5A2; Tue, 4 Jul 2000 07:41:35 -0700 (PDT) (envelope-from stein@ife.no) Received: from ife.no (virginis.ife.no [128.39.229.176]) by barney.ife.no (8.9.3/8.9.3) with ESMTP id QAA32593; Tue, 4 Jul 2000 16:41:30 +0200 (MET DST) Message-ID: <3961F79A.1643F238@ife.no> Date: Tue, 04 Jul 2000 16:41:30 +0200 From: "Stein M. Sandbech" Reply-To: stein@ife.no Organization: IFE X-Mailer: Mozilla 4.7 [en] (X11; I; FreeBSD 3.2-RELEASE i386) X-Accept-Language: en MIME-Version: 1.0 To: freebsd-smp@freebsd.org Cc: Mike Smith Subject: Re: Q: SMP on Intel OR840 MB? References: <200003131947.LAA00931@mass.cdrom.com> Content-Type: multipart/alternative; boundary="------------D6160584934825134C81431F" Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org --------------D6160584934825134C81431F Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit FYI. Mike Smith wrote: > Just as a general "heads up", I'm having problems with SMP on an > i840-based board at the moment. I haven't had time to characterise it > (still! 8(), but you should be careful for a little while here. As a followup on Mike`s answer to my initial query on freebsd-SMP. I`ve installed FreeBSD 3.4 Release and FreeBSD 4.0 Release on the Intel OR840 (Outrigger) motherboard without any problems (except for not recognizing the keyboard, did the -Dh on boot prompt -> OK). 
Configuration: OR840 MB / 256MB RAMBUS memory / 4 UFW SCSI disks / Adaptec 2940U2W / Creative GeForce256 / Tandberg SLR5 streamer / Pioneer SCSI DVD rom / Yamaha CD R / 2 x 800/133MHz Pentium III`s / integrated Intel Pro100+ nic (fxp). Built and installed a SMP kernel on both FreeBSD versions. It runs like a charm, doing simulations (floating point intensive), with periodically heavy interactive use. I can run the mptable on this system, if you`ll find it usefull (I have this machine @home, so ... :)) All in all, an extremely nice system, both HW and OS! --Stein Morten -- /* Stein M Sandbech Email: stein@ife.no ** ** Senior Systems Engineer, EDP dept Email: stein@www.ife.no ** ** Institute for Energy Technology Tel: +47 63 80 60 00 ** ** Box 40, N-2007 Kjeller, NORWAY Fax: +47 63 81 11 68 */ --------------D6160584934825134C81431F Content-Type: text/html; charset=us-ascii Content-Transfer-Encoding: 7bit FYI.

  --------------D6160584934825134C81431F-- To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Tue Jul 4 14:45:45 2000 Delivered-To: freebsd-smp@freebsd.org Received: from netplex.com.au (adsl-63-207-30-186.dsl.snfc21.pacbell.net [63.207.30.186]) by hub.freebsd.org (Postfix) with ESMTP id 6B86237B5EA for ; Tue, 4 Jul 2000 14:45:33 -0700 (PDT) (envelope-from peter@netplex.com.au) Received: from netplex.com.au (peter@localhost [127.0.0.1]) by netplex.com.au (8.9.3/8.9.3) with ESMTP id OAA54160; Tue, 4 Jul 2000 14:44:09 -0700 (PDT) (envelope-from peter@netplex.com.au) Message-Id: <200007042144.OAA54160@netplex.com.au> X-Mailer: exmh version 2.1.1 10/15/1999 To: Greg Lehey Cc: Alfred Perlstein , "Jeroen C. van Gelderen" , Daniel Eischen , Jason Evans , Luoqi Chen , smp@FreeBSD.ORG Subject: Re: SMP meeting summary In-Reply-To: Message from Greg Lehey of "Tue, 04 Jul 2000 15:07:36 +0930." <20000704150736.H94351@wantadilla.lemis.com> Date: Tue, 04 Jul 2000 14:44:09 -0700 From: Peter Wemm Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org Greg Lehey wrote: > On Monday, 3 July 2000 at 22:08:24 -0700, Alfred Perlstein wrote: > > What sort of interesting is that doing it one way or the other is > > so similar that in reality the initial implementation doesn't > > matter, switching from one to the other will be trivial at most, > > the importance lies in getting one implementation done. > > There's a big difference in which implementation we do. The BSD/OS > implementation works, at least in the BSD/OS environment. Nothing > else has been written. I think it's very important that we get the > BSD/OS version up and hobbling before we start redesigning things. By > the time we've done that, we'll understand the material so much better > that we'll have a double win (working code and an understanding of how > to do it better). 
I'm currently up to my elbows in dead interrupt > code, and I'm surprised how much I'm learning [wipes mess off arms]. A general comment.. It was made very clear at the SMP meeting that things would have taken a lot less time if they had the "safe but slower" fallback code available right from the start. I feel that it is imperative that we implement a minimal-but-functional set of code that we can trust first and *then* take a shot at the lightweight interrupt context, and do it in such a way that when Weird Shit(TM) starts happening that we can easily fall back to the conservative code so that we can eliminate the optimized lightweight interrupt contexts from suspicion. Having the BSD/OS code available as a starting point is a huge help. We should not have to worry about the mutex or witness code until we are up and running. There are truckloads of optimizations that can be done afterwards, but we must walk first, not run. Doing things conservatively and safely now with an eye towards later optimization will hopefully save our sanity. Whatever we can leverage from BSD/OS as a "known quantity" we should - it will reduce the amount of green or untried code while we get up to speed. If this means that our SMP work looks a lot like BSD/OS, then so what? It doesn't have to stay that way forever. Once we have something that runs and doesn't panic in 3 seconds, then we have something to tune/optimize/ reimplement/whatever. If we all dive in and invent our own stuff right from the start, we will have just as much pain and suffering as the BSDI folks had and it will take just as long (or longer). 
Cheers, -Peter -- Peter Wemm - peter@FreeBSD.org; peter@yahoo-inc.com; peter@netplex.com.au "All of this is for nothing if we don't go to the stars" - JMS/B5 To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Tue Jul 4 15:59:54 2000 Delivered-To: freebsd-smp@freebsd.org Received: from wantadilla.lemis.com (wantadilla.lemis.com [192.109.197.80]) by hub.freebsd.org (Postfix) with ESMTP id 22A6837B5A9 for ; Tue, 4 Jul 2000 15:59:39 -0700 (PDT) (envelope-from grog@wantadilla.lemis.com) Received: (from grog@localhost) by wantadilla.lemis.com (8.9.3/8.9.3) id IAA97127; Wed, 5 Jul 2000 08:29:00 +0930 (CST) (envelope-from grog) Date: Wed, 5 Jul 2000 08:29:00 +0930 From: Greg Lehey To: Peter Wemm Cc: Alfred Perlstein , "Jeroen C. van Gelderen" , Daniel Eischen , Jason Evans , Luoqi Chen , smp@FreeBSD.ORG Subject: Re: SMP meeting summary Message-ID: <20000705082900.I94351@wantadilla.lemis.com> References: <200007042144.OAA54160@netplex.com.au> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0pre2i In-Reply-To: <200007042144.OAA54160@netplex.com.au> Organization: LEMIS, PO Box 460, Echunga SA 5153, Australia Phone: +61-8-8388-8286 Fax: +61-8-8388-8725 Mobile: +61-418-838-708 WWW-Home-Page: http://www.lemis.com/~grog X-PGP-Fingerprint: 6B 7B C3 8C 61 CD 54 AF 13 24 52 F8 6D A4 95 EF Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Tuesday, 4 July 2000 at 14:44:09 -0700, Peter Wemm wrote: > Greg Lehey wrote: >> On Monday, 3 July 2000 at 22:08:24 -0700, Alfred Perlstein wrote: >>> What sort of interesting is that doing it one way or the other is >>> so similar that in reality the initial implementation doesn't >>> matter, switching from one to the other will be trivial at most, >>> the importance lies in getting one implementation done. >> >> There's a big difference in which implementation we do. 
The BSD/OS >> implementation works, at least in the BSD/OS environment. Nothing >> else has been written. I think it's very important that we get the >> BSD/OS version up and hobbling before we start redesigning things. By >> the time we've done that, we'll understand the material so much better >> that we'll have a double win (working code and an understanding of how >> to do it better). I'm currently up to my elbows in dead interrupt >> code, and I'm surprised how much I'm learning [wipes mess off arms]. > > A general comment.. It was made very clear at the SMP meeting that > things would have taken a lot less time if they had the "safe but > slower" fallback code available right from the start. I feel that > it is imperative that we implement a minimal-but-functional set of > code that we can trust first and *then* take a shot at the > lightweight interrupt context, and do it in such a way that when > Weird Shit(TM) starts happening that we can easily fall back to the > conservative code so that we can eliminate the optimized lightweight > interrupt contexts from suspicion. Agreed. That's the way I'm going. Is there anything I have said that gives you reason to think I'm advocating something else? > Having the BSD/OS code available as a starting point is a huge help. > We should not have to worry about the mutex or witness code until we > are up and running. For some definition of "worry". > There are truckloads of optimizations that can be done afterwards, > but we must walk first, not run. Doing things conservatively and > safely now with an eye towards later optimization will hopefully > save our sanity. Whatever we can leverage from BSD/OS as a "known > quantity" we should - it will reduce the amount of green or untried > code while we get up to speed. If this means that our SMP work > looks a lot like BSD/OS, then so what? It doesn't have to stay that > way forever. I'm also not advocating change for change's sake. 
If it turns out that the BSD/OS code is the way to go, then I wouldn't want to change. Greg -- Finger grog@lemis.com for PGP public key See complete headers for address and phone numbers To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Tue Jul 4 16:38:23 2000 Delivered-To: freebsd-smp@freebsd.org Received: from netplex.com.au (adsl-63-207-30-186.dsl.snfc21.pacbell.net [63.207.30.186]) by hub.freebsd.org (Postfix) with ESMTP id 8F57737BA82 for ; Tue, 4 Jul 2000 16:38:12 -0700 (PDT) (envelope-from peter@netplex.com.au) Received: from netplex.com.au (peter@localhost [127.0.0.1]) by netplex.com.au (8.9.3/8.9.3) with ESMTP id QAA54794; Tue, 4 Jul 2000 16:06:34 -0700 (PDT) (envelope-from peter@netplex.com.au) Message-Id: <200007042306.QAA54794@netplex.com.au> X-Mailer: exmh version 2.1.1 10/15/1999 To: Greg Lehey Cc: Alfred Perlstein , "Jeroen C. van Gelderen" , Daniel Eischen , Jason Evans , Luoqi Chen , smp@FreeBSD.ORG Subject: Re: SMP meeting summary In-Reply-To: Message from Greg Lehey of "Wed, 05 Jul 2000 08:29:00 +0930." <20000705082900.I94351@wantadilla.lemis.com> Date: Tue, 04 Jul 2000 16:06:34 -0700 From: Peter Wemm Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org Greg Lehey wrote: > On Tuesday, 4 July 2000 at 14:44:09 -0700, Peter Wemm wrote: [..] > Agreed. That's the way I'm going. Is there anything I have said that > gives you reason to think I'm advocating something else? No, I was backing you up, not aiming it at you. I was reiterating a point that seemed to have been lost (or forgotten) in the noise. 
Cheers, -Peter To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Wed Jul 5 2:24:28 2000 Delivered-To: freebsd-smp@freebsd.org Received: from io.yi.org (24.67.218.186.bc.wave.home.com [24.67.218.186]) by hub.freebsd.org (Postfix) with ESMTP id 9746137B510 for ; Wed, 5 Jul 2000 02:24:21 -0700 (PDT) (envelope-from jburkhol@home.com) Received: from io.yi.org (localhost.gvcl1.bc.wave.home.com [127.0.0.1]) by io.yi.org (Postfix) with ESMTP id A6B47BA4E; Wed, 5 Jul 2000 02:24:20 -0700 (PDT) X-Mailer: exmh version 2.1.1 10/15/1999 To: Jake Burkholder Cc: Jason Evans , Greg Lehey , Matthew Dillon , freebsd-smp@FreeBSD.ORG Subject: Re: SMP progress (was: Stepping on Toes) In-Reply-To: Message from Jake Burkholder of "Mon, 03 Jul 2000 13:15:16 PDT." <20000703201516.28BA3BA4E@io.yi.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Wed, 05 Jul 2000 02:24:20 -0700 From: Jake Burkholder Message-Id: <20000705092420.A6B47BA4E@io.yi.org> Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > > patch set on the web page (http://people.freebsd.org/~jasone/smp/) shortly. 
> > Meanwhile, you can get Jake's patch set at: > > > > http://people.freebsd.org/~jake/smpng2.tar > dfr filled me in on how to include new files in a diff, so this is now available as one: http://people.freebsd.org/~jake/smpng.diff To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Wed Jul 5 9:32:25 2000 Delivered-To: freebsd-smp@freebsd.org Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2]) by hub.freebsd.org (Postfix) with ESMTP id 9AC1737B5B5 for ; Wed, 5 Jul 2000 09:32:23 -0700 (PDT) (envelope-from dillon@apollo.backplane.com) Received: (from dillon@localhost) by apollo.backplane.com (8.9.3/8.9.1) id JAA87977; Wed, 5 Jul 2000 09:32:23 -0700 (PDT) (envelope-from dillon) Date: Wed, 5 Jul 2000 09:32:23 -0700 (PDT) From: Matthew Dillon Message-Id: <200007051632.JAA87977@apollo.backplane.com> To: freebsd-smp@freebsd.org Subject: Personal URL has moved (also: SMP patchset URL) Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org www.backplane.com -> apollo.backplane.com http://apollo.backplane.com/FreeBSDSmp/ I moved backplane.com and www.backplane.com to point to Backplane Inc's web server, so my stuff is not available via those hostnames any more. Please use 'apollo.backplane.com'. 
-Matt Matthew Dillon To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Wed Jul 5 9:44:10 2000 Delivered-To: freebsd-smp@freebsd.org Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2]) by hub.freebsd.org (Postfix) with ESMTP id 473BB37C1B4 for ; Wed, 5 Jul 2000 09:44:04 -0700 (PDT) (envelope-from dillon@apollo.backplane.com) Received: (from dillon@localhost) by apollo.backplane.com (8.9.3/8.9.1) id JAA88121; Wed, 5 Jul 2000 09:43:57 -0700 (PDT) (envelope-from dillon) Date: Wed, 5 Jul 2000 09:43:57 -0700 (PDT) From: Matthew Dillon Message-Id: <200007051643.JAA88121@apollo.backplane.com> To: Greg Lehey Cc: Chuck Paterson , David Greenman , freebsd-smp@FreeBSD.ORG Subject: Re: SMP progress (was: Stepping on Toes) References: <200006261650.KAA17801@berserker.bsdi.com> <200006271742.KAA35851@apollo.backplane.com> <20000703112203.B61851@wantadilla.lemis.com> Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org :... :> At this moment, without interrupt threads, interrupts can share Giant :> with the curproc they interrupted. This is how our existing MP stuff :> worked already. :> :> When Greg moves interrupts to their own threads, and obtains Giant to :> run those interrupts, no more sharing will occur and just the fact :> that the interrupt is holding Giant guarentees that nobody else will :> be messing with SPLs, thus the SPLs can be removed entirely. : :Agreed. I'm in the process of implementing the heavy-weight interrupt :processes now. I've just taken a look at your web page and note that :the URL no longer exists; in conjunction with the discussion above, :I'm no longer sure how far you are. Are you importing the BSD/OS code :now? : :We should probably take the rest of this offline, but I wanted to :discuss how we do things. My idea is: : :1. You import the BSD/OS mutexes. :2. 
I import/implement the heavy-weight interrupt code, which I will : endeavour to get working relatively reliably. This should be a : fallback while I break^H^H^H^H^Himplement light-wait interrupt : threads. :3. You and I test our stuff together until it can stay up for an hour : or so (exact time to be determined by Jason, who'll be carrying : the can). :4. We commit the marginally stable stuff. :5. I carry on working on the light-weight threads. : :Any comments? : :Greg Jake Burkholder is porting the BSD/OS mutexes. I don't expect there to be much of a difference in regards to your heavy-weight interrupt work. I'm going to take a look at Jake's patchset tonight. I think the only operational item we need to research is the sti/cli stuff in the BSDI mutexes... we should be able to remove them at some point (my interrupt code is already using the ipending mechanism to deal with the scheduler mutex being active on the current cpu). If Jake's removed that, then we'll want to put it back in at some point since it saves a significant amount of overhead ('sti' and 'cli' are expensive instructions). 
-Matt
Matthew Dillon

From owner-freebsd-smp Wed Jul 5 9:53:20 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from berserker.bsdi.com (berserker.twistedbit.com [199.79.183.1]) by hub.freebsd.org (Postfix) with ESMTP id A3C6937B509 for ; Wed, 5 Jul 2000 09:53:17 -0700 (PDT) (envelope-from cp@berserker.bsdi.com)
Received: from berserker.bsdi.com (cp@localhost [127.0.0.1]) by berserker.bsdi.com (8.9.3/8.9.3) with ESMTP id KAA14768; Wed, 5 Jul 2000 10:52:24 -0600 (MDT)
Message-Id: <200007051652.KAA14768@berserker.bsdi.com>
To: Matthew Dillon
Cc: Greg Lehey , David Greenman , freebsd-smp@freebsd.org
Subject: Re: SMP progress (was: Stepping on Toes)
From: Chuck Paterson
Date: Wed, 05 Jul 2000 10:52:23 -0600
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

}
} Jake Burkholder is porting the BSD/OS mutexes. I don't expect there
} to be much of a difference in regards to your heavy-weight interrupt
} work. I'm going to take a look at Jake's patchset tonight. I think
} the only operational item we need to research is the sti/cli stuff in
} the BSDI mutexes... we should be able to remove them at some point
} (my interrupt code is already using the ipending mechanism to deal
} with the scheduler mutex being active on the current cpu).
}
} If Jake's removed that, then we'll want to put it back in at some point
} since it saves a significant amount of overhead ('sti' and 'cli' are
} expensive instructions).
}
} -Matt
} Matthew Dillon
}

I believe ipending wants to go away totally. It really isn't
meaningful in the thread environment, and the locked operations
needed to support it once multiple processors are running in the
kernel are more expensive than the sti, cli.
Chuck

From owner-freebsd-smp Wed Jul 5 9:58:18 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from berserker.bsdi.com (berserker.twistedbit.com [199.79.183.1]) by hub.freebsd.org (Postfix) with ESMTP id 4ADF037B638 for ; Wed, 5 Jul 2000 09:58:15 -0700 (PDT) (envelope-from cp@berserker.bsdi.com)
Received: from berserker.bsdi.com (cp@localhost [127.0.0.1]) by berserker.bsdi.com (8.9.3/8.9.3) with ESMTP id KAA14847; Wed, 5 Jul 2000 10:57:59 -0600 (MDT)
Message-Id: <200007051657.KAA14847@berserker.bsdi.com>
Cc: Matthew Dillon , Greg Lehey , David Greenman , freebsd-smp@freebsd.org
Subject: Re: SMP progress (was: Stepping on Toes)
From: Chuck Paterson
Date: Wed, 05 Jul 2000 10:57:58 -0600
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

}
}
} I believe ipending wants to go away totally. It really
}isn't meaningful in the thread environment and the locked operations
}needed to support it once multiple processors are running
}in the kernel are more expensive than the sti, cli.
}
}Chuck
}

I should have said that it is the locked operations needed to
support the masks that are the really expensive part, not ipending
itself. Also, with the spin locks we want to mask interrupts to a
particular processor, not all processors.
Chuck To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Wed Jul 5 10:31:45 2000 Delivered-To: freebsd-smp@freebsd.org Received: from lor.watermarkgroup.com (lor.watermarkgroup.com [207.202.73.33]) by hub.freebsd.org (Postfix) with ESMTP id C248E37BC9B for ; Wed, 5 Jul 2000 10:31:42 -0700 (PDT) (envelope-from luoqi@watermarkgroup.com) Received: (from luoqi@localhost) by lor.watermarkgroup.com (8.10.1/8.10.1) id e65HUGJ11739; Wed, 5 Jul 2000 13:30:17 -0400 (EDT) Date: Wed, 5 Jul 2000 13:30:17 -0400 (EDT) From: Luoqi Chen Message-Id: <200007051730.e65HUGJ11739@lor.watermarkgroup.com> To: cp@bsdi.com Subject: Re: SMP progress (was: Stepping on Toes) Cc: dg@root.com, dillon@apollo.backplane.com, freebsd-smp@FreeBSD.ORG, grog@lemis.com Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > } > } I believe ipending wants to go away totally. It really > }isn't meaningful in the thread environment and the locked operations > }needed to support it once multiple processor are running > }in the kernel are more expensive the sti, cli. > } > }Chuck > } > > I should have said that it is the locked operations > needed to supports the masks is the really expensive part, not > ipending itself. Also with the spin locks we want to mask interrupts > to a particular processor not all processors. > > > Chuck > There's a third way out, we could keep a per-cpu spin lock count, and disallow kernel preemption when the count is non-zero. Of course, this doesn't apply to sched_lock, we'll still have to use sti/cli there. 
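The per-cpu count idea above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not code from FreeBSD or BSD/OS: `struct percpu`, `spin_enter()`, `spin_exit()` and `preempt_request()` are hypothetical names, the static array stands in for real per-cpu data, and the sketch does not cover sched_lock, which would still need sti/cli.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXCPU 4	/* arbitrary size for the illustration */

/* Hypothetical per-cpu bookkeeping; a kernel would keep this in per-cpu data. */
struct percpu {
	int  spin_count;	/* spin locks currently held on this cpu */
	bool preempt_pending;	/* a preemption was deferred while count > 0 */
};

static struct percpu cpus[MAXCPU];

/* Called when a spin lock is acquired on 'cpu'. */
static void
spin_enter(int cpu)
{
	cpus[cpu].spin_count++;
}

/*
 * Called when a spin lock is released on 'cpu'.  Returns true if a
 * deferred preemption should be taken now that no spin locks are held.
 */
static bool
spin_exit(int cpu)
{
	struct percpu *pc = &cpus[cpu];

	assert(pc->spin_count > 0);
	pc->spin_count--;
	if (pc->spin_count == 0 && pc->preempt_pending) {
		pc->preempt_pending = false;
		return true;
	}
	return false;
}

/*
 * Called at a point where the kernel would like to preempt 'cpu'.
 * Returns true if preemption may happen immediately; otherwise the
 * request is recorded and false is returned.
 */
static bool
preempt_request(int cpu)
{
	struct percpu *pc = &cpus[cpu];

	if (pc->spin_count != 0) {
		pc->preempt_pending = true;
		return false;
	}
	return true;
}
```

The point of the design is that the common path is a plain increment and decrement of cpu-local data, with no locked bus operation and no interrupt masking.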
-lq To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Wed Jul 5 11:28:49 2000 Delivered-To: freebsd-smp@freebsd.org Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2]) by hub.freebsd.org (Postfix) with ESMTP id 837F737C03A for ; Wed, 5 Jul 2000 11:28:45 -0700 (PDT) (envelope-from dillon@apollo.backplane.com) Received: (from dillon@localhost) by apollo.backplane.com (8.9.3/8.9.1) id LAA88574; Wed, 5 Jul 2000 11:28:35 -0700 (PDT) (envelope-from dillon) Date: Wed, 5 Jul 2000 11:28:35 -0700 (PDT) From: Matthew Dillon Message-Id: <200007051828.LAA88574@apollo.backplane.com> To: Chuck Paterson Cc: Greg Lehey , David Greenman , freebsd-smp@freebsd.org Subject: Re: SMP progress (was: Stepping on Toes) References: <200007051652.KAA14768@berserker.bsdi.com> Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org :} Jake Burkholder is porting the BSD/OS mutexes. I don't expect there :} to be much of a difference in regards to your heavy-weight interrupt :} work. I'm going to take a look at Jake's patchset tonight. I think :} the only operational item we need to research is the sti/cli stuff in :} the BSDI mutexes... we should be able to remove them at some point :} (my interrupt code is already using the ipending mechanism to deal :} with the scheduler mutex being active on the current cpu). :} :} If Jake's removed that, then we'll want to put it back in at some point :} since it saves a significant amount of overhead ('sti' and 'cli' are :} expensive instructions). :} :} -Matt :} Matthew Dillon :} : : : I believe ipending wants to go away totally. It really :isn't meaningful in the thread environment and the locked operations :needed to support it once multiple processor are running :in the kernel are more expensive the sti, cli. : :Chuck They're less expensive overall. Think about it... 
how many times do you get and release the scheduler lock versus how many
interrupts you take in a second. In a loaded system we might be doing
10,000 scheduler lock operations a sec, or more, but still only be doing
800 interrupts/sec. It's a matter of streamlining the critical path.
This is why cli/sti was removed from the spl*() code in the first place.
If we are going to be using mutexes heavily, being able to remove the
cli/sti will cut the mutex overhead by around 35%, and more if there
is no contention for the mutex.

I think ipending should stay in. Having the flexibility may prove
useful. For example, if one cpu can't take an interrupt due to holding
the scheduler lock, another idle cpu can take it and at least get most
of the state pushed before spinning on the scheduler mutex.

-Matt
Matthew Dillon

From owner-freebsd-smp Wed Jul 5 15:39:50 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from smtp05.primenet.com (smtp05.primenet.com [206.165.6.135]) by hub.freebsd.org (Postfix) with ESMTP id B730037B72A for ; Wed, 5 Jul 2000 15:39:43 -0700 (PDT) (envelope-from tlambert@usr05.primenet.com)
Received: (from daemon@localhost) by smtp05.primenet.com (8.9.3/8.9.3) id PAA25249; Wed, 5 Jul 2000 15:40:01 -0700 (MST)
Received: from usr05.primenet.com(206.165.6.205) via SMTP by smtp05.primenet.com, id smtpdAAA_raifX; Wed Jul 5 15:39:48 2000
Received: (from tlambert@localhost) by usr05.primenet.com (8.8.5/8.8.5) id PAA26386; Wed, 5 Jul 2000 15:39:18 -0700 (MST)
From: Terry Lambert
Message-Id: <200007052239.PAA26386@usr05.primenet.com>
Subject: Re: SMP meeting summary
To: cp@bsdi.com (Chuck Paterson)
Date: Wed, 5 Jul 2000 22:39:18 +0000 (GMT)
Cc: eischen@vigrid.com (Daniel Eischen), grog@lemis.com (Greg Lehey), jasone@canonware.com (Jason Evans), luoqi@watermarkgroup.com (Luoqi Chen), smp@FreeBSD.ORG
In-Reply-To: <200007031528.JAA26798@berserker.bsdi.com> from
"Chuck Paterson" at Jul 03, 2000 09:28:34 AM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > In general there ought not to be multiple processes piling > up on a mutex. If there are and for some reason they can't be > fixed then these particular mutexs are going to dictate how this > area is handled. Once we have these cases in hand we can make > some decisions as to how to proceed. The atime mutex on directories in which parallel compiles are being attempted, when one uses data protection instead of critical sectioning as the reason for the mutex. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Wed Jul 5 16: 1: 2 2000 Delivered-To: freebsd-smp@freebsd.org Received: from smtp05.primenet.com (smtp05.primenet.com [206.165.6.135]) by hub.freebsd.org (Postfix) with ESMTP id 1A22B37B8B6 for ; Wed, 5 Jul 2000 16:00:58 -0700 (PDT) (envelope-from tlambert@usr05.primenet.com) Received: (from daemon@localhost) by smtp05.primenet.com (8.9.3/8.9.3) id QAA02345; Wed, 5 Jul 2000 16:01:17 -0700 (MST) Received: from usr05.primenet.com(206.165.6.205) via SMTP by smtp05.primenet.com, id smtpdAAAO.aOHe; Wed Jul 5 16:01:13 2000 Received: (from tlambert@localhost) by usr05.primenet.com (8.8.5/8.8.5) id QAA26840; Wed, 5 Jul 2000 16:00:49 -0700 (MST) From: Terry Lambert Message-Id: <200007052300.QAA26840@usr05.primenet.com> Subject: Re: SMP meeting summary To: cp@bsdi.com (Chuck Paterson) Date: Wed, 5 Jul 2000 23:00:49 +0000 (GMT) Cc: grog@lemis.com (Greg Lehey), eischen@vigrid.com (Daniel Eischen), jasone@canonware.com (Jason Evans), luoqi@watermarkgroup.com (Luoqi Chen), smp@FreeBSD.ORG In-Reply-To: 
<200007040218.UAA01169@berserker.bsdi.com> from "Chuck Paterson" at Jul 03, 2000 08:18:00 PM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

> }I'm not sure we're talking about the same thing, but if so I must be
> }missing something. If I'm waiting on a mutex, I still need to
> }reacquire it on wakeup, don't I? In that case, only the first process
> }to be scheduled will actually get the mutex, and the others will block
> }again.
>
> Yes, you need to acquire the mutex on wakeup, but likely
> one process will run acquiring and releasing the mutex in an
> uncontested fashion before other processes run and do the same
> thing.

You can assume in SVR4, in the wake_one case, that you will be the
only process awake, and so your acquisition will not be contested,
and will not result in a sleep. Logically, you can consider that
there is one waiter and N-1 sleepers for every N processes trying
to acquire a mutex.

This is normally handled [in the literature] by using a hybrid lock
in a hierarchy. That is, you attempt a fast lock, and if that fails,
then you attempt a slow ("sleeping") lock. You are guaranteed a
wakeup on release of a fast lock, and on release of a sleeping lock,
so it's sixes.

Of course, it's a lot easier to just critical section.

> }In my experience, I've seen mutexes used for long-term waits, and I
> }don't see any a priori reason not to do so. Of course, if we make
> }design decisions based on the assumption that all waits will be short,
> }then we will have a reason, but it won't be a good one.
> }
> }Before you say that long-term waits are evil, note that we're probably
> }talking about different kinds of waits. Obviously anything that
> }threatens to keep the system idle while it waits is bad, but a
> }replacement for tsleep(), say, can justifiably wait for a long time.
> A replacement for tsleep is not a mutex, but in Solaris
> parlance a conditional variable. The uses are different, one is
> for locking a resource, the other is waiting on a synch event. A
> conditional variable, like the sleep queues, has a mutex associated
> with it. This mutex is not held except while processing the event,
> both by the process waiting and the process doing the activation.
> I don't think it is a good idea to assume that the heuristics for
> waking up tsleep / conditional variables is going to be
> anything like those seen with mutexes.

Effectively, condition variables are critical sectioned in their
manipulation through the use of a mutex.

In practice, there are some ugly areas in the Solaris SMP reentrant
VFS code that necessitate treating the cond variable as if it were
a mutex on a larger structure. This reduces concurrency considerably.

The main point about wake_one that's problematic is the deadly
embrace deadlock, not the priority inversion deadlock, which can
always be "opted out of" by lending (or making the wake_one more
choosy about who it wakes, above and beyond the head of the wait
queue).

The thing that makes a thundering herd expensive is less the herd
than it is the traversal of the list; think about it: if I have the
cycles to burn in the scheduler to pick someone to run, then I
wasn't doing important other work anyway, and I might as well burn
them in the herd, as opposed to other places I could burn them.

A spinlock fixes this by implementing back-off + retry, at least
for sets of two locks. Sets of more locks are really problematic.
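The back-off + retry approach for a set of two locks might look roughly like the following. This is a sketch under stated assumptions: it uses POSIX mutexes and `pthread_mutex_trylock()` as a user-space stand-in for kernel spin locks, and `lock_pair()` and `unlock_pair()` are hypothetical helpers, not SVR4 or FreeBSD interfaces.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/*
 * Acquire two distinct mutexes without ever blocking on the second
 * while holding the first.  If the second is busy, release the first
 * (back off), yield, and retry.  Returns the number of back-offs.
 */
static int
lock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
	int backoffs = 0;

	assert(a != b);	/* the same lock twice could never succeed */
	for (;;) {
		pthread_mutex_lock(a);
		if (pthread_mutex_trylock(b) == 0)
			return backoffs;	/* both locks are now held */
		pthread_mutex_unlock(a);	/* back off: let the holder of b progress */
		backoffs++;
		sched_yield();			/* give other threads a chance, then retry */
	}
}

/* Release both mutexes in the reverse order of acquisition. */
static void
unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
	pthread_mutex_unlock(b);
	pthread_mutex_unlock(a);
}
```

Because the first lock is always dropped before retrying, two threads that take the same pair in opposite orders cannot end up each holding one lock and waiting forever on the other. The scheme does not extend cleanly beyond two locks, which matches the caveat above.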
A lot of work was done in SVR4 ES/MP to, effectively, resolve the
problem using Dijkstra's "Banker's Algorithm" (that is, all the
resources for sets of greater than two members, and in some cases,
one member -- usually parent directory in a descending path lookup
-- are allocated "up front", which is to say "at the same stack
depth/in the same function" to permit state to be backed out easily
in the case of a deadlock detection).

This stuff is really unsatisfying from the point of view of someone
trying to write a reentrant ("kernel thread safe" or "kernel
preemption safe") VFS provider of some kind, since it's really hard
to know when the semantics applied by an upper level function might
result in a problem. Other subsystems have similar issues, but
most of my experience was with VFS providers, so I can't give you
the PCMCIA device attach issues in SVR4 (maybe we can track down
Kurt Mahon, though).

Terry Lambert
terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Wed Jul 5 16: 7: 5 2000 Delivered-To: freebsd-smp@freebsd.org Received: from smtp02.primenet.com (smtp02.primenet.com [206.165.6.132]) by hub.freebsd.org (Postfix) with ESMTP id EDCDA37BB9F for ; Wed, 5 Jul 2000 16:06:52 -0700 (PDT) (envelope-from tlambert@usr05.primenet.com) Received: (from daemon@localhost) by smtp02.primenet.com (8.9.3/8.9.3) id QAA07200; Wed, 5 Jul 2000 16:05:49 -0700 (MST) Received: from usr05.primenet.com(206.165.6.205) via SMTP by smtp02.primenet.com, id smtpdAAA3Fa42n; Wed Jul 5 16:05:37 2000 Received: (from tlambert@localhost) by usr05.primenet.com (8.8.5/8.8.5) id QAA27004; Wed, 5 Jul 2000 16:06:32 -0700 (MST) From: Terry Lambert Message-Id: <200007052306.QAA27004@usr05.primenet.com> Subject: Re: SMP meeting summary To: grog@lemis.com (Greg Lehey) Date: Wed, 5 Jul 2000 23:06:32 +0000 (GMT) Cc: cp@bsdi.com (Chuck Paterson), eischen@vigrid.com (Daniel Eischen), jasone@canonware.com (Jason Evans), luoqi@watermarkgroup.com (Luoqi Chen), smp@FreeBSD.ORG In-Reply-To: <20000704120930.G94351@wantadilla.lemis.com> from "Greg Lehey" at Jul 04, 2000 12:09:30 PM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > I've never been able to understand the advantages of > conditional variables, which may be my viewpoint, or it may be some > basic lack of understanding. You can use the address of the mutex in a condition variable in the same set of sleep/contention spaces that you use any other mutex. This means that you can do deadlock detection, without having to consider multiple name spaces (e.g. one for mutexes, and another for event flags). 
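A minimal user-space sketch of a condition variable and the mutex associated with it, assuming POSIX threads rather than any kernel API. `struct event` and the `event_*()` names are made up for illustration. The mutex is held only while the flag is examined or changed, and `pthread_cond_wait()` releases it for the duration of the sleep.

```c
#include <pthread.h>
#include <stdbool.h>

/* A one-shot event: a flag, the mutex protecting it, and a condition variable. */
struct event {
	pthread_mutex_t lock;	/* protects 'fired'; held only briefly */
	pthread_cond_t  cv;	/* waiters sleep here until 'fired' is set */
	bool            fired;
};

static void
event_init(struct event *ev)
{
	pthread_mutex_init(&ev->lock, NULL);
	pthread_cond_init(&ev->cv, NULL);
	ev->fired = false;
}

/* Activation side: set the flag under the mutex and wake every waiter. */
static void
event_post(struct event *ev)
{
	pthread_mutex_lock(&ev->lock);
	ev->fired = true;
	pthread_cond_broadcast(&ev->cv);
	pthread_mutex_unlock(&ev->lock);
}

/* Waiting side: sleep until the flag is set; the loop handles spurious wakeups. */
static void
event_wait(struct event *ev)
{
	pthread_mutex_lock(&ev->lock);
	while (!ev->fired)
		pthread_cond_wait(&ev->cv, &ev->lock);	/* drops 'lock' while asleep */
	pthread_mutex_unlock(&ev->lock);
}
```

Since every waiter is tied to the address of `ev->lock`, the same lock-tracking machinery that follows ordinary mutexes can also see who is waiting on the event.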
The other neat thing is that you can treat them opaquely in the manipulation routines, so long as the address of the mutex is always what's used, and you don't know if it is a mutex protecting a structure, or an event flag, or something else. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Wed Jul 5 16:10:59 2000 Delivered-To: freebsd-smp@freebsd.org Received: from smtp02.primenet.com (smtp02.primenet.com [206.165.6.132]) by hub.freebsd.org (Postfix) with ESMTP id C2FE037BA29 for ; Wed, 5 Jul 2000 16:10:55 -0700 (PDT) (envelope-from tlambert@usr05.primenet.com) Received: (from daemon@localhost) by smtp02.primenet.com (8.9.3/8.9.3) id QAA08551; Wed, 5 Jul 2000 16:09:51 -0700 (MST) Received: from usr05.primenet.com(206.165.6.205) via SMTP by smtp02.primenet.com, id smtpdAAAzeaiNq; Wed Jul 5 16:09:41 2000 Received: (from tlambert@localhost) by usr05.primenet.com (8.8.5/8.8.5) id QAA27112; Wed, 5 Jul 2000 16:10:39 -0700 (MST) From: Terry Lambert Message-Id: <200007052310.QAA27112@usr05.primenet.com> Subject: Re: SMP meeting summary To: bright@wintelcom.net (Alfred Perlstein) Date: Wed, 5 Jul 2000 23:10:39 +0000 (GMT) Cc: grog@lemis.com (Greg Lehey), jeroen@vangelderen.org (Jeroen C. 
van Gelderen), eischen@vigrid.com (Daniel Eischen), jasone@canonware.com (Jason Evans), luoqi@watermarkgroup.com (Luoqi Chen), smp@FreeBSD.ORG
In-Reply-To: <20000703220823.Z25571@fw.wintelcom.net> from "Alfred Perlstein" at Jul 03, 2000 10:08:24 PM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

> The idea is that for spin or spin-then-sleep mutexes (very short
> hold time) is that since you won't have as many processes as cpus
> contending (and when you do it's ok) that the mutual exclusion is
> so short lived that by the time the next 'thundering' process is
> actually given the CPU, the likelihood is that other processes have
> already acquired _and_ released the spinlock making it more than
> likely that the resource is free.
>
> The idea is that a quantum is actually so great that there's
> little chance of one of the wake_all processes colliding on the
> lock.

This is a bogus idea, both in the case of a large number of
processors, and in the quantum ownership case.

The quantum ownership case is "so long as I have work to do, if
the scheduler gave me a quantum, it's my damn quantum!". In other
words, the idea of voluntary preemption or semivoluntary preemption,
such as one might get when the system makes a process block merely
because it has made a system call that can't be immediately
satisfied. A multithreaded or FSA process doesn't care about a
single blocking context: it wants to use the remainder of its
quantum.

Terry Lambert
terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Wed Jul 5 16:21:36 2000 Delivered-To: freebsd-smp@freebsd.org Received: from wantadilla.lemis.com (wantadilla.lemis.com [192.109.197.80]) by hub.freebsd.org (Postfix) with ESMTP id 5FB1E37BD3C for ; Wed, 5 Jul 2000 16:21:31 -0700 (PDT) (envelope-from grog@wantadilla.lemis.com) Received: (from grog@localhost) by wantadilla.lemis.com (8.9.3/8.9.3) id IAA00499; Thu, 6 Jul 2000 08:51:03 +0930 (CST) (envelope-from grog) Date: Thu, 6 Jul 2000 08:51:03 +0930 From: Greg Lehey To: Chuck Paterson Cc: Matthew Dillon , David Greenman , freebsd-smp@freebsd.org Subject: Re: SMP progress (was: Stepping on Toes) Message-ID: <20000706085103.P97425@wantadilla.lemis.com> References: <200007051652.KAA14768@berserker.bsdi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0pre2i In-Reply-To: <200007051652.KAA14768@berserker.bsdi.com> Organization: LEMIS, PO Box 460, Echunga SA 5153, Australia Phone: +61-8-8388-8286 Fax: +61-8-8388-8725 Mobile: +61-418-838-708 WWW-Home-Page: http://www.lemis.com/~grog X-PGP-Fingerprint: 6B 7B C3 8C 61 CD 54 AF 13 24 52 F8 6D A4 95 EF Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Wednesday, 5 July 2000 at 10:52:23 -0600, Chuck Paterson wrote: >> >> Jake Burkholder is porting the BSD/OS mutexes. I don't expect there >> to be much of a difference in regards to your heavy-weight interrupt >> work. I'm going to take a look at Jake's patchset tonight. I think >> the only operational item we need to research is the sti/cli stuff in >> the BSDI mutexes... we should be able to remove them at some point >> (my interrupt code is already using the ipending mechanism to deal >> with the scheduler mutex being active on the current cpu). 
>> >> If Jake's removed that, then we'll want to put it back in at some point >> since it saves a significant amount of overhead ('sti' and 'cli' are >> expensive instructions). >> >> -Matt >> Matthew Dillon >> > > > I believe ipending wants to go away totally. It really isn't > meaningful in the thread environment and the locked operations > needed to support it once multiple processor are running in the > kernel are more expensive the sti, cli. Agreed. I can't see any meaning in it, either. Greg -- Finger grog@lemis.com for PGP public key See complete headers for address and phone numbers To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Wed Jul 5 16:30:42 2000 Delivered-To: freebsd-smp@freebsd.org Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20]) by hub.freebsd.org (Postfix) with ESMTP id A26C937B77A for ; Wed, 5 Jul 2000 16:30:38 -0700 (PDT) (envelope-from bright@fw.wintelcom.net) Received: (from bright@localhost) by fw.wintelcom.net (8.10.0/8.10.0) id e65NTQQ18535; Wed, 5 Jul 2000 16:29:26 -0700 (PDT) Date: Wed, 5 Jul 2000 16:29:26 -0700 From: Alfred Perlstein To: Terry Lambert Cc: Greg Lehey , "Jeroen C. 
van Gelderen" , Daniel Eischen , Jason Evans , Luoqi Chen , smp@FreeBSD.ORG Subject: Re: SMP meeting summary Message-ID: <20000705162925.V25571@fw.wintelcom.net> References: <20000703220823.Z25571@fw.wintelcom.net> <200007052310.QAA27112@usr05.primenet.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2i In-Reply-To: <200007052310.QAA27112@usr05.primenet.com>; from tlambert@primenet.com on Wed, Jul 05, 2000 at 11:10:39PM +0000 Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org * Terry Lambert [000705 16:11] wrote: > > The idea is that for spin or spin-then-sleep mutexes (very short > > hold time) is that since you won't have as many processes as cpus > > contending (and when you do it's ok) that the mutual exclusion is > > so short lived that by the time the next 'thundering' process is > > actually given the CPU, the likelihood is that other processes have > > already acquired _and_ released the spinlock making it more than > > likely that the resource is free. > > > > The idea is that a quantum is actually so great that there's > > little chance of one of the wake_all processes colliding on the > > lock. > > This is a bogus idea, both in the case of a large number of > processors, and in quantum ownership case. You are correct in that it's bogus for a _large_ number of processors, but for small numbers it makes a lot of sense. It would work nicely if one could attempt to schedule the processes in the order that they were unblocked. Even on a system with a significant number of CPUs, assuming the system is busy, then between actually scheduling these thundering processes on other CPUs and having them run there will be enough time to avoid another pileup on the mutex. If the CPUs are not busy and collisions occur, well then the collisions are free because we have cycles to burn.
:) > The quantum ownership case is "so long as I have work to do, if > the scheduler gave me a quantum, it's my damn quantum!". In other > words, the idea of voluntary preemption or semivoluntary preenption, > such as one might get when the system makes a process blocke merely > because it has made a system call that can't be immediately > satisfied. A multithreaded of FSA process doesn't care about a > single blocking context: it wants to use the remainder of its > quantum. I'm not sure I understand this nor how it applies. -- -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org] "I have the heart of a child; I keep it in a jar on my desk." To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Thu Jul 6 1:23:25 2000 Delivered-To: freebsd-smp@freebsd.org Received: from anchor-post-32.mail.demon.net (anchor-post-32.mail.demon.net [194.217.242.90]) by hub.freebsd.org (Postfix) with ESMTP id 4C10337B7F0 for ; Thu, 6 Jul 2000 01:23:14 -0700 (PDT) (envelope-from dfr@nlsystems.com) Received: from nlsys.demon.co.uk ([158.152.125.33] helo=herring.nlsystems.com) by anchor-post-32.mail.demon.net with esmtp (Exim 2.12 #1) id 13A6wB-000NFy-0W; Thu, 6 Jul 2000 09:23:12 +0100 Received: from salmon.nlsystems.com (salmon.nlsystems.com [10.0.0.3]) by herring.nlsystems.com (8.9.3/8.8.8) with ESMTP id JAA71694; Thu, 6 Jul 2000 09:23:26 +0100 (BST) (envelope-from dfr@nlsystems.com) Date: Thu, 6 Jul 2000 09:26:51 +0100 (BST) From: Doug Rabson To: Matthew Dillon Cc: Greg Lehey , Chuck Paterson , David Greenman , freebsd-smp@freebsd.org Subject: Re: SMP progress (was: Stepping on Toes) In-Reply-To: <200007051643.JAA88121@apollo.backplane.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Wed, 5 Jul 2000, Matthew Dillon wrote: > :... 
> :> At this moment, without interrupt threads, interrupts can share Giant > :> with the curproc they interrupted. This is how our existing MP stuff > :> worked already. > :> > :> When Greg moves interrupts to their own threads, and obtains Giant to > :> run those interrupts, no more sharing will occur and just the fact > :> that the interrupt is holding Giant guarentees that nobody else will > :> be messing with SPLs, thus the SPLs can be removed entirely. > : > :Agreed. I'm in the process of implementing the heavy-weight interrupt > :processes now. I've just taken a look at your web page and note that > :the URL no longer exists; in conjunction with the discussion above, > :I'm no longer sure how far you are. Are you importing the BSD/OS code > :now? > : > :We should probably take the rest of this offline, but I wanted to > :discuss how we do things. My idea is: > : > :1. You import the BSD/OS mutexes. > :2. I import/implement the heavy-weight interrupt code, which I will > : endeavour to get working relatively reliably. This should be a > : fallback while I break^H^H^H^H^Himplement light-wait interrupt > : threads. > :3. You and I test our stuff together until it can stay up for an hour > : or so (exact time to be determined by Jason, who'll be carrying > : the can). > :4. We commit the marginally stable stuff. > :5. I carry on working on the light-weight threads. > : > :Any comments? > : > :Greg > > Jake Burkholder is porting the BSD/OS mutexes. I don't expect there > to be much of a difference in regards to your heavy-weight interrupt > work. I'm going to take a look at Jake's patchset tonight. I think > the only operational item we need to research is the sti/cli stuff in > the BSDI mutexes... we should be able to remove them at some point > (my interrupt code is already using the ipending mechanism to deal > with the scheduler mutex being active on the current cpu). 
> > If Jake's removed that, then we'll want to put it back in at some point > since it saves a significant amount of overhead ('sti' and 'cli' are > expensive instructions). A spin lock which is used from both top and bottom halves *must* disable interrupts, surely. Since we will only really end up with approximately one of these (sched_lock) I don't think there is a real problem. -- Doug Rabson Mail: dfr@nlsystems.com Nonlinear Systems Ltd. Phone: +44 20 8442 9037 To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Thu Jul 6 10:55:36 2000 Delivered-To: freebsd-smp@freebsd.org Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2]) by hub.freebsd.org (Postfix) with ESMTP id 4D5E237C033 for ; Thu, 6 Jul 2000 10:55:31 -0700 (PDT) (envelope-from dillon@apollo.backplane.com) Received: (from dillon@localhost) by apollo.backplane.com (8.9.3/8.9.1) id KAA94534; Thu, 6 Jul 2000 10:43:23 -0700 (PDT) (envelope-from dillon) Date: Thu, 6 Jul 2000 10:43:23 -0700 (PDT) From: Matthew Dillon Message-Id: <200007061743.KAA94534@apollo.backplane.com> To: Doug Rabson Cc: Greg Lehey , Chuck Paterson , David Greenman , freebsd-smp@FreeBSD.ORG Subject: Re: SMP progress (was: Stepping on Toes) References: Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org : :A spin lock which is used from both top and bottom halves *must* disable :interrupts, surely. Since we will only really end up with approximately :one of these (sched_lock) I don't think there is a real problem. : :-- :Doug Rabson Mail: dfr@nlsystems.com Read the document I wrote a month ago that's sitting on apollo.backplane.com/FreeBSDSmp/ There are two ways to do this: (1) (The way I implemented it) For the schedular mutex, which is the only spin mutex in the system, interrupts are left enabled. If an interrupt occurs the interrupt vector code will test to see if the current cpu holds the schedular mutex. 
If it does, the code will set the appropriate ipending bit and return immediately. Pending interrupts are tested and run when the scheduler mutex is released.

Advantages:

* Removes sti/cli from the critical mutex path, which is executed far more often than interrupts are.
* Allows interrupts to be forwarded if we wanted to forward them, rather than blocking (at some point in the future).
* Allows passive pickups of interrupts by idle processors (at some point in the future).
* Allows us to implement a separate interrupt scheduler, if we want to (at some point in the future).
* Allows us to implement a parallel spl mechanism (though apparently nobody is interested in doing that).

Disadvantages: None.

(2) (The way BSDI implemented it) When the scheduler mutex is obtained interrupts are disabled, blocking interrupts from occurring. When the scheduler mutex is released, interrupts are reenabled. An interrupt thus cannot occur while the scheduler mutex is being held by the current process.

Advantages:

* Slightly less complex code.

Disadvantages:

* Does not allow any manipulation of pending interrupts to occur while the scheduler lock is held.
* Causes sti/cli to be run quite often, which slows down the critical path for spin mutexes.

-Matt Matthew Dillon

From owner-freebsd-smp Thu Jul 6 11:55:26 2000 Delivered-To: freebsd-smp@freebsd.org Received: from web1403.mail.yahoo.com (web1403.mail.yahoo.com [128.11.23.167]) by hub.freebsd.org (Postfix) with SMTP id B859B37B89E for ; Thu, 6 Jul 2000 11:55:23 -0700 (PDT) (envelope-from fgozzo@yahoo.com) Received: (qmail 20778 invoked by uid 60001); 6 Jul 2000 18:55:20 -0000 Message-ID: <20000706185520.20776.qmail@web1403.mail.yahoo.com> Received: from [128.210.251.12] by web1403.mail.yahoo.com; Thu, 06 Jul 2000 11:55:20 PDT Date: Thu, 6 Jul 2000 11:55:20 -0700 (PDT) From: Fabio Gozzo Subject: i820/i815 Chipset ?
To: freebsd-smp@freebsd.org MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org Hello, I'm about to buy a new computer and I'm just wondering if FreeBSD runs well on those new chipsets. I remember someone having problems with i820 some time ago. Is that still true? Additionally, if someone could give me good recommendations for dual PIII motherboards, I would be very grateful. Thank you, Fabio From owner-freebsd-smp Thu Jul 6 14:58:48 2000 Delivered-To: freebsd-smp@freebsd.org Received: from io.yi.org (24.67.218.186.bc.wave.home.com [24.67.218.186]) by hub.freebsd.org (Postfix) with ESMTP id E452637B84E; Thu, 6 Jul 2000 14:58:46 -0700 (PDT) (envelope-from jburkhol@home.com) Received: from io.yi.org (localhost.gvcl1.bc.wave.home.com [127.0.0.1]) by io.yi.org (Postfix) with ESMTP id 68B90BA4E; Thu, 6 Jul 2000 14:58:57 -0700 (PDT) X-Mailer: exmh version 2.1.1 10/15/1999 To: jasone@freebsd.org Cc: smp@freebsd.org Subject: mutex(9) Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Thu, 06 Jul 2000 14:58:57 -0700 From: Jake Burkholder Message-Id: <20000706215857.68B90BA4E@io.yi.org> Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org I asked Sheldon Hearn to have a look at the BSD/OS mutex man page and get it ready for inclusion into FreeBSD. http://people.freebsd.org/~jake/mutex.9 Thanks Sheldon! -- It may be that the asymptotic advantage doesn't set in until well after the limits of human interest.
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Thu Jul 6 16:51:47 2000 Delivered-To: freebsd-smp@freebsd.org Received: from smtp03.primenet.com (smtp03.primenet.com [206.165.6.133]) by hub.freebsd.org (Postfix) with ESMTP id BE22037B78C for ; Thu, 6 Jul 2000 16:51:41 -0700 (PDT) (envelope-from tlambert@usr02.primenet.com) Received: (from daemon@localhost) by smtp03.primenet.com (8.9.3/8.9.3) id QAA08119; Thu, 6 Jul 2000 16:51:00 -0700 (MST) Received: from usr02.primenet.com(206.165.6.202) via SMTP by smtp03.primenet.com, id smtpdAAAtOaaWp; Thu Jul 6 16:50:54 2000 Received: (from tlambert@localhost) by usr02.primenet.com (8.8.5/8.8.5) id QAA08404; Thu, 6 Jul 2000 16:51:09 -0700 (MST) From: Terry Lambert Message-Id: <200007062351.QAA08404@usr02.primenet.com> Subject: Re: SMP meeting summary To: bright@wintelcom.net (Alfred Perlstein) Date: Thu, 6 Jul 2000 23:51:08 +0000 (GMT) Cc: tlambert@primenet.com (Terry Lambert), grog@lemis.com (Greg Lehey), jeroen@vangelderen.org (Jeroen C. van Gelderen), eischen@vigrid.com (Daniel Eischen), jasone@canonware.com (Jason Evans), luoqi@watermarkgroup.com (Luoqi Chen), smp@FreeBSD.ORG In-Reply-To: <20000705162925.V25571@fw.wintelcom.net> from "Alfred Perlstein" at Jul 05, 2000 04:29:26 PM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > > > The idea is that the a quantum is actually so great that there's > > > little chance of one of the wake_all processes colliding on the > > > lock. > > > > This is a bogus idea, both in the case of a large number of > > processors, and in quantum ownership case. > > You are correct in that it's bogus for _large_ number of processors, > but for small numbers it makes a lot of sense. 
It would work nicely > if one could attempt to schedule the processes in order that they > were unblocked. What's small? > Even on a system with a significant number of CPUs, assuming the > system is busy, then between actually scheduling these thundering > processes on other CPUs and having them run there will be enough > time to avoid another pileup on the mutex. > > If the CPUs are not busy and collisions occur, well then the > collisions are free because we have cycles to burn. :) This is not a safe assumption; consider the case of a distributed cluster which supported migration. High communications latencies between processors are destructive. Equally, on low latency communications links, such as in a 4-way Xeon system, the amount of interprocessor contention on cache line invalidation is significant. If there isn't such contention, why lock it? If there is such contention, why take the invalidation hit for all processors, instead of (on average) 1.5? From a research perspective, consider the case of goal-based computing, goal-based computing where the participants have incomplete information, and cooperative robotics. These are fields of research where your simplification will make use of BSD impossible. > > The quantum ownership case is "so long as I have work to do, if > > the scheduler gave me a quantum, it's my damn quantum!". In other > > words, the idea of voluntary preemption or semivoluntary preemption, > > such as one might get when the system makes a process block merely > > because it has made a system call that can't be immediately > > satisfied. A multithreaded of FSA process doesn't care about a > > single blocking context: it wants to use the remainder of its > > quantum. > > I'm not sure I understand this nor how it applies. A multithreaded file system architecture could be used to implement a concurrent "team" type program, which transferred data from one region in a contention domain to another.
If that were implemented using your approach, there would be lockstep read/write/read/write, rather than concurrent operation with a single latency (e.g. read/write+read/write+read/.../write). This is the moral equivalent of a sliding window for data copies on contention domains, involving two processors. One of the basic flaws of SVR4 based Dynix for a very long time (and perhaps still; I haven't been inside a Sequent box for 2 years now) was that the FSA was not multithreaded. If you want a more prosaic example, take any web server you currently operate, and replace the GIF images with CGI's that invoke the "team" program to stream the data out to the requester, instead of delivering the image directly. Note the almost 170% improvement in download time for the pages (this is the same reason that "sendfile" is a stupid idea, given that it can never achieve this improvement because it can never achieve this concurrency, even if you could send all the data in the UNIX disk format). Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. From owner-freebsd-smp Thu Jul 6 20: 7:53 2000 Delivered-To: freebsd-smp@freebsd.org Received: from peorth.iteration.net (peorth.iteration.net [208.190.180.178]) by hub.freebsd.org (Postfix) with ESMTP id 4736437B698 for ; Thu, 6 Jul 2000 20:07:50 -0700 (PDT) (envelope-from keichii@peorth.iteration.net) Received: by peorth.iteration.net (Postfix, from userid 1000) id 80ACF64C0E; Thu, 6 Jul 2000 22:07:53 -0500 (CDT) Date: Thu, 6 Jul 2000 22:07:53 -0500 From: "Michael C. Wu" To: smp@freebsd.org Subject: Re: i820/i815 Chipset ? Message-ID: <20000706220753.D21156@peorth.iteration.net> Mail-Followup-To: "Michael C.
Wu" , smp@freebsd.org References: <20000706185520.20776.qmail@web1403.mail.yahoo.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2i In-Reply-To: <20000706185520.20776.qmail@web1403.mail.yahoo.com>; from fgozzo@yahoo.com on Thu, Jul 06, 2000 at 11:55:20AM -0700 Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Thu, Jul 06, 2000 at 11:55:20AM -0700, Fabio Gozzo scribbled: | I'm about to buy a new computer and I'm just wondering if FreeBSD runs | well on those new chipsets. I remember someone having problems with i820 | some time ago. Is that still true? Do not know. Sorry. | Additionally, if someone could give me good recommendations for dual PIII | motherboards, I would be very grateful. | Thank you, Tyan dual boards have worked well for me. ---end quoted text--- P.S. I don't know if asking this question in -smp is a very good idea. Regards, -- +------------------------------------------------------------------+ | keichii@peorth.iteration.net | keichii@bsdconspiracy.net | | http://peorth.iteration.net/~keichii | Yes, BSD is a conspiracy.
| +------------------------------------------------------------------+ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Fri Jul 7 8:17:37 2000 Delivered-To: freebsd-smp@freebsd.org Received: from berserker.bsdi.com (berserker.twistedbit.com [199.79.183.1]) by hub.freebsd.org (Postfix) with ESMTP id 989B237C1AB for ; Fri, 7 Jul 2000 08:17:29 -0700 (PDT) (envelope-from cp@berserker.bsdi.com) Received: from berserker.bsdi.com (cp@localhost [127.0.0.1]) by berserker.bsdi.com (8.9.3/8.9.3) with ESMTP id JAA12975; Fri, 7 Jul 2000 09:16:20 -0600 (MDT) Message-Id: <200007071516.JAA12975@berserker.bsdi.com> To: Matthew Dillon Cc: Doug Rabson , Greg Lehey , David Greenman , freebsd-smp@freebsd.org Date: Fri, 07 Jul 2000 09:16:20 -0600 From: Chuck Paterson Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org Subject: Spin Locks, blocking interrupts, and ipending From: Chuck Paterson Fcc: outbound -------- Matthew Dillon wrote on: Thu, 06 Jul 2000 10:43:23 PDT } } There are two ways to do this: } } (1) (The way I implemented it) Removed stuff } Disadvantages: } } None. } One disadvantage is that the actual mechanism to implement the lazy interrupts is going to be a bitch to actually make work right/well on multiple processors in parallel, further complicated by the light weight context switches. Also I'm pretty sure interrupts don't want to get pended to by the pic level, but rather on individual apic pins. A second disadvantage is that interrupts will get randomly delivered to a processor holding a spin mutex rather than run immediately on a processor not holding a spin mutex. This is a problem with the current scheme also, and not fixed with masks and ipending. Having said this, it may be that Matt is correct and in the long term we will want to ditch the cli/sti's in mutices. 
I really hope we don't have to go down the lazy path; my mind goes numb just thinking about the corner cases. Some random data points. The assumption that the scheduler lock will be the only spin mutex is wrong. BSD/OS currently has about 5 which are pretty much architecture independent. Most/all of these are at one time or another acquired while the scheduler lock is held. On Sparc BSD/OS already has the notion of a priority level associated with a spin lock. Interrupts are blocked by changing the interrupting block level, which on Sparc is very cheap. When a processor is cli'd IPIs are not delivered. In general this made life lots easier in BSD/OS. It is possible that a condition will arise where it is required to deliver an IPI while another processor holds a spin lock. This did occur on Sparc with some low cache code (stuff done in hardware on Intel). This particular IPI is delivered at a level that just acquiring a mutex will not block. It is likely that some devices will want a lower level half that always runs in a borrowed context and is protected by spin locks. Currently in BSD/OS the Sparc zs driver falls into this category. It sure seems likely that the X86 com driver will also fall into this category. Using sti/cli does not preclude this, but it does preclude a driver of this type interrupting another driver of this type. In order to avoid a deadly embrace spin mutice must block all interrupts of mutice which are logically before them in the locking order. This lends itself to a priority scheme for blocking rather than an unordered bit mask. This is not to say that a bit mask couldn't be used to implement a priority scheme. The task priority register could be used to block interrupts on Pentium. The code could then work much like Sparc. This has the advantage of allowing some spin mutice to be interrupted without the added complexity of ipending.
The real good thing is that the interrupt will get dispatched to a processor able to handle it, rather than just being pended on a processor which has it blocked. I don't yet know how expensive writes to the task priority register are. The cost of spin mutice is not that big of a deal. They are the more expensive mutex, and generally when they are acquired something very expensive, like a task switch, is already occurring. The cost for use with drivers like the com driver is not an issue. Just taking the interrupt and then talking to the hardware totally swamps the cost of the mutex. Chuck From owner-freebsd-smp Fri Jul 7 8:57:58 2000 Delivered-To: freebsd-smp@freebsd.org Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2]) by hub.freebsd.org (Postfix) with ESMTP id 15F9737BF9A for ; Fri, 7 Jul 2000 08:57:26 -0700 (PDT) (envelope-from dillon@apollo.backplane.com) Received: (from dillon@localhost) by apollo.backplane.com (8.9.3/8.9.1) id IAA00648; Fri, 7 Jul 2000 08:54:37 -0700 (PDT) (envelope-from dillon) Date: Fri, 7 Jul 2000 08:54:37 -0700 (PDT) From: Matthew Dillon Message-Id: <200007071554.IAA00648@apollo.backplane.com> To: Chuck Paterson Cc: Doug Rabson , Greg Lehey , David Greenman , freebsd-smp@FreeBSD.ORG Subject: References: <200007071516.JAA12975@berserker.bsdi.com> Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org :One disadvantage is that the actual mechanism to implement :the lazy interrupts is going to be a bitch to actually make work :right/well on multiple processors in parallel, further complicated :by the light weight context switches. Also I'm pretty sure interrupts :don't want to get pended to by the pic level, but rather on individual :apic pins.
: :A second disadvantage is that interrupts will get randomly delivered to :a processor holding a spin mutex rather than run immediately on a :processor not holding a spin mutex. This is a problem with the :current scheme also, and not fixed with masks and ipending. Well, the first isn't a disadvantage, since I implemented it a few weeks ago. It was trivial. The second one doesn't apply to cli/sti versus ipending. I don't see any relationship. Interrupt delivery is not controlled by cli/sti, it's controlled by the APIC. :Having said this, it may be that Matt is correct and in the long :term we will want to ditch the cli/sti's in mutices. I really hope :we don't have to go down the lazy path, my mind goes numb just :thinking about the corner cases. : :Some random data points. : : The assumption that the scheduler lock will be the only : spin mutex is wrong. BSD/OS currently has about 5 which : are pretty much architecture independent. Most/all of these : are at one time or another acquired while the scheduler : lock is held. A per-process spin-held counter would address this both for the scheduler mutex and any other spin mutex. : On Sparc BSD/OS already has the notion of a priority : level associated with a spin lock. Interrupts are blocked : by changing the interrupting block level, which on Sparc : is very cheap. : : When a processor is cli'd IPIs are not delivered. In general : this made life lots easier in BSD/OS. It is possible that : a condition will arise where it is required to deliver : an IPI while another processor holds a spin lock. This : did occur on Sparc with some low cache code (stuff done : in hardware on Intel). This particular IPI is delivered : at a level higher than just acquiring a mutex will not block. IPI's could be an issue since they don't have equivalent interrupt bits in irunning or ipending, and so can't be deferred. I don't know of any FreeBSD IPIs that can't 'just run', with or without the scheduler mutex.
In this case IPIs would not have to be deferred even if the scheduler mutex were held by the current cpu. The only thing we use IPIs for seriously are VM page operations, to invalidate pte's on other cpu's, and for interrupt forwarding (which never worked quite right anyway). The former occurs only while Giant is held. :... lots of good stuff removed :Chuck : -Matt Matthew Dillon From owner-freebsd-smp Fri Jul 7 10:49:39 2000 Delivered-To: freebsd-smp@freebsd.org Received: from berserker.bsdi.com (berserker.twistedbit.com [199.79.183.1]) by hub.freebsd.org (Postfix) with ESMTP id A13F037BF22 for ; Fri, 7 Jul 2000 10:49:34 -0700 (PDT) (envelope-from cp@berserker.bsdi.com) Received: from berserker.bsdi.com (cp@localhost [127.0.0.1]) by berserker.bsdi.com (8.9.3/8.9.3) with ESMTP id LAA14029; Fri, 7 Jul 2000 11:48:34 -0600 (MDT) Message-Id: <200007071748.LAA14029@berserker.bsdi.com> To: Matthew Dillon Cc: Doug Rabson , Greg Lehey , David Greenman , freebsd-smp@freebsd.org Subject: Re: Spin Locks, blocking interrupts, and ipending From: Chuck Paterson Date: Fri, 07 Jul 2000 11:48:34 -0600 Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org Finally get a subject line. } } Well, the first isn't a disadvantage, since I implemented it a few } weeks ago. It was trivial. I maintain that the implementation needed to work on a single running processor is not in the same category of difficulty as that needed to work on multiple processors. } } The second one doesn't apply to cli/sti versus ipending. I don't } see any relationship. Interrupt delivery is not controlled by } cli/sti, it's controlled by the APIC. } However, if the APIC is programmed to deliver interrupts to the right place, or not deliver them as the case may be, then there is no need for the ipending stuff. My point was that there is no way to get the "right" job done using ipending alone.
}: }:Some random data points. }: }: The assumption that the scheduler lock will be the only }: spin mutex is wrong. BSD/OS currently has about 5 which }: are pretty much architecture independent. Most/all of these }: are at one time or another acquired while the scheduler }: lock is held. } } A per-process spin-held counter would address this both for } the scheduler mutex and any other spin mutex. } The comment I originally made here wasn't aimed at anything Matt said, but rather the comments others had made to the effect that the scheduler mutex was likely to be the only spin mutex. } } The only thing we use IPIs for seriously are VM page operations, } to invalidate pte's on other cpu's, and for interrupt forwarding } (which never worked quite right anyway). The former occurs only while } Giant is held. Hopefully it will eventually be the case that Giant isn't held when IPI's are sent. BSD/OS already dispatches pcpu clock IPIs without holding any locks. It certainly is the case with BSD/OS that the goal is to make Giant go away totally as soon as possible. However, I don't think this fundamentally changes anything. 
Chuck From owner-freebsd-smp Fri Jul 7 13: 2:51 2000 Delivered-To: freebsd-smp@freebsd.org Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2]) by hub.freebsd.org (Postfix) with ESMTP id 0332437B743 for ; Fri, 7 Jul 2000 13:02:47 -0700 (PDT) (envelope-from dillon@apollo.backplane.com) Received: (from dillon@localhost) by apollo.backplane.com (8.9.3/8.9.1) id NAA01310; Fri, 7 Jul 2000 13:02:41 -0700 (PDT) (envelope-from dillon) Date: Fri, 7 Jul 2000 13:02:41 -0700 (PDT) From: Matthew Dillon Message-Id: <200007072002.NAA01310@apollo.backplane.com> To: Chuck Paterson Cc: Doug Rabson , Greg Lehey , David Greenman , freebsd-smp@FreeBSD.ORG Subject: Re: Spin Locks, blocking interrupts, and ipending References: <200007071748.LAA14029@berserker.bsdi.com> Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org :Finally get a subject line. : :} :} Well, the first isn't a disadvantage, since I implemented it a few :} weeks ago. It was trivial. : : I maintain that the implementation to work on single :running processor is not in the same category of difficulty as that :needed to work on multiple processors. For ipending it IS the same category. At least for i386. If you look at the patchset you will see that the ipending code I wrote is 100% MP safe. :} The second one doesn't apply to cli/sti versus ipending. I don't :} see any relationship. Interrupt delivery is not controlled by :} cli/sti, it's controlled by the APIC. :} : : However, if the APIC is programmed to deliver interrupts :to the right place, or not deliver them as the case may be, then :there is no need for the ipending stuff. There is no way just :using ipending to get the "right" job done was the point. You are assuming that you can program the APIC at every state change that would otherwise affect interrupt performance. That sounds rather messy to me.
With the ipending mechanism (which gives you the ability to enter into an interrupt even with spin locks held), you only need to decide what to do with the interrupt in *one* place in the code. We had (and have) code strewn all over the FreeBSD to deal with the APIC. It's a mess, and a mistake. A few strategic places, sure, but everywhere? No. For example, take the case where you want to tie a NIC interrupt to a particular cpu. You could tie the interrupt and simply leave it that way, and if the interrupt occurs at an inopportune time the interrupt vector code can choose what to do: * set ipending and return (the interrupt will be run the moment the scheduler lock is released) * forward the interrupt to another cpu * do something else... :} A per-process spin-held counter would address this both for :} the scheduler mutex and any other spin mutex. :} : : The comment I originally made here wasn't aimed at anything :Matt said, but rather the comments others had made to the effect :that the scheduler mutex was likely to be the only spin mutex. : :} :} The only thing we use IPIs for seriously are VM page operations, :} to invalidate pte's on other cpu's, and for interrupt forwarding :} (which never worked quite right anyway). The former occurs only while :} Giant is held. : :Hopefully it will eventually be the case that Giant isn't held when :IPI's are sent. BSD/OS already dispatches pcpu clock IPIs without :holding any locks. It certainly is the case with BSD/OS that the :goal is to make Giant go away totally as soon as possible. However, :I don't think this fundamentally changes anything. 
:
:Chuck

                    -Matt
                    Matthew Dillon

From owner-freebsd-smp Fri Jul 7 14: 4:23 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from berserker.bsdi.com (berserker.twistedbit.com [199.79.183.1])
    by hub.freebsd.org (Postfix) with ESMTP id 730C837BFE6
    for ; Fri, 7 Jul 2000 14:04:11 -0700 (PDT)
    (envelope-from cp@berserker.bsdi.com)
Received: from berserker.bsdi.com (cp@localhost [127.0.0.1])
    by berserker.bsdi.com (8.9.3/8.9.3) with ESMTP id PAA16442;
    Fri, 7 Jul 2000 15:03:43 -0600 (MDT)
Message-Id: <200007072103.PAA16442@berserker.bsdi.com>
To: Matthew Dillon
Cc: Doug Rabson , Greg Lehey , David Greenman , freebsd-smp@freebsd.org
Subject: Re: Spin Locks, blocking interrupts, and ipending
From: Chuck Paterson
Date: Fri, 07 Jul 2000 15:03:43 -0600
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

}
}    For ipending it IS the same category.  At least for i386.  If you look at
}    the patchset you will see that the ipending code I wrote
}    is 100% MP safe.

    It may be true that ipending itself is safe, but

    1) Wrapping all the code in Giant isn't the same as being
    MP safe. Especially when the code to acquire Giant just
    can work at the start of an interrupt stub.

    2) If you ever have a second processor you will soon
    get to

        mtp->mtx_lock = newv;
        panic("blocking");        <----
        mtx_enter(&SchedMutex, MTX_SPIN);

    3) The code which deals with masking interrupts in the
    hardware has to be MP aware. Functions like Xresume may
    happen on multiple processors at the same time, at least
    once stuff moves out from under Giant.

}
}    We had (and have) code strewn all over the FreeBSD to deal with the
}    APIC.  It's a mess, and a mistake.  A few strategic places, sure,
}    but everywhere?  No.
}

    You definitely don't need the code strewn all over
    the place. There is no reason it can't be encapsulated.
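
A minimal sketch of what such encapsulation might look like, assuming the
hardware mask can be modelled as a single 32-bit word. Every name here
(hw_intr_mask, intr_mask_lock, intr_mask(), intr_unmask(),
intr_is_masked()) is hypothetical and is not taken from the FreeBSD or
BSD/OS sources. The variable stands in for the real mask register, and a
C11 atomic_flag stands in for a kernel spin mutex. The point is only that
all read-modify-write access to the mask goes through one small,
MP-aware interface:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the hardware interrupt mask register. */
static uint32_t hw_intr_mask;

/* Illustrative spin lock guarding hw_intr_mask (not a kernel mutex). */
static atomic_flag intr_mask_lock = ATOMIC_FLAG_INIT;

static void mask_lock(void)
{
    /* Spin until the flag was previously clear, i.e. we own the lock. */
    while (atomic_flag_test_and_set_explicit(&intr_mask_lock,
                                             memory_order_acquire))
        ;
}

static void mask_unlock(void)
{
    atomic_flag_clear_explicit(&intr_mask_lock, memory_order_release);
}

/* Mask (disable) one interrupt line; irq must be below 32. */
void intr_mask(unsigned irq)
{
    mask_lock();
    hw_intr_mask |= UINT32_C(1) << irq;
    mask_unlock();
}

/* Unmask (enable) one interrupt line; irq must be below 32. */
void intr_unmask(unsigned irq)
{
    mask_lock();
    hw_intr_mask &= ~(UINT32_C(1) << irq);
    mask_unlock();
}

/* Report whether a line is currently masked; irq must be below 32. */
int intr_is_masked(unsigned irq)
{
    int masked;

    mask_lock();
    masked = (hw_intr_mask & (UINT32_C(1) << irq)) != 0;
    mask_unlock();
    return masked;
}
```

Callers would use only intr_mask() and intr_unmask(), so two processors
updating the mask at the same time cannot lose each other's changes. A
real kernel would also have to keep local interrupts off while holding
such a lock, which this user-space sketch does not attempt.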

}    For example, take the case where you want to tie a NIC interrupt to
}    a particular cpu.  You could tie the interrupt and simply leave it
}    that way, and if the interrupt occurs at an inopportune time the
}    interrupt vector code can choose what to do:
}

    What exactly do you mean by tie an interrupt to a
    particular CPU? Route in hardware, is that really possible
    on X86? Route in software?

}        * set ipending and return (the interrupt will be run the moment
}          the scheduler lock is released)

    For an edge triggered device you need to do more than
    set ipending. You need to muck with the hardware. You will
    need pcpu ipending if you want to cause it to run on a
    particular processor.

}
}        * forward the interrupt to another cpu
}
}        * do something else...
}

Chuck
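
The "set ipending and return" option, combined with a spin-held counter
and per-CPU ipending, can be sketched roughly as follows. This is a
single-threaded user-space model under stated assumptions, not the code
from the patchset: struct pcpu_intr, intr_vector(), spin_enter(),
spin_exit() and run_count are all hypothetical names, the CPU number is
passed explicitly because the model has no notion of a current CPU, and
no hardware is touched, so the extra work needed for edge triggered
devices is not covered.

```c
#include <stdint.h>

#define NCPU 2      /* assumed number of processors in the model */
#define NIRQ 32     /* one ipending bit per interrupt line */

/* Hypothetical per-CPU interrupt state. */
struct pcpu_intr {
    uint32_t ipending;   /* bit set: that IRQ was deferred on this CPU */
    unsigned spin_held;  /* number of spin locks held on this CPU */
};

static struct pcpu_intr pcpu[NCPU];
static unsigned run_count[NIRQ];  /* handler invocations, for demonstration */

/* Stand-in for dispatching the real interrupt handler. */
static void run_handler(unsigned irq)
{
    run_count[irq]++;
}

/* Model of the interrupt vector code: the one place that decides. */
void intr_vector(unsigned cpu, unsigned irq)
{
    struct pcpu_intr *pc = &pcpu[cpu];

    if (pc->spin_held > 0) {
        /* Inopportune time: record the interrupt and return. */
        pc->ipending |= UINT32_C(1) << irq;
        return;
    }
    run_handler(irq);
}

/* Called when a spin lock is acquired on this CPU. */
void spin_enter(unsigned cpu)
{
    pcpu[cpu].spin_held++;
}

/* Called when a spin lock is released; calls must balance spin_enter(). */
void spin_exit(unsigned cpu)
{
    struct pcpu_intr *pc = &pcpu[cpu];

    if (--pc->spin_held > 0)
        return;         /* still inside another spin lock */

    /* Last spin lock released: run everything deferred on this CPU. */
    while (pc->ipending != 0) {
        unsigned irq = 0;

        while ((pc->ipending & (UINT32_C(1) << irq)) == 0)
            irq++;
        pc->ipending &= ~(UINT32_C(1) << irq);
        run_handler(irq);
    }
}
```

In this model an interrupt arriving on CPU 0 between spin_enter(0) and
spin_exit(0) is not run until spin_exit(0) drops the count to zero, and
it is run on CPU 0 because the pending bit lives in that CPU's own
structure. Two interrupts on the same line during one deferral collapse
into a single pending bit, so the handler runs once.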