From owner-freebsd-smp Sun Jun 27 3:42:38 1999
Delivered-To: freebsd-smp@freebsd.org
Received: from henoch.cc.fh-lippe.de (henoch.cc.fh-lippe.de [193.16.112.72])
    by hub.freebsd.org (Postfix) with ESMTP
    id E518014CA3; Sun, 27 Jun 1999 03:42:18 -0700 (PDT)
    (envelope-from lkoeller@cc.fh-lippe.de)
Received: from spock.cc.fh-lippe.de([193.16.118.120]) (12075 bytes)
    by henoch.cc.fh-lippe.de via sendmail with P:smtp/R:inet_hosts/T:smtp
    (sender: ) id for ; Sun, 27 Jun 1999 12:42:15 +0200 (MET DST)
    (Smail-3.2.0.101 1997-Dec-17 #3 built 1998-Feb-3)
Received: from cc.fh-lippe.de by spock.cc.fh-lippe.de with smtp
    (Smail3.1.29.1 #2) id m10yCO0-0006yaC; Sun, 27 Jun 99 12:42 MET DST
Received: from odie.lippe.de (localhost [127.0.0.1])
    by cc.fh-lippe.de (8.9.3/8.9.1) with ESMTP id LAA91826;
    Sun, 27 Jun 1999 11:58:29 +0200 (CEST)
    (envelope-from lkoeller@odie.lippe.de)
Message-Id: <199906270958.LAA91826@cc.fh-lippe.de>
X-Mailer: exmh version 2.0.2 2/24/98
From: Lars Köller
To: Lars Köller
Cc: freebsd-questions@FreeBSD.ORG, freebsd-smp@FreeBSD.ORG, Thierry.Herbelot@alcatel.fr
Subject: Re: New freeze with 3.2-RELEASE (SMP and audio)!!
In-reply-to: lkoeller's message of Sat, 26 Jun 1999 23:03:10 +0200.
    <199906262103.XAA00889@cc.fh-lippe.de>

> It seems not so easy as mentioned in my first mail, still total
> freezes! I am just trying another kernel ......
>
> Any ideas or comments on my kernel config file are welcome!
>
> In reply to Lars Köller who wrote:
>
> > Hi!
> >
> > Sorry for warming this up, but migration from 3.1-R to 3.2-R, from
> > source, was a horror trip!
> >
> > However, now that I have a stable system, I can tell you the reason:
> >
> > As in 3.0 (you remember the thread), it was a problem with the
> > audio driver under SMP. The soundcard is a Soundblaster AWE 32 (ISA).
> >
> > My 3.1-R system was rock solid, with SMP and audio.
> > I use the same kernel config file for 3.2-R, and here the machine
> > locks up without any visible reason. After fiddling a little bit with
> > different configs and different kernel configurations, everything
> > points to the sound driver (voxware). I also tried the latest OSS
> > version, but after loading the module during install the machine
> > locks up.
> >
> > I append my kernel config file, the kernel.config file and the
> > dmesg output. New since 3.2-R is the output "AWE32 not detected".
> >
> > Is there anybody out there running an SB AWE32 on a 3.2-R
> > machine with SMP?

--==_Exmh_-17238871240
Content-Type: text/plain ; name="ODIE"; charset=us-ascii
Content-Description: ODIE
Content-Disposition: attachment; filename="ODIE"

#####################################################################
# ODIE kernel config file
#
machine         "i386"
cpu             "I686_CPU"              # aka Pentium Pro(tm)
ident           ODIE
maxusers        32

config          kernel root on da0 dumps on da0

#####################################################################
# This allows you to actually store this configuration file into
# the kernel binary itself, where it may be later read by saying:
#    strings /kernel | grep ^___ | sed -e 's/^___//' > MYKERNEL
#
options         INCLUDE_CONFIG_FILE     # Include this file in kernel

#####################################################################
# Create a SMP capable kernel (mandatory options):
#
options         SMP                     #Symmetric Multiprocessor Kernel
options         APIC_IO                 #Symmetric (APIC) I/O

#####################################################################
# Let's always enable the kernel debugger for SMP.
#
options         DDB

options         INET                    #InterNETworking
options         FFS                     #Berkeley Fast Filesystem
options         FFS_ROOT                #FFS usable as root device [keep this!]
options         PROCFS                  #Process filesystem
options         "COMPAT_43"             #Compatible with BSD 4.3
options         MAXCONS=8               #Number of max. allowed virt.
consoles options SOFTUPDATES options QUOTA #enable disk quotas options PERFMON #Pentium (Pro) performance counters options SYSVSHM #System V shared memory support options SYSVSEM #System V Semophore support options SYSVMSG #System V Message support options UCONSOLE #Allow ordinary users to take the #console - this is useful for X. options XSERVER #include code for XFree86 options USERCONFIG #boot -c editor options VISUAL_USERCONFIG #visual boot -c editor options SC_HISTORY_SIZE=512 #number of history buffer lines ##################################################################### # KTRACE enables the system-call tracing facility ktrace(2). # options KTRACE #kernel tracing ##################################################################### # ISA devices ##################################################################### controller isa0 options "AUTO_EOI_1" options "AUTO_EOI_2" ##################################################################### # Floppy # controller fdc0 at isa? port "IO_FD1" bio irq 6 drq 2 disk fd0 at fdc0 drive 0 disk fd1 at fdc0 drive 1 #device apm0 at isa? controller pnp0 ##################################################################### # atkbdc0 controlls both the keyboard and the PS/2 mouse # controller atkbdc0 at isa? port IO_KBD tty device atkbd0 at isa? tty irq 1 device psm0 at isa? tty irq 12 ##################################################################### # The video card driver. # device vga0 at isa? port ? conflicts # Splash screen at start up! Screen savers require this too. pseudo-device splash ##################################################################### # syscons is the default console driver, resembling an SCO console # device sc0 at isa? tty device npx0 at isa? 
port IO_NPX iosiz 0x0 flags 0x0 irq 13 ##################################################################### # sio1 (dcf-77) # serial console # options BREAK_TO_DEBUGGER # BREAK on console goes to DDB options CONSPEED=19200 # default speed for console device sio0 at isa? port "IO_COM1" flags 0x10 tty irq 4 device sio1 at isa? port "IO_COM2" tty irq 3 ##################################################################### # Parallel-Port Bus # # Anscheinend mit "net" statt "tty" Probleme mit gehechselten Zeilen # beim Drucken # # vpo Iomega Zip Drive # Requires SCSI disk support ('scbus' and 'da'), best # performance is achieved with ports in EPP 1.9 mode. # nlpt Parallel Printer, use _instead_ of lpt0 # plip Parallel network interface # ppi General-purpose I/O ("Geek Port") # pps Pulse per second Timing Interface # lpbb Philips official parallel port I2C bit-banging interface device ppc0 at isa? port? flags 0x40 net irq 7 controller ppbus0 controller vpo0 at ppbus? device lpt0 at ppbus? device plip0 at ppbus? device ppi0 at ppbus? ##################################################################### # Audio drivers: `snd', `sb', `pas', `gus', `pca' # #controller snd0 #device sb0 at isa? port 0x220 irq 5 drq 1 flags 0x5 #device sbxvi0 at isa? drq 5 #device sbmidi0 at isa? port 0x330 #device awe0 at isa? port 0x620 #device opl0 at isa? port 0x388 #device pcm0 at isa? port ? tty irq 5 drq 1 flags 0x0 ##################################################################### # i4b passive ISDN cards support (isic - I4b Siemens Isdn Chipset driver) # note that the ``options'' and ``device'' lines must BOTH be defined ! # Teles S0/16.3 options "TEL_S0_16_3" device isic0 at isa? 
port 0xd80 net irq 10 flags 3 # Q.921 / layer 2 - i4b passive cards D channel handling pseudo-device "i4bq921" # # Q.931 / layer 3 - i4b passive cards D channel handling pseudo-device "i4bq931" # # layer 4 - i4b common passive and active card handling pseudo-device "i4b" # # userland driver to do ISDN tracing (for passive cards oly) pseudo-device "i4btrc" 4 # userland driver to control the whole thing pseudo-device "i4bctl" # userland driver for access to raw B channel pseudo-device "i4brbch" 4 # userland driver for telephony pseudo-device "i4btel" 2 # network driver for IP over raw HDLC ISDN pseudo-device "i4bipr" 4 # enable VJ header compression detection for ipr i/f options IPR_VJ # network driver for sync PPP over ISDN pseudo-device "i4bisppp" 4 pseudo-device sppp 4 ##################################################################### # PCI devices ##################################################################### controller pci0 options SCSI_DELAY=8000 # Be pessimistic about Joe SCSI device options SCSI_REPORT_GEOMETRY ##################################################################### # Adaptec 2940[U/UW] SCSI Adapter # controller ahc0 controller ahc1 # Devices connected device ch0 device da0 device sa0 device cd0 device pass0 # The syntax for wiring down devices is: # AH2940 U # controller scbus0 at ahc0 # Seagate ST15150N # disk da0 at scbus0 target 0 disk da1 at scbus0 target 1 # Seagate Python 28388, DDS2 # tape sa0 at scbus0 target 3 # PIONEER CD-ROM DR-433, (target 4) # device cd0 at scbus? 
# Adic Autochanger mit HP DAT, DDS2 # device ch0 at scbus0 target 5 tape sa1 at scbus0 target 6 # The syntax for wiring down devices is: # AH 2940 UW # controller scbus1 at ahc1 # PIONEER CD-ROM DR-U12X # device cd1 at scbus1 target 6 # Seagate ST32171W # disk da2 at scbus1 target 8 disk da3 at scbus1 target 9 ##################################################################### # POSIX P1003.1B # Real time extensions added int the 1993 Posix # P1003_1B: Infrastructure # _KPOSIX_PRIORITY_SCHEDULING: Build in _POSIX_PRIORITY_SCHEDULING # _KPOSIX_VERSION: Version kernel is built for options "P1003_1B" options "_KPOSIX_PRIORITY_SCHEDULING" options "_KPOSIX_VERSION=199309L" ##################################################################### # Useful pseudo devices ##################################################################### pseudo-device loop pseudo-device ether pseudo-device ccd 4 pseudo-device snp 3 #Snoop device - to look at pty/vty/etc.. pseudo-device bpfilter 4 pseudo-device pty 128 pseudo-device gzip #Exec gzipped a.out's pseudo-device vn #Vnode driver (turns a file into a device) --==_Exmh_-17238871240 Content-Type: text/plain; charset=us-ascii E-Mail: | Lars Koeller Lars.Koeller@Uni-Bielefeld.DE | UNIX Sysadmin lkoeller@cc.fh-lippe.de | Computing Center PGP-key: | University of Bielefeld http://www.nic.surfnet.nl/pgp/pks-toplev.html | Germany ----------- FreeBSD, what else? 
---- http://www.freebsd.org -------------

--==_Exmh_-17238871240--

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-smp" in the body of the message

From owner-freebsd-smp Sun Jun 27 12:21:58 1999
Delivered-To: freebsd-smp@freebsd.org
Received: from dorifer.heim3.tu-clausthal.de (dorifer.heim3.tu-clausthal.de [139.174.243.252])
    by hub.freebsd.org (Postfix) with ESMTP id 45CB715091
    for ; Sun, 27 Jun 1999 12:21:45 -0700 (PDT)
    (envelope-from olli@dorifer.heim3.tu-clausthal.de)
Received: (from olli@localhost)
    by dorifer.heim3.tu-clausthal.de (8.8.8/8.8.8) id VAA16464
    for freebsd-smp@FreeBSD.ORG; Sun, 27 Jun 1999 21:21:41 +0200 (CEST)
    (envelope-from olli)
Date: Sun, 27 Jun 1999 21:21:41 +0200 (CEST)
From: Oliver Fromme
Message-Id: <199906271921.VAA16464@dorifer.heim3.tu-clausthal.de>
To: freebsd-smp@FreeBSD.ORG
Subject: Freezes with 3.2-RELEASE (SMP)
Organization: Administration Heim 3
Reply-To: olli@incogni.to
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
X-Newsreader: TIN [version 1.2 RZTUC(3) PL2]
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

First I'm going to reply to a few of Lars' statements, then
I will describe my own experiences with freezes on an SMP
box running 3.2-RELEASE. These might or might not be related
to the problems that Lars is encountering.

Lars Koeller wrote:
> I hope I'm not too fast with my conclusions (normally I am :-), but
> the instability seems to result from the missing option
>
> options FFS_ROOT #FFS usable as root device [keep this!]

You definitely need that option if your root device is an
FFS/UFS partition. I think the comment "keep this!" should
be pretty clear. ;-)

> After fiddling with all these details, I sometimes wish for a
> kernel configuration utility that is able to avoid such problems

Doing an update properly does avoid such problems, too.
For example, it's always a good idea to keep the old GENERIC and LINT files, then make a diff between the old and new ones, and then merge any changes which apply to you into your kernel config file. This takes only a few minutes, but it can save you from hours of debugging strange problems. Apart from that, it brings any new features to your attention which might be useful for you. > The next step is to activate the soundcard again ..... I have an AWE32 in a UP system, it works fine with both the traditional Voxware drivers and Luigi's new PCM driver. If the former doesn't work for you, you might try the latter. > At the moment I'm just running a SMP kernel again and do a 'make > world' for testing. "make world" isn't always enough, unfortunately... See below. > options XSERVER #include code for XFree86 > [...] > options "AUTO_EOI_2" The XSERVER option applies only to the pcvt console driver, not to syscons, so it's not needed. However, more importantly, the AUTO_EOI_2 option broke the system _badly_ when I tried it on a box some time ago. Since then I've never touched it again. (AUTO_EOI_1 seems to work fine, though.) Maybe I just had bad luck. But maybe you should try to remove that option. Now this is my story: When I built an SMP box last week (dual Celeron-466), I also installed 3.2-RELEASE, just because I got the CD set in the mail on the same day, so it was most convenient. Everything went well, even a "make world" (52 minutes, by the way, on a slow old IBM DCAS drive connected to a Symbios-810 Fast-SCSI controller -- this was probably the limiting factor). However, when I installed the netpbm port, the box froze in the middle of compiling. No warning or error message, no panic, no reboot, no ddb prompt. It was as dead as it could be, no ping replies, and pressing the NumLock key did not change the LED anymore. I rebooted, then did a "make clean" and tried to compile netpbm again, which worked fine this time. 
Then I tried to compile gimp, and again the box froze after a
few minutes of compiling. "Shit."

I removed everything from the kernel config file that I didn't
absolutely need (this box does _not_ have any sound hardware!)
and set the most conservative timings in the BIOS setup. It
didn't make any difference -- it always froze after some
(varying) time of heavy load. (BTW, interestingly, compiling
gimp seemed to trigger the problem more easily/earlier than
"make world".)

I rebooted with a UP kernel -- no problem, the system was
rock-solid. Swapped the CPUs to check the other one -- same
thing. Rebooted with SMP kernel, compiled gimp -- freeze after
10 minutes. There was not much left to do, and I tended to
think that the mainboard's SMP support was defective.

As a last resort, I tried to install a recent 4.0-current
snapshot (19990622, to be exact). Guess what -- the problem
was gone. The box now runs great -- I even overclocked both
CPUs to 525 MHz (FSB running at 75 MHz). I let it make world
and compile ports 24h/day to make sure it is stable. I believe
that the first thing that's going to fail is the poor disk. :-)

So my conclusion is that 3.2-RELEASE does not work reliably
with SMP under serious load. At least, that was the case for
me. Maybe the problem is also fixed in -stable, but now that
I'm running -current without apparent problems, I have no
reason to downgrade.

Regards
   Oliver

PS: Using Celerons, it is possible to build dual-processor
SMP machines with serious processing power at a price that
is not much higher than a single-CPU system. So let's all
build SMP boxes and help the FreeBSD team to improve the SMP
efficiency and kick NT's ass. :)

--
Oliver Fromme, Leibnizstr.
18/61, 38678 Clausthal, Germany
(Info: finger userinfo:olli@dorifer.heim3.tu-clausthal.de)

"In jedem Stück Kohle wartet ein Diamant auf seine Geburt"
(Terry Pratchett)

From owner-freebsd-smp Sun Jun 27 13:24:11 1999
Delivered-To: freebsd-smp@freebsd.org
Received: from apollo.backplane.com (apollo.backplane.com [209.157.86.2])
    by hub.freebsd.org (Postfix) with ESMTP id 7D0B314E12
    for ; Sun, 27 Jun 1999 13:24:01 -0700 (PDT)
    (envelope-from dillon@apollo.backplane.com)
Received: (from dillon@localhost)
    by apollo.backplane.com (8.9.3/8.9.1) id NAA15634;
    Sun, 27 Jun 1999 13:24:01 -0700 (PDT)
    (envelope-from dillon)
Date: Sun, 27 Jun 1999 13:24:01 -0700 (PDT)
From: Matthew Dillon
Message-Id: <199906272024.NAA15634@apollo.backplane.com>
To: freebsd-smp@FreeBSD.ORG
Subject: high-efficiency SMP locks - submission for review
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

I would like to know what the SMP gurus think of this code. I have
not posted the complete patch, but have included the meat of the code.
I am interested in comments relating to the implementation & performance
aspects of the code. I would also appreciate it if an inline assembly
expert could look at my inline assembly. I have tested it well and
I believe it to be correct, but I don't write much inline assembly so...

This code is designed to implement extremely low overhead SMP-compatible
locks. It will probably be used instead of lockmgr() locks for the
buffer cache, and I hope to use it as a basis to replace lockmgr() locks
in other modules later on. The same structure can be used to implement
both spin locks and normal blocking locks, and can handle shared,
exclusive, and recursive-exclusive lock types. The critical path has
been optimized down to 20 instructions or so ( compared to the hundreds
in lockmgr() ).
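
[The count encoding described above can be sketched in portable userspace
C. This is only an illustration under stated assumptions, not the posted
kernel code: it uses C11 <stdatomic.h> in place of the inline cmpxchgl
helper, the sketch_* names are hypothetical, and it leaves out holder
tracking, recursion, the intermediate count stage, and sleeping. It shows
only the idea of one signed counter (positive = shared, negative =
exclusive, 0 = free) updated with a compare-and-swap retry loop.]

```c
#include <errno.h>
#include <limits.h>
#include <stdatomic.h>

/*
 * Hypothetical userspace analogue of the lock count:
 * > 0 shared holders, < 0 exclusive holder, 0 unlocked.
 */
struct sketch_lock {
    atomic_int count;
};

static void sketch_init(struct sketch_lock *lk)
{
    atomic_init(&lk->count, 0);
}

/* Try to take a shared reference; never blocks. */
static int sketch_try_rd(struct sketch_lock *lk)
{
    int n = atomic_load(&lk->count);

    while (n >= 0) {
        if (n == INT_MAX)
            return EBUSY;       /* refuse to overflow the counter */
        /* On failure n is reloaded with the current value; retry. */
        if (atomic_compare_exchange_weak(&lk->count, &n, n + 1))
            return 0;
    }
    return EBUSY;               /* held exclusively */
}

/* Try to take the lock exclusively; only succeeds from the free state. */
static int sketch_try_wr(struct sketch_lock *lk)
{
    int expected = 0;

    if (atomic_compare_exchange_strong(&lk->count, &expected, -1))
        return 0;
    return EBUSY;
}

/* Release one reference of either kind by moving the count toward 0. */
static int sketch_unlock(struct sketch_lock *lk)
{
    int n = atomic_load(&lk->count);

    while (n != 0) {
        int next = (n < 0) ? n + 1 : n - 1;

        if (atomic_compare_exchange_weak(&lk->count, &n, next))
            return 0;
    }
    return EINVAL;              /* lock was not held */
}
```

[In this sketch a failed try simply returns EBUSY; a blocking variant
would have to add its own wait and wakeup mechanism on top.]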
As many of you may already know, lockmgr() locks are extremely expensive and require interrupt synchronization for their internal simplelocks to operate properly in a mixed process/interrupt environment. The result is a horrendous amount of overhead for very little gain. I am attempting to come up with a general locking module that can eventually replace lockmgr() locks. -Matt ( for /usr/src/sys/i386/include/atomic.h ) atomic_cmpex() returns oldv on success, something else on failure. static __inline int atomic_cmpex(volatile int *pint, int oldv, int newv) { __asm __volatile ("/* %0 %1 */; lock; cmpxchgl %2,(%3)" : "=a" (oldv) : "a" (oldv), "r" (newv), "r" (pint) ); return(oldv); } /* * SYS/QLOCK.H * * (c)Copyright Matthew Dillon, 1999. BSD copyright /usr/src/COPYRIGHT is * to apply with the exception that the author's name, "Matthew Dillon", is to * replace the "Regents of the University of California". * * Implement high efficiency SMP-safe shared/exclusive locks with recursion * capabilities. * * See kern/kern_qlock.c for a detailed explanation * * $Id: Exp $ */ #ifndef _QLOCK_H_ #define _QLOCK_H_ #include struct proc; /* * qlock structure * * count * contains lock count. Positive numbers indicate shared * references, negative numbers indicate exclusive ( and * possibly recursive ) references. 0 indicates that the * lock is not held at all. * * waiting * If non-zero, indicates that someone is waiting for this lock. * Processes waiting for a lock are woken up when the lock count * passes through 0. * * holder * If an exclusive lock is being held, holder contains the process * pointer of the holder. Otherwise holder contains garbage * (but may be useful in crash dumps). If lock debugging is * turned on and shared locks are being held, holder contains * the last process to obtain the shared lock. * * The holder field is not updated atomically, but this does * not matter because it is only tested if an exclusive lock * is already held. 
We go through a little magic to deal with * the one race that occurs in qlock_try_mwr(). */ struct qlock { volatile int count; /* lock refs: neg=exclusive, pos=shared */ int waiting; /* process(s) are waiting on lock */ struct proc *holder; /* holder of an exclusive lock */ }; #define QLK_SPIN 0 #define QLK_SWITCH 1 #ifdef KERNEL /* * These functions may not be used outside this header file */ extern void qlock_init(struct qlock *lk); extern void _qlock_rd(struct qlock *lk); extern void _qlock_wr(struct qlock *lk); extern void _qlock_mwr(struct qlock *lk); extern int _qlock_sleepfail_rd(struct qlock *lk, char *wmesg, int catch, int timo); extern int _qlock_sleepfail_wr(struct qlock *lk, char *wmesg, int catch, int timo); extern void _qlock_upgrade(struct qlock *lk); extern void _qlock_ovf_panic(struct qlock *lk); extern void _qlock_op_panic(struct qlock *lk); extern void _qunlock_rd_panic(struct qlock *lk); extern void _qunlock_wr_panic(struct qlock *lk); /* * Optimized inlined functions for general use */ /* * qlock_try_rd * qlock_try_wr * qlock_try_mwr * * Attempt to obtain a shared lock, exclusive lock, or recursive-capable * exclusive lock. 0 is returned on success, EBUSY on failure. The loop * exists only to manage SMP collisions, the routine does not block. * * Note that qlock_*wr() needs to deal with the SMP race between the * count going from 0 to -1 and the lock holder getting set. It does * this by running the lock through an intermediate stage that blocks * other attempts to obtain exclusive locks until the holder can be set. 
 */

static __inline int
qlock_try_rd(struct qlock *lk)
{
    int n;

    while ((n = lk->count) >= 0) {
#ifdef INVARIANTS
        if (n == 0x7FFFFFFF)
            _qlock_ovf_panic(lk);
#endif
        if (atomic_cmpex(&lk->count, n, n + 1) == n) {
#ifdef DEBUG_LOCKS
            lk->holder = curproc;
#endif
            return(0);
        }
    }
    return(EBUSY);
}

static __inline int
qlock_try_wr(struct qlock *lk)
{
    int n;

    while ((n = lk->count) == 0) {
        if (atomic_cmpex(&lk->count, 0, -(int)0x7FFFFFFF) == 0) {
            lk->holder = curproc;
            lk->count = -1;
            return(0);
        }
    }
    return(EBUSY);
}

static __inline int
qlock_try_mwr(struct qlock *lk)
{
    int n;

    while ((n = lk->count) <= 0) {
        if (n == 0) {
            if (atomic_cmpex(&lk->count, n, -(int)0x7FFFFFFF) == n) {
                lk->holder = curproc;
                lk->count = -1;
                return(0);
            }
            continue;
        }
        if (n == -(int)0x7FFFFFFF)
            continue;
        if (lk->holder != curproc)
            break;
#ifdef INVARIANTS
        if (n == -(int)0x7FFFFFFE)
            _qlock_ovf_panic(lk);
#endif
        if (atomic_cmpex(&lk->count, n, n - 1) == n)
            return(0);
    }
    return(EBUSY);
}

/*
 * qlock_spin_rd
 * qlock_spin_wr
 * qlock_spin_mwr
 *
 *	Obtain a shared lock, exclusive lock, or recursive-capable
 *	exclusive lock. These calls spin until they can get the lock.
 */

static __inline void
qlock_spin_rd(struct qlock *lk)
{
    while (qlock_try_rd(lk) != 0)
        ;
}

static __inline void
qlock_spin_wr(struct qlock *lk)
{
    while (qlock_try_wr(lk) != 0)
        ;
}

static __inline void
qlock_spin_mwr(struct qlock *lk)
{
    while (qlock_try_mwr(lk) != 0)
        ;
}

/*
 * qlock_require_rd
 * qlock_require_wr
 * qlock_require_mwr
 *
 *	Obtain a shared lock, exclusive lock, or recursive-capable
 *	exclusive lock. We expect to be able to obtain the lock without
 *	having to block, and panic if we cannot.
*/ static __inline void qlock_require_rd(struct qlock *lk) { if (qlock_try_rd(lk) != 0) panic("qlock: failed to obtain shared lock %p", lk); } static __inline void qlock_require_wr(struct qlock *lk) { if (qlock_try_wr(lk) != 0) panic("qlock: failed to obtain exclusive lock %p", lk); } static __inline void qlock_require_mwr(struct qlock *lk) { if (qlock_try_mwr(lk) != 0) panic("qlock: failed to obtain m-exclusive lock %p", lk); } /* * qlock_rd * qlock_wr * qlock_mwr * * Obtain a shared lock, exclusive lock, or recursive-capable * exclusive lock. These routines will block until they get * the requested lock. */ static __inline void qlock_rd(struct qlock *lk) { if (qlock_try_rd(lk) != 0) _qlock_rd(lk); } static __inline void qlock_wr(struct qlock *lk) { if (qlock_try_wr(lk) != 0) _qlock_wr(lk); } static __inline void qlock_mwr(struct qlock *lk) { if (qlock_try_mwr(lk) != 0) _qlock_mwr(lk); } /* * qlock_sleepfail_rd * qlock_sleepfail_wr * * Obtain a shared lock or an exclusive lock and include options for * catching signals and timeouts. Note that we do not support * recursive exclusive locks (yet). * * These routines will block until they get the requested lock. * 0 is returned if the lock was obtained, and an appropriate * error is returned otherwise (similar to lockmgr() locks). */ static __inline int qlock_sleepfail_rd(struct qlock *lk, char *wmesg, int catch, int timo) { if (qlock_try_rd(lk) != 0) return(_qlock_sleepfail_rd(lk, wmesg, catch, timo)); return(0); } static __inline int qlock_sleepfail_wr(struct qlock *lk, char *wmesg, int catch, int timo) { if (qlock_try_wr(lk) != 0) return(_qlock_sleepfail_wr(lk, wmesg, catch, timo)); return(0); } /* * qunlock_rd * qunlock_wr * qunlock * * Release a shared or exclusive lock. The qunlock() function can release * either type of lock while the qunlock_rd/wr functions can only release * a specific type of lock. * * Note that we do not bother clearing holder when the count transitions * to 0. 
If we were to do this, note that we would have to go through * the special count state to avoid SMP races with holder. * * These routines do not block. */ static __inline void qunlock_rd(struct qlock *lk) { for (;;) { int n; if ((n = lk->count) <= 0) _qunlock_rd_panic(lk); if (atomic_cmpex(&lk->count, n, n - 1) == n) { if (n == 1 && lk->waiting) { lk->waiting = 0; wakeup(lk); } break; } } } static __inline void qunlock_wr(struct qlock *lk) { for (;;) { int n; if ((n = lk->count) >= 0) _qunlock_wr_panic(lk); if (atomic_cmpex(&lk->count, n, n + 1) == n) { if (n == -1 && lk->waiting) { lk->waiting = 0; wakeup(lk); } break; } } } static __inline void qunlock(struct qlock *lk) { for (;;) { int n; int xn; if ((n = lk->count) == 0) _qunlock_wr_panic(lk); if (n < 0) xn = n + 1; else xn = n - 1; if (atomic_cmpex(&lk->count, n, xn) == n) { if (xn == 0 && lk->waiting) { lk->waiting = 0; wakeup(lk); } break; } } } /* * qlockfree: * * Free a lock. At the moment all we need to do is check that nobody * is holding the lock, and panic if someone is. */ static __inline void qlockfree(struct qlock *lk) { if (lk->count != 0) panic("freed lock %p is locked", lk); } /* * qlockabscount: * * Return the current lock count as a positive value. The number of * held shared or exclusive locks is returned. A value of 0 indicates * that no locks are being held. */ static __inline int qlockabscount(struct qlock *lk) { if (lk->count < 0) return(-lk->count); else return(lk->count); } /* * qlockcount: * * Return the current lock count. A positive value indicates that N * shared locks are being held. A negative value indicates that -N * exclusive locks are being held (N can only be < -1 if recursive * exclusive locks are being held). A value of 0 indicates that no * locks are being held. 
*/ static __inline int qlockcount(struct qlock *lk) { return(lk->count); } /* * qlock_try_upgrade shared -> exclusive * qlock_upgrade shared -> exclusive * qlock_downgrade exclusive -> shared * * Upgrade or downgrade an existing lock. Recursion is NOT supported * for either. Also note that in order to avoid odd deadlock situations, * qlock_upgrade will release the currently held shared lock prior to * attempting to obtain the new exclusive lock. qlock_try_upgrade() * on the otherhand only succeeds if it is able to upgrade the lock * atomically. * * When upgrading a lock, we have to deal with a race between the * lock holder field and the count going negative. We do this by * staging the count through a special value. */ static __inline int qlock_try_upgrade(struct qlock *lk) { if (lk->count == 1) { if (atomic_cmpex(&lk->count, 1, -(int)0x7FFFFFFF) == 1) { lk->holder = curproc; lk->count = -1; return(0); } } return(EBUSY); } static __inline void qlock_upgrade(struct qlock *lk) { if (qlock_try_upgrade(lk) == EBUSY) _qlock_upgrade(lk); } static __inline void qlock_downgrade(struct qlock *lk) { int n; if ( (n = lk->count) != -1 || lk->holder != curproc || atomic_cmpex(&lk->count, n, 1) != n ) { _qlock_op_panic(lk); } if (lk->waiting) { lk->waiting = 0; wakeup(lk); } } #endif #endif /* !_QLOCK_H_ */ /* * KERN/KERN_QLOCK.C * * (c)Copyright Matthew Dillon, 1999. BSD copyright /usr/src/COPYRIGHT is * to apply with the exception that the author's name, "Matthew Dillon", is to * replace the "Regents of the University of California". * * Provides highly efficient SMP-capable shared & exclusive locks. Most of * the meat of the code is in sys/qlock.h. These locks are designed to * operate efficiently in either an SMP or non-SMP environment. A minimum * number of instructions and conditions are utilized to implement the locks. * * Please note that while we do not do it here, it is possible to generate * inlines that support single-entry-point calls with an internal switch. 
 * If the lock type is passed to such an inline the compiler will be able
 * to optimize each instance to a single case statement in the switch,
 * producing optimal code.
 *
 * These locks are not meant to be complex and additional functionality should
 * not be added to them if it would affect the efficiency of existing calls.
 * Error returns are close to what lockmgr() gives us. We use hybrid inline
 * functions to make the common-case as fast and tight as possible.
 *
 * "recursion" is defined as being able to obtain multiple locks of the same
 * type from the same process context. Note that qlocks allow only shared or
 * only exclusive locks to be associated with the structure at any given
 * moment, using negative values to count exclusive locks and positive values
 * to count shared locks.
 *
 * "blocks" is defined to mean that the call may block.
 *
 * Two types of locks are supported: shared and exclusive. Shared locks
 * usually support recursion, Exclusive locks can optionally support
 * recursion. The more esoteric functions (upgrades, downgrades, ...)
 * generally cannot support recursion. The general unlock code always
 * supports recursion.
 *
 *  call                        recurs  blocks  type
 *  -----                       -----   -----   -----
 *
 *  qlock_try_rd        inline  yes     no      shared
 *  qlock_try_wr        inline  no      no      exclusive
 *  qlock_try_mwr       inline  yes     no      exclusive
 *
 *  qlock_spin_rd       inline  yes     spins   shared
 *  qlock_spin_wr       inline  no      spins   exclusive
 *  qlock_spin_mwr      inline  yes     spins   exclusive
 *
 *  qlock_require_rd    inline  yes     no(1)   shared
 *  qlock_require_wr    inline  no      no(1)   exclusive
 *  qlock_require_mwr   inline  yes     no(1)   exclusive
 *
 *      note (1): guarantees lock is obtained non-blocking,
 *      panics if it cannot be obtained.
 *
 *  qlock_rd            hybrid  yes     yes     shared
 *  qlock_wr            hybrid  no      yes     exclusive
 *  qlock_mwr           hybrid  yes     yes     exclusive
 *
 *  qlock_sleepfail_rd  hybrid  no      yes(2)  shared
 *  qlock_sleepfail_wr  hybrid  no      yes(2)  exclusive
 *
 *      note (2): if the call blocks, it does not attempt to
 *      retry the lock prior to returning.
* * qunlock_rd inline yes no shared * qunlock_wr inline yes no exclusive * qunlock inline yes no shared or exclusive * * qlock_upgrade hybrid no(3) yes shared->exclusive * qlock_downgrade inline no(3) no exclusive->shared * * note (3): NOTE! these routines cannot operate with * recursive locks, and the lock may be lost for a period * within a qlock_upgrade() call. * * qlockfree inline * qlockabscount inline */ #include #include #include #include /* * qlock_init: * * Initialize a qlock. At the moment we need only zero the fields, * but this may change in the future. */ void qlock_init(struct qlock *lk) { lk->count = 0; lk->waiting = 0; lk->holder = NULL; } #if 0 /* * Inlines are available for these, but sometimes it is more efficient to * call a subroutine to deal with it. */ void qlock_call_rd(struct qlock *lk) { qlock_rd(lk); } void qlock_call_wr(struct qlock *lk) { qlock_wr(lk); } #endif /* * These core routines are only called from sys/qlock.h and should never be * called directly. They typically implement the non-trivial cases for the * inlines. 
*/ void _qlock_rd(struct qlock *lk) { for (;;) { asleep(lk, PRIBIO + 4, "qlkrd", 0); lk->waiting = 1; if (qlock_try_rd(lk) == 0) return; await(-1, -1); } } void _qlock_wr(struct qlock *lk) { for (;;) { asleep(lk, PRIBIO + 4, "qlkwr", 0); lk->waiting = 1; if (qlock_try_wr(lk) == 0) return; await(-1, -1); } } void _qlock_mwr(struct qlock *lk) { for (;;) { asleep(lk, PRIBIO + 4, "qlkwr", 0); lk->waiting = 1; if (qlock_try_mwr(lk) == 0) return; await(-1, -1); } } int _qlock_sleepfail_rd(struct qlock *lk, char *wmesg, int catch, int timo) { int r = 0; asleep(lk, (PRIBIO + 4) | catch, wmesg, timo); lk->waiting = 1; if (qlock_try_rd(lk) != 0) { r = await(-1, -1); if (r == 0) r = ENOLCK; } return(r); } int _qlock_sleepfail_wr(struct qlock *lk, char *wmesg, int catch, int timo) { int r = 0; asleep(lk, (PRIBIO + 4) | catch, wmesg, timo); lk->waiting = 1; if (qlock_try_wr(lk) != 0) { r = await(-1, -1); if (r == 0) r = ENOLCK; } return(r); } void _qlock_upgrade(struct qlock *lk) { /* * First release the existing shared lock */ for (;;) { int n = lk->count; if (n <= 0) _qlock_op_panic(lk); if (atomic_cmpex(&lk->count, n, n - 1) == n) break; } /* * Then obtain a new exclusive lock */ qlock_wr(lk); } /* * The panic functions collapse the code overhead into one place, reducing * the codespace wasteage of the inlines. 
 */
void
_qlock_ovf_panic(struct qlock *lk)
{
    panic("qlock_rd/wr: %p count overflow", lk);
}

void
_qlock_op_panic(struct qlock *lk)
{
    panic("qlock: %p illegal operation on lock state %d", lk, lk->count);
}

void
_qunlock_rd_panic(struct qlock *lk)
{
    panic("qunlock_rd: %p not holding shared lock", lk);
}

void
_qunlock_wr_panic(struct qlock *lk)
{
    panic("qunlock_wr: %p not holding exclusive lock", lk);
}

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-smp" in the body of the message

From owner-freebsd-smp Sun Jun 27 14:34:49 1999
Delivered-To: freebsd-smp@freebsd.org
Received: from alpo.whistle.com (alpo.whistle.com [207.76.204.38])
    by hub.freebsd.org (Postfix) with ESMTP id EAF9914C0B
    for ; Sun, 27 Jun 1999 14:34:47 -0700 (PDT)
    (envelope-from julian@whistle.com)
Received: from current1.whistle.com (current1.whistle.com [207.76.205.22])
    by alpo.whistle.com (8.9.1a/8.9.1) with SMTP id OAA94120;
    Sun, 27 Jun 1999 14:34:46 -0700 (PDT)
Date: Sun, 27 Jun 1999 14:34:40 -0700 (PDT)
From: Julian Elischer
To: Matthew Dillon
Cc: freebsd-smp@FreeBSD.ORG
Subject: Re: high-efficiency SMP locks - submission for review
In-Reply-To: <199906272024.NAA15634@apollo.backplane.com>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Sun, 27 Jun 1999, Matthew Dillon wrote:

> I would like to know what the SMP gurus think of this code. I have
> not posted the complete patch, but have included the meat of the code.
> I am interested in comments relating to the implementation & performance
> aspects of the code. I would also appreciate it if an inline assembly
> expert could look at my inline assembly. I have tested it well and
> I believe it to be correct, but I don't write much inline assembly so...
>
> This code is designed to implement extremely low overhead SMP-compatible
> locks.
> It will probably be used instead of lockmgr() locks for the
> buffer cache, and I hope to use it as a basis to replace lockmgr() locks
> in other modules later on. The same structure can be used to implement
> both spin locks and normal blocking locks, and can handle shared,
> exclusive, and recursive-exclusive lock types. The critical path has
> been optimized down to 20 instructions or so ( compared to the hundreds
> in lockmgr() ).
>
> As many of you may already know, lockmgr() locks are extremely expensive
> and require interrupt synchronization for their internal simplelocks to
> operate properly in a mixed process/interrupt environment. The result is
> a horrendous amount of overhead for very little gain. I am attempting
> to come up with a general locking module that can eventually replace
> lockmgr() locks.
>
> -Matt
>
[...]

This brings up a couple of points..

1/ Who wrote the existing lock manager, and why did they do
   it the way they did? When someone goes to so much trouble
   over something it's usually for a reason.

2/ Before we rush off and implement a new set of locking primitives,
   it might be a good idea to look at the locking primitives of a few
   other OS's.. For example Linux and MACH, and if we can get hold of
   them, Solaris and maybe the exokernel. (and sprite)

3/ This is not to say that what you have done is bad, but that there
   is something to be said for not being gratuitously different. It
   might also be the case that the other BSDs might be doing something
   with locks.

4/ If you want to implement these locks it really is up to you to do
   a quick scan of what the "state of the art" is in this field.

I think I can dig up MACH2.5 (OSF1/Digital UNIX/Tru64) locks.

Maybe we should make a call for everyone on the freebsd lists to
see what they can find in their own back yards. Of course the use
of await() gives a good head start for us..
julian

From owner-freebsd-smp Sun Jun 27 15:53:37 1999
Delivered-To: freebsd-smp@freebsd.org
Received: from chai.torrentnet.com (chai.torrentnet.com [198.78.51.73])
    by hub.freebsd.org (Postfix) with ESMTP id 2491F1515D
    for ; Sun, 27 Jun 1999 15:53:34 -0700 (PDT)
    (envelope-from bakul@torrentnet.com)
Received: from chai.torrentnet.com (localhost [127.0.0.1])
    by chai.torrentnet.com (8.8.8/8.8.5) with ESMTP id SAA04385;
    Sun, 27 Jun 1999 18:53:30 -0400 (EDT)
Message-Id: <199906272253.SAA04385@chai.torrentnet.com>
To: Julian Elischer
Cc: Matthew Dillon , freebsd-smp@FreeBSD.ORG
Subject: Re: high-efficiency SMP locks - submission for review
In-reply-to: Your message of "Sun, 27 Jun 1999 14:34:40 PDT."
Date: Sun, 27 Jun 1999 18:53:30 -0400
From: Bakul Shah
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

> 2/ Before we rush off and implement a new set of locking primitives,
> it might be a good idea to look at the locking primitives of a few
> other OS's.. For example Linux and MACH, and if we can get hold of
> them, Solaris and maybe the exokernel. (and sprite)
>
> 3/ This is not to say that what you have done is bad, but that there
> is something to be said for not being gratuitously different. It
> might also be the case that the other BSDs might be doing something
> with locks.
>
> 4/ If you want to implement these locks it really is up to you to do
> a quick scan of what the "state of the art" is in this field.
>
> I think I can dig up MACH2.5 (OSF1/Digital UNIX/Tru64) locks
>
> maybe we should make a call for everyone on the freebsd lists to
> see what they can find in their own back yards. Of course the use
> of await() gives a good head start for us..

Has anyone taken a serious look at wait-free or lock-free or
NonBlocking Synchronization (NBS) primitives? There are
papers on this going back 27 years or so.
Recent papers by Michael Greenwald & David Cheriton, Michael Scott
etc. may be more accessible.

The basic idea is to rely on a processor/hardware provided
atomic compare-and-swap or double compare-and-swap
instruction to update data structures atomically. If the
update does not succeed you retry or do something else (but
the data structure remains consistent either way).

You can think of a spinlock as a trivial example of this but
the idea is to *not* use any sort of lock when there is a
high likelihood of success with NBS. It is not a panacea but
it does seem to have other benefits like no priority
inversion, higher degree of usable concurrency etc. You
would need to build primitives for mutual exclusion on top of
that for tasks that depend on other entities (e.g. disk
access).

IMHO any use of NBS would likely result in a redesign rather
than an addition of yet another way to synchronize so it is
not for the faint of heart.

Henry Massalin's thesis on the Synthesis kernel would also be
useful reading. It is full of so many interesting ideas I am
amazed no one has stolen most of them as yet!
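
To make the retry idea concrete, here is a minimal sketch. It assumes
a C11 compiler with <stdatomic.h>, which postdates this thread; the
node type and the names list_push and list_take_all are hypothetical
and are not taken from any of the papers mentioned. It only shows the
compare-and-swap retry loop, not a complete NBS design.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical node type, for illustration only. */
struct node {
    struct node *next;
    int value;
};

/* Head of a lock-free LIFO list; zero-initialized, so NULL means empty. */
static _Atomic(struct node *) list_head;

/*
 * Push without taking a lock: read the head, link the new node to it,
 * then publish the node with compare-and-swap.  If another CPU changed
 * the head in between, the CAS fails, 'old' is refreshed with the
 * current head, and the loop retries.  The list is consistent whether
 * the CAS succeeds or fails.
 */
static void
list_push(struct node *n)
{
    struct node *old;

    assert(n != NULL);
    old = atomic_load(&list_head);
    do {
        n->next = old;
    } while (!atomic_compare_exchange_weak(&list_head, &old, n));
}

/* Detach the whole list in one atomic step and return its old head. */
static struct node *
list_take_all(void)
{
    return atomic_exchange(&list_head, (struct node *)NULL);
}
```

The sketch detaches the whole list with one exchange rather than
popping single nodes, because a naive CAS-based pop is exposed to the
ABA problem and would need extra machinery.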
-- bakul

From owner-freebsd-smp Sun Jun 27 16: 8:35 1999
Delivered-To: freebsd-smp@freebsd.org
Received: from pcnet1.pcnet.com (pcnet1.pcnet.com [204.213.232.3])
    by hub.freebsd.org (Postfix) with ESMTP id 39C7B151DA
    for ; Sun, 27 Jun 1999 16:08:28 -0700 (PDT)
    (envelope-from eischen@vigrid.com)
Received: (from eischen@localhost)
    by pcnet1.pcnet.com (8.8.7/PCNet) id TAA21317;
    Sun, 27 Jun 1999 19:07:33 -0400 (EDT)
Date: Sun, 27 Jun 1999 19:07:33 -0400 (EDT)
From: Daniel Eischen
Message-Id: <199906272307.TAA21317@pcnet1.pcnet.com>
To: dillon@apollo.backplane.com, julian@whistle.com
Subject: Re: high-efficiency SMP locks - submission for review
Cc: freebsd-smp@FreeBSD.ORG
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

> This brings up a couple of points..
>
> 2/ Before we rush off and implement a new set of locking primitives,
> it might be a good idea to look at the locking primitives of a few
> other OS's.. For example Linux and MACH, and if we can get hold of
> them, Solaris and maybe the exokernel. (and sprite)

The Vahalia book (UNIX Internals - The New Frontiers) has a pretty
good synopsis of locking systems used by various OSs (see chapter
7). At least from the programmer's interface, I really like the
Solaris API (kernel mutexes and condition variables) - they are
well understood and easy to use.

It would be nice to get rid of the spl's and replace them with
spl aware kernel mutexes/condition variables.
Dan Eischen
eischen@vigrid.com

From owner-freebsd-smp Sun Jun 27 16:30:19 1999
Delivered-To: freebsd-smp@freebsd.org
Received: from osgroup.com (unknown [38.229.41.6])
    by hub.freebsd.org (Postfix) with ESMTP id 6F3D91518E
    for ; Sun, 27 Jun 1999 16:30:16 -0700 (PDT)
    (envelope-from stan@osgroup.com)
Received: from stan166 ([38.229.41.237])
    by osgroup.com (8.7.6/8.6.12) with SMTP id SAA31902
    for ; Sun, 27 Jun 1999 18:19:18 -0500
Received: by localhost with Microsoft MAPI; Sun, 27 Jun 1999 18:32:20 -0500
Message-ID: <01BEC0CB.642E90E0.stan@osgroup.com>
From: Constantine Shkolny
Reply-To: "stan@osgroup.com"
To: "freebsd-smp@FreeBSD.ORG"
Subject: RE: high-efficiency SMP locks - submission for review
Date: Sun, 27 Jun 1999 18:32:19 -0500
X-Mailer: Microsoft Internet E-mail/MAPI - 8.0.0.4211
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Sunday, June 27, 1999 5:54 PM, Bakul Shah [SMTP:bakul@torrentnet.com] wrote:
> Has anyone taken a serious look at wait-free or lock-free or
> NonBlocking Synchronization (NBS) primitives? There are
> papers on this going back 27 years or so. Recent papers by
> Michael Greenwald & David Cheriton, Michael Scott etc. may be
> more accessible.
>
> The basic idea is to rely on a processor/hardware provided
> atomic compare-and-swap or double compare-and-swap
> instruction to update data structures atomically. If the
> update does not succeed you retry or do something else (but
> the data structure remains consistent either way).

This is how NT does this.
The folks who wrote it must have had the papers in their library :-)

From owner-freebsd-smp Sun Jun 27 21:22:38 1999
Delivered-To: freebsd-smp@freebsd.org
Received: from cs.rice.edu (cs.rice.edu [128.42.1.30])
    by hub.freebsd.org (Postfix) with ESMTP id B4C49152DE
    for ; Sun, 27 Jun 1999 21:22:34 -0700 (PDT)
    (envelope-from alc@cs.rice.edu)
Received: (from alc@localhost)
    by cs.rice.edu (8.9.0/8.9.0) id XAA18408
    for smp@freebsd.org; Sun, 27 Jun 1999 23:22:33 -0500 (CDT)
Date: Sun, 27 Jun 1999 23:22:33 -0500
From: Alan Cox
To: smp@freebsd.org
Subject: try this...
Message-ID: <19990627232233.K2738@cs.rice.edu>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="J2SCkAp4GZ/dPZZf"
X-Mailer: Mutt 0.95.5us
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

--J2SCkAp4GZ/dPZZf
Content-Type: text/plain; charset=us-ascii

This patch eliminates pointless lock acquires/releases from
critical sections in the ipl code that consist of a single atomic
read-modify-write instruction.

When I hear a few positive reports, I'll commit this patch.

Alan

--J2SCkAp4GZ/dPZZf
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="ipl_funcs.c.patch"

Index: i386/isa/ipl_funcs.c
===================================================================
RCS file: /home/ncvs/src/sys/i386/isa/ipl_funcs.c,v
retrieving revision 1.20
diff -c -r1.20 ipl_funcs.c
*** ipl_funcs.c 1999/05/09 23:40:29 1.20
--- ipl_funcs.c 1999/06/27 18:30:02
***************
*** 35,62 ****
  /*
   * The volatile bitmap variables must be set atomically. This normally
   * involves using a machine-dependent bit-set or `or' instruction.
   */
- #ifndef SMP
- 
  #define DO_SETBITS(name, var, bits) \
  void name(void) \
  { \
      setbits(var, bits); \
  }
  #else /* !SMP */
! 
! #define DO_SETBITS(name, var, bits) \
! void name(void) \
! { \
!     IFCPL_LOCK(); \
!     setbits(var, bits); \
!     IFCPL_UNLOCK(); \
  }
- #endif /* !SMP */
- 
  DO_SETBITS(setdelayed, &ipending, loadandclear(&idelayed))
  DO_SETBITS(setsoftast, &ipending, SWI_AST_PENDING)
  DO_SETBITS(setsoftcamnet,&ipending, SWI_CAMNET_PENDING)
  DO_SETBITS(setsoftcambio,&ipending, SWI_CAMBIO_PENDING)
--- 35,60 ----
  /*
   * The volatile bitmap variables must be set atomically. This normally
   * involves using a machine-dependent bit-set or `or' instruction.
+  *
+  * If setbits is atomic, this is MP-safe.
   */
  #define DO_SETBITS(name, var, bits) \
  void name(void) \
  { \
      setbits(var, bits); \
  }
+ #ifndef SMP
+ DO_SETBITS(setdelayed, &ipending, loadandclear(&idelayed))
  #else /* !SMP */
! void setdelayed(void)
! {
!     IFCPL_LOCK();
!     setbits(&ipending, loadandclear(&idelayed));
!     IFCPL_UNLOCK();
  }
  #endif /* !SMP */
  DO_SETBITS(setsoftast, &ipending, SWI_AST_PENDING)
  DO_SETBITS(setsoftcamnet,&ipending, SWI_CAMNET_PENDING)
  DO_SETBITS(setsoftcambio,&ipending, SWI_CAMBIO_PENDING)
***************
*** 71,84 ****
  DO_SETBITS(schedsofttty, &idelayed, SWI_TTY_PENDING)
  DO_SETBITS(schedsoftvm, &idelayed, SWI_VM_PENDING)
- #ifndef SMP
- 
  unsigned
  softclockpending(void)
  {
      return (ipending & SWI_CLOCK_PENDING);
  }
  #define GENSPL(name, set_cpl) \
  unsigned name(void) \
  { \
--- 69,82 ----
  DO_SETBITS(schedsofttty, &idelayed, SWI_TTY_PENDING)
  DO_SETBITS(schedsoftvm, &idelayed, SWI_VM_PENDING)
  unsigned
  softclockpending(void)
  {
      return (ipending & SWI_CLOCK_PENDING);
  }
+ #ifndef SMP
+ 
  #define GENSPL(name, set_cpl) \
  unsigned name(void) \
  { \
***************
*** 142,160 ****
  #define POSTCODE_LO(X)
  #define POSTCODE_HI(X)
  #endif /* SPL_DEBUG_POSTCODE */
- 
- 
- unsigned
- softclockpending(void)
- {
-     unsigned x;
- 
-     IFCPL_LOCK();
-     x = ipending & SWI_CLOCK_PENDING;
-     IFCPL_UNLOCK();
- 
-     return (x);
- }
  /*
--- 140,145 ----

--J2SCkAp4GZ/dPZZf--

From owner-freebsd-smp Sun Jun 27 22: 4:12 1999
Delivered-To: freebsd-smp@freebsd.org
Received: from
    smtp03.primenet.com (smtp03.primenet.com [206.165.6.133])
    by hub.freebsd.org (Postfix) with ESMTP id C101314F2C
    for ; Sun, 27 Jun 1999 22:04:08 -0700 (PDT)
    (envelope-from tlambert@usr04.primenet.com)
Received: (from daemon@localhost)
    by smtp03.primenet.com (8.8.8/8.8.8) id WAA08612;
    Sun, 27 Jun 1999 22:04:07 -0700 (MST)
Received: from usr04.primenet.com(206.165.6.204)
    via SMTP by smtp03.primenet.com, id smtpd008550;
    Sun Jun 27 22:03:58 1999
Received: (from tlambert@localhost)
    by usr04.primenet.com (8.8.5/8.8.5) id WAA26806;
    Sun, 27 Jun 1999 22:03:54 -0700 (MST)
From: Terry Lambert
Message-Id: <199906280503.WAA26806@usr04.primenet.com>
Subject: Re: high-efficiency SMP locks - submission for review
To: dillon@apollo.backplane.com (Matthew Dillon)
Date: Mon, 28 Jun 1999 05:03:54 +0000 (GMT)
Cc: freebsd-smp@FreeBSD.ORG
In-Reply-To: <199906272024.NAA15634@apollo.backplane.com>
    from "Matthew Dillon" at Jun 27, 99 01:24:01 pm
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

> I would like to know what the SMP gurus think of this code. I have
> not posted the complete patch, but have included the meat of the code.
> I am interested in comments relating to the implementation & performance
> aspects of the code. I would also appreciate it if an inline assembly
> expert could look at my inline assembly. I have tested it well and
> I believe it to be correct, but I don't write much inline assembly so...
>
> This code is designed to implement extremely low overhead SMP-compatible
> locks. It will probably be used instead of lockmgr() locks for the
> buffer cache, and I hope to use it as a basis to replace lockmgr() locks
> in other modules later on. The same structure can be used to implement
> both spin locks and normal blocking locks, and can handle shared,
> exclusive, and recursive-exclusive lock types.
> The critical path has
> been optimized down to 20 instructions or so ( compared to the hundreds
> in lockmgr() ).
>
> As many of you may already know, lockmgr() locks are extremely expensive
> and require interrupt synchronization for their internal simplelocks to
> operate properly in a mixed process/interrupt environment. The result is
> a horrendous amount of overhead for very little gain. I am attempting
> to come up with a general locking module that can eventually replace
> lockmgr() locks.

The lockmgr() locks are ill-suited to use for anything other than
advisory/mandatory file locking, IMO. Any move away from them is
a good thing.

However, I have some comments on the implementation.

1)  SMP locking should be done utilizing intention modes.

    I had a nice discussion with Mike Smith and Julian over the
    proposed locking mechanism for non-intention mode multiple
    reader, single writer locks for the buffer cache. This
    boiled down to:

    A)  Use of non-intention mode locks has a serious
        serialization penalty.

    B)  Use of locks, where the write queue has a shared
        intention exclusive (SIX) lock implicit to the
        enqueueing of synchronization point data (e.g. a
        soft dependency on a directory entry block with
        one backed out operation) _significantly_ reduces
        the serialization overhead.

    C)  There's really no reason that the blocks that are
        not locked exclusive on the list can't be updated
        by user processes. This resolves the Ziff-Davis
        Labs benchmark issue, more cleanly than the way
        that Mike stated Kirk proposed to solve the problem.

2)  In general, as regards SMP locking, I think that your lock
    structure is too abbreviated:

> struct qlock {
> volatile int count; /* lock refs: neg=exclusive, pos=shared */
> int waiting; /* process(s) are waiting on lock */
> struct proc *holder; /* holder of an exclusive lock */
> };

    In particular, the use of a single wait count means that you
    will not have an ordered list for the equivalent of a "wake
    one".
    This is a real problem, since it allows for a deadly
    embrace deadlock to occur when two kernel contexts each hold
    one lock and want to acquire the other context's lock.

    I believe that resolving this as an EWOULDBLOCK, and then
    backtracking the stack and any partially complete operations
    is prohibitive.

    The use of a count, rather than a relationship between the
    lock holder(s) and the things being locked is also similarly
    problematic.

    Finally, there is insufficient mechanism to avoid competition
    starvation, where a writer blocks indefinitely as multiple
    readers pass the resource between each other.

I believe the following is the minimal set of structures required
to resolve the blocking operation inheritance and deadlock
detection:

/*
 *
 */
struct lockable;
typedef struct lockable LKA;

struct lockingentity;
typedef struct lockingentity LKE;

struct locklistentry;
typedef struct locklistentry LKLE;

struct lockentitylistentry;
typedef struct lockentitylistentry LKEE;

struct lockentitylistentry {
    LKEE *next;     /* next list entry */
    LKEE *inherit;  /* who inherits this dependency */
    LKE *entity;    /* entity waiting on us */
};

struct locklistentry {
    LKLE *next;     /* next lock in list */
    LKA *held;      /* a held lock */
};

struct lockingentity {
    LKLE *holding;  /* locks that entity is holding */
    LKA *waiton;    /* lock that entity is waiting on */
};

struct lockable {
    LKEE *holds;    /* the locking entities */
    LKEE *waits;    /* entities waiting for the lock */
};

This presumes a model where each context which wishes to acquire
a lock is considered a lockingentity (this works equally well
for kernel threads vs. processes vs. async call gates, vs. pseudo
async call gates ala lazy thread context creation for kernel
sleeps ala BSDI).

Each object that can be locked is considered a lockable, and the
relationship between a locking entity and a lockable is a
locklistentry.

The lockentitylistentry is used to inherit blocked, pending
operations to the root of the lock tree (graph).
This allows near instantaneous deadlock detection, and the
inheritance list combined with the graph represents the
transitive closure path between any lock you propose, and the
locks which already exist.

Finally, one could envision a hierarchical relationship between
lockingentities, e.g. multiple threads within a process, so as
to avoid self-deadlock. This really depends on your threads
implementation; obviously, a user space call conversion mechanism
backed by multiple kernel threads, and implemented wholly using
asynchronous kernel entries (e.g. an async call gate) would be
immune to the requirement. Other implementations would need
something like:

/*
 * iml.c
 *
 * An intention mode based lock manager
 *
 * This software uses graph theory. A graph is a mathematical object
 * that can accurately model a problem formulated in terms of objects
 * and the relationships between them. Lock management is one such
 * problem. I recommend:
 *
 *      Algorithms in C++
 *      Robert Sedgewick
 *      Addison-Wesley Publishing Company
 *      ISBN 0-201-51059-6
 *
 *      The Art of Computer Programming
 *      _Volume 1: Fundamental Algorithms_
 *      Donald Knuth
 *      Addison-Wesley Publishing Company
 *      ISBN 0-201-03809-9
 *
 *      UNIX Systems for Modern Architectures
 *      _Symmetric Multiprocessing and Caching for Kernel Programmers_
 *      Curt Schimmel
 *      Addison-Wesley Publishing Company
 *      ISBN 0-201-63338-8
 *
 */
struct lockable;
typedef struct lockable IML_L;

struct lockentity;
typedef struct lockentity IML_E;

struct locklock;
typedef struct locklock IML_LOCK;

/*
 * A node that can have a lock applied against it
 */
struct lockable {
    IML_L *parent;      /* our parent lockable */
    IML_L *sibling;     /* sibling lockable(s), if any */
    IML_L *child;       /* child lockable(s), if any */
    IML_LOCK *locks;    /* locks that are held */
    IML_LOCK *waits;    /* locks that are waiting */
    unsigned long nesting_lvl;
};

/*
 * A lock entity node for holding locks and entity relationships
 */
struct lockentity {
    IML_E *sibling;     /* entities with a relationship to us */
    IML_LOCK *locks;    /* the list of locks we hold */
};

/*
 * A lock instance
 */
struct locklock {
    IML_E *entity;          /* the entity that applied the lock */
    IML_L *lockable;        /* where the lock was applied */
    IML_LOCK *enextlock;    /* the next lock by the entity */
    IML_LOCK *lnextlock;    /* the next lock on the lockable */
    unsigned long mode;     /* the mode of this lock */
};

Note: This doesn't deal with the inheritance issue for waiters,
which would use a structure similar to the structure from
the previous example.

Terry Lambert
terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

From owner-freebsd-smp Sun Jun 27 22:10:55 1999
Delivered-To: freebsd-smp@freebsd.org
Received: from smtp02.primenet.com (smtp02.primenet.com [206.165.6.132])
    by hub.freebsd.org (Postfix) with ESMTP id 9BF7E14F2C
    for ; Sun, 27 Jun 1999 22:10:52 -0700 (PDT)
    (envelope-from tlambert@usr04.primenet.com)
Received: (from daemon@localhost)
    by smtp02.primenet.com (8.8.8/8.8.8) id WAA09420;
    Sun, 27 Jun 1999 22:10:51 -0700 (MST)
Received: from usr04.primenet.com(206.165.6.204)
    via SMTP by smtp02.primenet.com, id smtpd009383;
    Sun Jun 27 22:10:45 1999
Received: (from tlambert@localhost)
    by usr04.primenet.com (8.8.5/8.8.5) id WAA27040;
    Sun, 27 Jun 1999 22:10:44 -0700 (MST)
From: Terry Lambert
Message-Id: <199906280510.WAA27040@usr04.primenet.com>
Subject: Re: high-efficiency SMP locks - submission for review
To: julian@whistle.com (Julian Elischer)
Date: Mon, 28 Jun 1999 05:10:44 +0000 (GMT)
Cc: dillon@apollo.backplane.com, freebsd-smp@FreeBSD.ORG
In-Reply-To: from "Julian Elischer" at Jun 27, 99 02:34:40 pm
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

> This brings up a couple of points..
>
> 1/ Who wrote the existing lock manager, and why did they do
> it the way they did? When someone goes to so much trouble
> over something it's usually for a reason.

It appears to be a move toward moving file locking out of the
VOP_ADVLOCK area and onto the vnode, instead of hanging it off
the backing object.

The abstraction is incomplete, but the reasoning is obvious, given
the large number of locking semantics (flock, fcntl, O_EXCL, NFS
LEASES, etc.) that need to interoperate and protect the backing
object, regardless of the access path or mechanism.

> 2/ Before we rush off and implement a new set of locking primitives,
> it might be a good idea to look at the locking primitives of a few
> other OS's.. For example Linux and MACH, and if we can get hold of
> them, Solaris and maybe the exokernel. (and sprite)

I would suggest RealTime multiprocessor systems, and embedded
database designs, actually.

> 3/ This is not to say that what you have done is bad, but that there
> is something to be said for not being gratuitously different. It
> might also be the case that the other BSDs might be doing something
> with locks.

BSDI is really the only important work in the BSD arena, if one
discounts the 4.3 derived MP stuff from Sequent and SunOS 4.1.3_U2
that was done for Fujitsu and some large ISPs.

> 4/ If you want to implement these locks it really is up to you to do
> a quick scan of what the "state of the art" is in this field.
>
> I think I can dig up MACH2.5 (OSF1/Digital UNIX/Tru64) locks
>
> maybe we should make a call for everyone on the freebsd lists to
> see what they can find in their own back yards. Of course the use
> of await() gives a good head start for us..

This is a good point that should not be skipped. The lazy task
creation in the BSDI kernel is a technique which is over 10 years
old right now, and was last published in the public literature
about 1991.

It would be nice if whatever was finally integrated was less than
10 years out of date...
Terry Lambert
terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

From owner-freebsd-smp Sun Jun 27 22:14: 7 1999
Delivered-To: freebsd-smp@freebsd.org
Received: from pallas.veritas.com (pallas.veritas.com [204.177.156.25])
    by hub.freebsd.org (Postfix) with ESMTP id 39B1814F2C
    for ; Sun, 27 Jun 1999 22:14:04 -0700 (PDT)
    (envelope-from aaron@sigma.veritas.com)
Received: from megami.veritas.com (megami.veritas.com [192.203.46.101])
    by pallas.veritas.com (8.9.1a/8.9.1) with SMTP id WAA16547;
    Sun, 27 Jun 1999 22:14:48 -0700 (PDT)
Received: from sigma.veritas.com([192.203.46.125]) (2116 bytes)
    by megami.veritas.com via sendmail with P:esmtp/R:smart_host/T:smtp
    (sender: ) id for ; Sun, 27 Jun 1999 22:13:56 -0700 (PDT)
    (Smail-3.2.0.101 1997-Dec-17 #3 built 1999-Jan-25)
Received: from sigma (localhost [127.0.0.1])
    by sigma.veritas.com (8.9.2/8.9.1) with ESMTP id WAA48515;
    Sun, 27 Jun 1999 22:13:56 -0700 (PDT)
    (envelope-from aaron@sigma.veritas.com)
Message-Id: <199906280513.WAA48515@sigma.veritas.com>
From: Aaron Smith
To: Daniel Eischen
Cc: dillon@apollo.backplane.com, julian@whistle.com, freebsd-smp@FreeBSD.ORG
Subject: Re: high-efficiency SMP locks - submission for review
In-reply-to: Your message of "Sun, 27 Jun 1999 19:07:33 EDT."
    <199906272307.TAA21317@pcnet1.pcnet.com>
Date: Sun, 27 Jun 1999 22:13:56 -0700
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Sun, 27 Jun 1999 19:07:33 EDT, Daniel Eischen writes:
>> 2/ Before we rush off and implement a new set of locking primitives,
>> it might be a good idea to look at the locking primitives of a few
>> other OS's.. For example Linux and MACH, and if we can get hold of
>> them, Solaris and maybe the exokernel.
>> (and sprite)
>
>The Vahalia book (UNIX Internals - The New Frontiers) has a pretty
>good synopsis of locking systems used by various OSs (see chapter
>7). At least from the programmers interface, I really like the
>Solaris API (kernel mutexes and condition variables) - they are
>well understood and easy to use.

i want to chime in and agree with this statement. i work on a commercial
filesystem for (among other platforms) solaris; and i'd have to say that
of the platforms i have been exposed to, solaris' kernel synch primitives
are very comfortable to use. the function of an "rwlock" is immediately
understood by anybody who understands reader-writer locks. mutex,
condition variables, etc are all very accessible ideas.

for this reason i think it's counterproductive to use opaque names such
as "qlock". it's the same reason i have an issue with "lockmgr".

i'm happy to see activity in this area!

aaron

From owner-freebsd-smp Sun Jun 27 22:21:21 1999
Delivered-To: freebsd-smp@freebsd.org
Received: from smtp01.primenet.com (smtp01.primenet.com [206.165.6.131])
    by hub.freebsd.org (Postfix) with ESMTP id 8D905150D9
    for ; Sun, 27 Jun 1999 22:21:15 -0700 (PDT)
    (envelope-from tlambert@usr04.primenet.com)
Received: (from daemon@localhost)
    by smtp01.primenet.com (8.8.8/8.8.8) id WAA07042;
    Sun, 27 Jun 1999 22:21:15 -0700 (MST)
Received: from usr04.primenet.com(206.165.6.204)
    via SMTP by smtp01.primenet.com, id smtpd007015;
    Sun Jun 27 22:21:07 1999
Received: (from tlambert@localhost)
    by usr04.primenet.com (8.8.5/8.8.5) id WAA27383;
    Sun, 27 Jun 1999 22:21:05 -0700 (MST)
From: Terry Lambert
Message-Id: <199906280521.WAA27383@usr04.primenet.com>
Subject: Re: high-efficiency SMP locks - submission for review
To: bakul@torrentnet.com (Bakul Shah)
Date: Mon, 28 Jun 1999 05:21:05 +0000 (GMT)
Cc: julian@whistle.com, dillon@apollo.backplane.com, freebsd-smp@FreeBSD.ORG
In-Reply-To:
    <199906272253.SAA04385@chai.torrentnet.com>
    from "Bakul Shah" at Jun 27, 99 06:53:30 pm
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

> IMHO any use of NBS would likely result in a redesign rather
> than an addition of yet another way to synchronize so it is
> not for the faint of heart.

This is an understatement. Most implementations which I am aware
of require that you use the Dijkstra solution to the Banker's
Algorithm, either explicitly (reservation of all needed resources
as a single call, where no locks are granted unless all are), or
implicitly (you guarantee that your order of operation will never
be inverted in order to avoid the possibility of deadlock).

I am rather fond of incrementally precomputing Warshall's
algorithm on the lock graph, and then doing deadlock detection by
implying an edge between the thing you are proposing to lock and
the root of the graph (thus implying a Hamiltonian cycle if
allowing the lock would result in a deadlock, with only a single
traversal from the terminal object to the root being required to
detect that a deadlock would occur -- usually no more than 8
pointer dereferences for an average graph depth of 8).

I also have a slight problem with relying on a test-and-set
instruction any more complicated than that which can be
implemented with P/V semaphores. Many processors (e.g. MIPS)
don't have an atomic test and set, and you'd want to avoid
architecting against them ever working. 8-(.

> Henry Massalin's thesis on the Synthesis kernel would also be
> useful reading. It is full of so many interesting ideas I am
> amazed no one has stolen most of them as yet!

Thanks for the reference! I'm going digging now...

Terry Lambert
terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
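
The "single traversal" style of deadlock check described in the
message above can be illustrated with a much simpler model than the
one proposed. The following is only a sketch under stated assumptions:
locks are exclusive (one holder), each entity waits on at most one
lock, and the existing wait-for graph is kept acyclic by refusing any
wait that would deadlock. The struct and function names are
hypothetical, and this is a plain chain walk, not the incremental
Warshall precomputation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct lockable;

/* Hypothetical and simplified: an entity waits on at most one lock. */
struct entity {
    struct lockable *waiton;    /* lock this entity is blocked on, or NULL */
};

/* Simplified to exclusive locks only, so there is at most one holder. */
struct lockable {
    struct entity *holder;      /* current holder, or NULL if free */
};

/*
 * Would blocking 'e' on 'want' close a cycle in the wait-for graph?
 * Walk holder -> waiton -> holder ... starting at 'want'.  If the walk
 * reaches 'e', then 'e' would end up waiting on itself.  The walk ends
 * at a free lock or at a holder that is not waiting; it terminates
 * because the existing graph is assumed to be acyclic.
 */
static bool
would_deadlock(const struct entity *e, const struct lockable *want)
{
    const struct lockable *l = want;

    assert(e != NULL);
    while (l != NULL && l->holder != NULL) {
        if (l->holder == e)
            return true;
        l = l->holder->waiton;
    }
    return false;
}
```

A caller would run the check before sleeping on a lock and fail the
request (or back off) when it returns true, which is what keeps the
graph acyclic for later checks.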

From owner-freebsd-smp Sun Jun 27 22:36:53 1999
Delivered-To: freebsd-smp@freebsd.org
Received: from cs.rice.edu (cs.rice.edu [128.42.1.30])
    by hub.freebsd.org (Postfix) with ESMTP id 84E5715356
    for ; Sun, 27 Jun 1999 22:36:51 -0700 (PDT)
    (envelope-from alc@cs.rice.edu)
Received: (from alc@localhost)
    by cs.rice.edu (8.9.0/8.9.0) id AAA18924;
    Mon, 28 Jun 1999 00:36:39 -0500 (CDT)
Date: Mon, 28 Jun 1999 00:36:39 -0500
From: Alan Cox
To: Terry Lambert
Cc: Bakul Shah , julian@whistle.com, dillon@apollo.backplane.com, freebsd-smp@freebsd.org
Subject: Re: high-efficiency SMP locks - submission for review
Message-ID: <19990628003639.N2738@cs.rice.edu>
References: <199906272253.SAA04385@chai.torrentnet.com> <199906280521.WAA27383@usr04.primenet.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.95.5us
In-Reply-To: <199906280521.WAA27383@usr04.primenet.com>;
    from Terry Lambert on Mon, Jun 28, 1999 at 05:21:05AM +0000
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Mon, Jun 28, 1999 at 05:21:05AM +0000, Terry Lambert wrote:
>
> I also have a slight problem with relying on a test-and-set
> instruction any more complicated than that which can be
> implemented with P/V semaphores. Many processors (e.g. MIPS)
> don't have an atomic test and set, and you'd want to avoid
> architecting against them ever working. 8-(.
>

That is true. They, including MIPS and Alpha, have something
better: Load-locked and store conditional. :-)

I think this is a non-issue.
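
To illustrate why the choice of hardware primitive matters little at
the source level, here is a minimal sketch assuming C11 <stdatomic.h>
(which postdates this thread). On load-locked/store-conditional
machines a compiler typically builds the compare-exchange out of an
LL/SC loop; the weak form is allowed to fail spuriously for exactly
that reason. The struct and function names are hypothetical; the count
convention (positive = shared holders, negative = exclusive) follows
the struct qlock comment quoted earlier in the thread, but this is not
the posted implementation.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock word: 0 = free, >0 = shared holders, -1 = exclusive. */
struct rwcount {
    atomic_int count;
};

/* Try to add one shared holder; fails if the lock is held exclusively. */
static bool
rw_try_shared(struct rwcount *lk)
{
    int n = atomic_load(&lk->count);

    while (n >= 0 && n < INT_MAX) {
        /*
         * The weak form may fail spuriously (for example when an LL/SC
         * reservation is lost).  On failure n is refreshed with the
         * current value and the loop re-checks it.
         */
        if (atomic_compare_exchange_weak(&lk->count, &n, n + 1))
            return true;
    }
    return false;
}

/* Try to take the lock exclusively; succeeds only if it is free. */
static bool
rw_try_exclusive(struct rwcount *lk)
{
    int expected = 0;

    /* Strong form: a single attempt must not fail spuriously. */
    return atomic_compare_exchange_strong(&lk->count, &expected, -1);
}

/* Release one shared reference or the exclusive hold. */
static void
rw_unlock(struct rwcount *lk)
{
    int n = atomic_load(&lk->count);

    assert(n != 0);
    if (n < 0)
        atomic_store(&lk->count, 0);
    else
        atomic_fetch_sub(&lk->count, 1);
}
```

The lock counter itself is updated by the compare-exchange, with no
separate spin lock guarding it; blocking, wakeups, and recursion are
left out of the sketch.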
Alan To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Sun Jun 27 23:17:31 1999 Delivered-To: freebsd-smp@freebsd.org Received: from apollo.backplane.com (apollo.backplane.com [209.157.86.2]) by hub.freebsd.org (Postfix) with ESMTP id 1FAFE15399 for ; Sun, 27 Jun 1999 23:17:29 -0700 (PDT) (envelope-from dillon@apollo.backplane.com) Received: (from dillon@localhost) by apollo.backplane.com (8.9.3/8.9.1) id XAA17806; Sun, 27 Jun 1999 23:17:27 -0700 (PDT) (envelope-from dillon) Date: Sun, 27 Jun 1999 23:17:27 -0700 (PDT) From: Matthew Dillon Message-Id: <199906280617.XAA17806@apollo.backplane.com> To: Alan Cox Cc: Terry Lambert , Bakul Shah , julian@whistle.com, freebsd-smp@FreeBSD.ORG Subject: Re: high-efficiency SMP locks - submission for review References: <199906272253.SAA04385@chai.torrentnet.com> <199906280521.WAA27383@usr04.primenet.com> <19990628003639.N2738@cs.rice.edu> Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org :On Mon, Jun 28, 1999 at 05:21:05AM +0000, Terry Lambert wrote: :> :> I also have a slight problem with relying on a test-and-set :> instruction any more complicated than that which can be :> implemented with P/V semaphores. Many processors (e.g. MIPS) :> don't have an atomic test and set, and you'd want to avoid :> architecting against them ever working. 8-(. :> : :That is true. They, including MIPS and Alpha, have something :better: Load-locked and store conditional. :-) : :I think this is a non-issue. : :Alan I think the key is to create a reasonably efficient primitive that can be used as a building block for more sophisticated functions. A simple test-and-set isn't quite powerful enough, because using it in a more complex setting requires multiple instances of the primitive. For example, you could use it to implement spin locks to protect counters but you could not use it to implement the lock counters directly. 
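As a rough illustration of the test-and-set case described above: the primitive can build a spin lock, and the spin lock can then protect a counter, but the counter is not itself the lock word. The sketch below assumes C11 <stdatomic.h> in place of the kernel's own inlines, with atomic_exchange standing in for an xchg-style test-and-set; the guarded_counter type and the guard_* names are hypothetical.

```c
#include <stdatomic.h>

struct guarded_counter {
    atomic_int guard;  /* 0 = free, 1 = held: the test-and-set word */
    int count;         /* plain data, only touched while guard is held */
};

/* Spin until the exchange returns 0, meaning we were the one to set it. */
static void guard_acquire(atomic_int *guard)
{
    while (atomic_exchange_explicit(guard, 1, memory_order_acquire) != 0)
        ;  /* busy-wait */
}

static void guard_release(atomic_int *guard)
{
    atomic_store_explicit(guard, 0, memory_order_release);
}

/* Every update needs an acquire and a release around the real work. */
static int guarded_add(struct guarded_counter *c, int delta)
{
    guard_acquire(&c->guard);
    c->count += delta;
    int result = c->count;
    guard_release(&c->guard);
    return result;
}
```

Each counter update here costs two uses of the primitive plus a separate word of storage, which is the overhead being weighed against a single conditional update of the counter itself.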
For this reason, I expect that using a compare-and-exchange primitive will be much more useful to us even if it does not devolve into a single instruction on some processors. Being a much more powerful mechanism, the single use of a compare-and-exchange primitive would yield approximately the *same* number of instructions as the multiple use of a test-and-set primitive on those processors that only support test-and-set, for those operations that require sophistication beyond what a test-and-set can give you. So, my feeling is that a compare-and-exchange primitive is optimal across all processor types. In fact, both types of primitives are useful. But one should not throw away compare-and-exchange just to try to reach the lowest-common-denominator among cpus, because this severely degrades performance on those processors that support the more sophisticated primitives while at the same time not significantly improving performance on those processors that only support test-and-set. -Matt Matthew Dillon From owner-freebsd-smp Mon Jun 28 0:22:57 1999 Delivered-To: freebsd-smp@freebsd.org Received: from apollo.backplane.com (apollo.backplane.com [209.157.86.2]) by hub.freebsd.org (Postfix) with ESMTP id 6650D153B8 for ; Mon, 28 Jun 1999 00:22:56 -0700 (PDT) (envelope-from dillon@apollo.backplane.com) Received: (from dillon@localhost) by apollo.backplane.com (8.9.3/8.9.1) id AAA18055; Mon, 28 Jun 1999 00:22:54 -0700 (PDT) (envelope-from dillon) Date: Mon, 28 Jun 1999 00:22:54 -0700 (PDT) From: Matthew Dillon Message-Id: <199906280722.AAA18055@apollo.backplane.com> To: Aaron Smith Cc: Daniel Eischen , julian@whistle.com, freebsd-smp@FreeBSD.ORG Subject: Re: high-efficiency SMP locks - submission for review References: <199906280513.WAA48515@sigma.veritas.com> Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org :i want to chime in and agree
with this statement. i work on a commercial :filesytem for (among other platforms) solaris; and i'd have to say that of :the platforms i have been exposed to, solaris' kernel synch primitives are :very comfortable to use. the function of an "rwlock" is immediately :understood by anybody who understands reader-writer locks. mutex, condition :variables, etc are all very accessible ideas. for this reason i think it's :counterproductive to use opaque names such as "qlock". it's the same reason :i have an issue with "lockmgr". : :i'm happy to see activity in this area! :aaron Well, I thought I was being specific in my naming... I don't have a qlock() function. I do have a qlock_rd() and a qlock_wr() function, as well as other flavors. qlock_try_rd(), qlock_spin_rd(), and so forth. But I'm not absolutely set in my naming - just as long as it isn't an all encompassing lockmgr() call, as you said :-) -Matt Matthew Dillon To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Mon Jun 28 0:42:42 1999 Delivered-To: freebsd-smp@freebsd.org Received: from apollo.backplane.com (apollo.backplane.com [209.157.86.2]) by hub.freebsd.org (Postfix) with ESMTP id A7EEC150D1 for ; Mon, 28 Jun 1999 00:42:39 -0700 (PDT) (envelope-from dillon@apollo.backplane.com) Received: (from dillon@localhost) by apollo.backplane.com (8.9.3/8.9.1) id AAA18132; Mon, 28 Jun 1999 00:42:37 -0700 (PDT) (envelope-from dillon) Date: Mon, 28 Jun 1999 00:42:37 -0700 (PDT) From: Matthew Dillon Message-Id: <199906280742.AAA18132@apollo.backplane.com> To: Terry Lambert Cc: freebsd-smp@FreeBSD.ORG Subject: Re: high-efficiency SMP locks - submission for review References: <199906280503.WAA26806@usr04.primenet.com> Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org :1) SMP locking should be done utilizing intention modes. 
I : had a nice discussion with Mike Smith and Julian over the : proposed locking mechanism for non-intention mode multiple : reader, single writer locks for the buffer cache. This : boiled down to: Yes, I agree completely. There are several states in the buffer structure management that could easily be separately locked. The compare-and-exchange primitive I am using as the basis for my qlocks can just as easily be used as the basis for an exclusive intention lock. :> struct qlock { :> volatile int count; /* lock refs: neg=exclusive, pos=shared */ :> int waiting; /* process(s) are waiting on lock */ :> struct proc *holder; /* holder of an exclusive lock */ :> }; : : In particular, the use of a single wait count means that you : will not have an ordered list for the equivalent of a "wake : one". This is a real problem, since it allows for a deadly : embrace deadlock to occur when two kernel contexts each hold : one lock and want to acquire the other contexts lock. : : I believe that resolving this as an EWOULDBLOCK, and then Well, I differ with you here. lockmgr() can't even handle that. Lock primitives that become too complex can no longer be categorized as primitives. That is one of the biggest problems with lockmgr(), in fact. What should be done instead is to separate the functionality of the more complex locking functions - such as those that deal with deadlock situations - because these locking functions have a considerable amount of overhead compared to lower level primitives. : The use of a count, rather than a relationship between : the loc holder(s) and the things being locked is also : similarly problematic. The 'count' can mean anything. In one set of functions it can be a shared/exclusive count, in another it can implement 32 structural zones. I am not attempting to produce an all-encompassing solution, what I am attempting to do is produce a framework which can be used to implement different locking types with a minimum of overhead. 
I've only implemented one: A basic shared/exclusive mechanism. : Finally, there is insufficient mechanism to avoid competition : starvation, where a write blocks indefinitely as multiple : readers pass the resource between each other. Yes, this is an issue -- but the solution in lockmgr() has only led to more esoteric deadlock situations and, I think, harmed performance as much as it has helped it. That isn't to say that we can't implement the same solution in qlocks, but the way I would do it would be by adding a new function, not modifying existing primitives. For example, qlock_wr() might not try to hold-off other shared locks, but qlock_hipri_wr() could. The biggest problem we face at the moment is that locks are being held for much too long a period of time. For example, locks are being held *through* I/O operations as a means of controlling access. This is precisely the wrong way to use a lock. The lock should be used to protect the data structure and then released. An I/O operation in progress should set a "the data is being messed with" flag and then release its lock, not attempt to hold the lock permanently. Or, as you mentioned, intention locks can be used to separate the I/O op from other types of operations. Many of the shared/exclusive problems go away ( or at least go into hiding ) when the locks are used properly. 
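The basic shared/exclusive mechanism can be sketched on top of a single compare-and-exchange on the count word, using the convention from the quoted struct (negative = exclusive, positive = shared). This is not the posted qlock code: it assumes C11 <stdatomic.h>, leaves out the waiting and holder fields and all sleeping, and the qlock_sketch type and sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct qlock_sketch {
    atomic_int count;  /* <0 exclusive, >0 number of shared holders, 0 free */
};

/* Shared attempt: bump the count unless an exclusive holder is present. */
static bool sketch_try_rd(struct qlock_sketch *q)
{
    int old = atomic_load(&q->count);
    while (old >= 0) {
        /* On failure 'old' is refreshed with the current value. */
        if (atomic_compare_exchange_weak(&q->count, &old, old + 1))
            return true;
    }
    return false;
}

/* Exclusive attempt: only succeeds when the lock is completely free. */
static bool sketch_try_wr(struct qlock_sketch *q)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&q->count, &expected, -1);
}

/* Release either kind of hold: exclusive goes to 0, shared drops by one. */
static void sketch_rel(struct qlock_sketch *q)
{
    int old = atomic_load(&q->count);
    assert(old != 0);  /* releasing a free lock is a caller bug */
    for (;;) {
        int next = (old < 0) ? 0 : old - 1;
        if (atomic_compare_exchange_weak(&q->count, &old, next))
            return;
    }
}
```

Note that sketch_try_wr() makes no attempt to hold off new readers, which matches the plain qlock_wr() behaviour described above; a writer-priority variant would be a separate function layered on the same word.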
: I believe the following is the minimal set of structures : required to resolve the blocking operation inheritance and : deadlock detection: : :/* : * : */ : :struct lockable; :typedef struct lockable LKA; : :struct lockingentity; :typedef struct lockingentity LKE; : :struct locklistentry; :typedef struct locklistentry LKLE; : :struct lockentitylistentry; :typedef struct lockentitylistentry LKEE; : :struct lockentitylistentry { : LKEE *next; /* next list entry*/ : LKEE *inherit; /* who inherits this dependency*/ : LKE *entity; /* entity waiting on us*/ :}; : :struct locklistentry { : LKLE *next; /* next lock in list*/ : LKA *held; /* a held lock*/ :}; : : :struct lockingentity { : LKLE *holding; /* locks that entity is holding*/ : LKA *waiton; /* lock that entity is waiting on*/ :}; : :struct lockable { : LKEE *holds; /* the locking entities*/ : LKEE *waits; /* entities waiting for the lock*/ :}; : : This presumes a model where each context which wishes to : acquire a lock is considered a lockingentity (this works : equally well for kernel threads vs. processes vs. async Holy cow, that is very expensive. If I were to implement that sort of locking subsystem it would be at a higher level, it would not be a primitive. I tend to dislike complex locking solutions, but only because I strongly believe that it is possible to avoid the necessity of such by organizing code and algorithms properly. As lockmgr() has shown, trying to implement complex locking solutions in an SMP environment can become *very* expensive. So expensive that the complex locking solution winds up hurting performance more then a simpler solution would help performance. : Finally, one could envision a hierarchical relationship : between lockingentities, e.g. multiple threads within a : process, so as to avoid self-deadlock. 
This really : depends on your threads implementation; obviously, a user : space call conversion mechanism backed by multiple kernel : threads, and implemented wholly using asynchronous kernel : entries (e.g. an async call gate) would be immune to the : requirement. Other implementations would need something : like: : : Terry Lambert : terry@lambert.org You are right on the money here. This is precisely the problem that the SMP implementation currently faces and is attempting to solve with spl-aware simple locks. In fact, the way spl*() ops work currently is very similar to obtaining multiple locks simultaneously. Again, I really hate having to implement complex solutions when it may be possible to obtain the same effect by reorganizing the functions that require the locking in the first place. -Matt Matthew Dillon From owner-freebsd-smp Mon Jun 28 3:57: 7 1999 Delivered-To: freebsd-smp@freebsd.org Received: from overcee.netplex.com.au (overcee.netplex.com.au [202.12.86.7]) by hub.freebsd.org (Postfix) with ESMTP id 6C4AD14DCA for ; Mon, 28 Jun 1999 03:57:01 -0700 (PDT) (envelope-from peter@netplex.com.au) Received: from netplex.com.au (localhost [127.0.0.1]) by overcee.netplex.com.au (Postfix) with ESMTP id 8E8F582; Mon, 28 Jun 1999 18:56:59 +0800 (WST) (envelope-from peter@netplex.com.au) X-Mailer: exmh version 2.0.2 2/24/98 To: Alan Cox Cc: Terry Lambert , Bakul Shah , julian@whistle.com, dillon@apollo.backplane.com, freebsd-smp@freebsd.org Subject: Re: high-efficiency SMP locks - submission for review In-reply-to: Your message of "Mon, 28 Jun 1999 00:36:39 EST."
<19990628003639.N2738@cs.rice.edu> Date: Mon, 28 Jun 1999 18:56:59 +0800 From: Peter Wemm Message-Id: <19990628105659.8E8F582@overcee.netplex.com.au> Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org Alan Cox wrote: > On Mon, Jun 28, 1999 at 05:21:05AM +0000, Terry Lambert wrote: > > > > I also have a slight problem with relying on a test-and-set > > instruction any more complicated than that which can be > > implemented with P/V semaphores. Many processors (e.g. MIPS) > > don't have an atomic test and set, and you'd want to avoid > > architecting against them ever working. 8-(. > > > > That is true. They, including MIPS and Alpha, have something > better: Load-locked and store conditional. :-) > > I think this is a non-issue. > > Alan Actually, I have a bigger issue with it.. cmpxchgl etc doesn't exist on all x86 cpus. To make a kernel that boots on the current cpus (including the 486) we either have to conditionalize the inlines or use the universally available (and implicitly locked) xchg instruction - but that's a test-and-set style operation rather than atomic_cmpex. 
Cheers, -Peter To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Mon Jun 28 9:11:56 1999 Delivered-To: freebsd-smp@freebsd.org Received: from lelu.lablan.com (banjo.cracktown.com [208.226.218.209]) by hub.freebsd.org (Postfix) with ESMTP id 830C91527F for ; Mon, 28 Jun 1999 09:11:45 -0700 (PDT) (envelope-from joeo@cracktown.com) Received: from localhost (joeo@localhost) by lelu.lablan.com (8.9.2/8.9.2) with SMTP id MAA11769 for ; Mon, 28 Jun 1999 12:15:12 -0400 (EDT) (envelope-from joeo@cracktown.com) X-Authentication-Warning: lelu.lablan.com: joeo owned process doing -bs Date: Mon, 28 Jun 1999 12:15:12 -0400 (EDT) From: Joe Orthoefer X-Sender: joeo@localhost To: freebsd-smp@freebsd.org Subject: netbsd-smp Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org The netbsd folk have recently established a tech-smp mailing list. I have no idea how active it is. Just a pointer. 
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Mon Jun 28 9:12: 6 1999 Delivered-To: freebsd-smp@freebsd.org Received: from apollo.backplane.com (apollo.backplane.com [209.157.86.2]) by hub.freebsd.org (Postfix) with ESMTP id F2BE715433 for ; Mon, 28 Jun 1999 09:11:58 -0700 (PDT) (envelope-from dillon@apollo.backplane.com) Received: (from dillon@localhost) by apollo.backplane.com (8.9.3/8.9.1) id JAA22481; Mon, 28 Jun 1999 09:11:47 -0700 (PDT) (envelope-from dillon) Date: Mon, 28 Jun 1999 09:11:47 -0700 (PDT) From: Matthew Dillon Message-Id: <199906281611.JAA22481@apollo.backplane.com> To: Peter Wemm Cc: Alan Cox , Terry Lambert , Bakul Shah , julian@whistle.com, freebsd-smp@freebsd.org Subject: Re: high-efficiency SMP locks - submission for review References: <19990628105659.8E8F582@overcee.netplex.com.au> Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org :Actually, I have a bigger issue with it.. cmpxchgl etc doesn't exist on :all x86 cpus. To make a kernel that boots on the current cpus (including :the 486) we either have to conditionalize the inlines or use the :universally available (and implicitly locked) xchg instruction - but that's :a test-and-set style operation rather than atomic_cmpex. : :Cheers, :-Peter My "Intel486 Processor Family" book - note the 486, lists the cmpxchgl instruction. Of course, I've never actually tried it on a 486. I dunno whether the 386 implements it, though. 
-Matt Matthew Dillon To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Mon Jun 28 9:17:13 1999 Delivered-To: freebsd-smp@freebsd.org Received: from alpo.whistle.com (alpo.whistle.com [207.76.204.38]) by hub.freebsd.org (Postfix) with ESMTP id 1742015143 for ; Mon, 28 Jun 1999 09:17:10 -0700 (PDT) (envelope-from julian@whistle.com) Received: from current1.whistle.com (current1.whistle.com [207.76.205.22]) by alpo.whistle.com (8.9.1a/8.9.1) with SMTP id JAA19971; Mon, 28 Jun 1999 09:16:45 -0700 (PDT) Date: Mon, 28 Jun 1999 09:16:44 -0700 (PDT) From: Julian Elischer To: Matthew Dillon Cc: Peter Wemm , Alan Cox , Terry Lambert , Bakul Shah , freebsd-smp@freebsd.org Subject: Re: high-efficiency SMP locks - submission for review In-Reply-To: <199906281611.JAA22481@apollo.backplane.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org I'd say that we probably wouldn't support SMP on 386 and 486 processors.. and in UP those locks that need atomicity would be optimised away. We WILL need locking in UP when we move to kernel threads, but that doesn't require bus atomicity. julian On Mon, 28 Jun 1999, Matthew Dillon wrote: > :Actually, I have a bigger issue with it.. cmpxchgl etc doesn't exist on > :all x86 cpus. To make a kernel that boots on the current cpus (including > :the 486) we either have to conditionalize the inlines or use the > :universally available (and implicitly locked) xchg instruction - but that's > :a test-and-set style operation rather than atomic_cmpex. > : > :Cheers, > :-Peter > > My "Intel486 Processor Family" book - note the 486, lists the cmpxchgl > instruction. Of course, I've never actually tried it on a 486. I dunno > whether the 386 implements it, though. 
> > -Matt > Matthew Dillon > > > To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Mon Jun 28 10:18:35 1999 Delivered-To: freebsd-smp@freebsd.org Received: from overcee.netplex.com.au (overcee.netplex.com.au [202.12.86.7]) by hub.freebsd.org (Postfix) with ESMTP id 1126014F19 for ; Mon, 28 Jun 1999 10:18:27 -0700 (PDT) (envelope-from peter@netplex.com.au) Received: from netplex.com.au (localhost [127.0.0.1]) by overcee.netplex.com.au (Postfix) with ESMTP id 3445882; Tue, 29 Jun 1999 01:18:23 +0800 (WST) (envelope-from peter@netplex.com.au) X-Mailer: exmh version 2.0.2 2/24/98 To: Matthew Dillon Cc: Alan Cox , Terry Lambert , Bakul Shah , julian@whistle.com, freebsd-smp@freebsd.org Subject: Re: high-efficiency SMP locks - submission for review In-reply-to: Your message of "Mon, 28 Jun 1999 09:11:47 MST." <199906281611.JAA22481@apollo.backplane.com> Date: Tue, 29 Jun 1999 01:18:23 +0800 From: Peter Wemm Message-Id: <19990628171823.3445882@overcee.netplex.com.au> Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org Matthew Dillon wrote: > :Actually, I have a bigger issue with it.. cmpxchgl etc doesn't exist on > :all x86 cpus. To make a kernel that boots on the current cpus (including > :the 486) we either have to conditionalize the inlines or use the > :universally available (and implicitly locked) xchg instruction - but that's > :a test-and-set style operation rather than atomic_cmpex. > : > :Cheers, > :-Peter > > My "Intel486 Processor Family" book - note the 486, lists the cmpxchgl > instruction. Of course, I've never actually tried it on a 486. I dunno > whether the 386 implements it, though. Ahh, right, silly me. I was thinking of the 8-byte version which is signified by the CX8 bit in cpuid. The 386, I doubt has it. 
There have been a couple of suggestions for ending the support for the 386 as it will simplify some ugly code for emulating kernel-mode write faults etc, but it's never happened. Apparently the 386 is common in some areas still. Cheers, -Peter To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Mon Jun 28 10:40:13 1999 Delivered-To: freebsd-smp@freebsd.org Received: from critter.freebsd.dk (critter.freebsd.dk [212.242.40.131]) by hub.freebsd.org (Postfix) with ESMTP id 3F5D21539D for ; Mon, 28 Jun 1999 10:40:08 -0700 (PDT) (envelope-from phk@critter.freebsd.dk) Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.9.3/8.9.2) with ESMTP id TAA31904; Mon, 28 Jun 1999 19:38:19 +0200 (CEST) (envelope-from phk@critter.freebsd.dk) To: Peter Wemm Cc: Matthew Dillon , Alan Cox , Terry Lambert , Bakul Shah , julian@whistle.com, freebsd-smp@FreeBSD.ORG Subject: Re: high-efficiency SMP locks - submission for review In-reply-to: Your message of "Tue, 29 Jun 1999 01:18:23 +0800." <19990628171823.3445882@overcee.netplex.com.au> Date: Mon, 28 Jun 1999 19:38:19 +0200 Message-ID: <31902.930591499@critter.freebsd.dk> From: Poul-Henning Kamp Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org In message <19990628171823.3445882@overcee.netplex.com.au>, Peter Wemm writes: >The 386, I doubt has it. There have been a couple of suggestions for ending >the support for the 386 as it will simplify some ugly code for emulating >kernel-mode write faults etc, but it's never happened. Apparently the >386 is common in some areas still. People in the embedded business would kill us. -- Poul-Henning Kamp FreeBSD coreteam member phk@FreeBSD.ORG "Real hackers run -current on their laptop." FreeBSD -- It will take a long time before progress goes too far! 
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Mon Jun 28 12: 6:33 1999 Delivered-To: freebsd-smp@freebsd.org Received: from isbalham.ist.co.uk (isbalham.ist.co.uk [192.31.26.1]) by hub.freebsd.org (Postfix) with ESMTP id 951961542F for ; Mon, 28 Jun 1999 12:06:19 -0700 (PDT) (envelope-from rb@gid.co.uk) Received: from gid.co.uk (uucp@localhost) by isbalham.ist.co.uk (8.9.2/8.8.7) with UUCP id UAA65930; Mon, 28 Jun 1999 20:06:17 +0100 (BST) (envelope-from rb@gid.co.uk) Received: from [194.32.164.2] by seagoon.gid.co.uk; Mon, 28 Jun 1999 20:04:34 +0100 (BST) X-Sender: rb@194.32.164.1 Message-Id: In-Reply-To: <19990628171823.3445882@overcee.netplex.com.au> References: Your message of "Mon, 28 Jun 1999 09:11:47 MST." <199906281611.JAA22481@apollo.backplane.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Date: Mon, 28 Jun 1999 20:04:31 +0000 To: Peter Wemm From: Bob Bishop Subject: Re: high-efficiency SMP locks - submission for review Cc: Matthew Dillon , Alan Cox , Terry Lambert , Bakul Shah , julian@whistle.com, freebsd-smp@FreeBSD.ORG Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org Hi, At 1:18 am +0800 29/6/99, Peter Wemm wrote: >[...] Apparently the 386 is common in some areas still. Inter alia, you can get them rad-hard at what some would describe as very reasonable prices. 
-- Bob Bishop (0118) 977 4017 international code +44 118 rb@gid.co.uk fax (0118) 989 4254 between 0800 and 1800 UK To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Mon Jun 28 12:29:51 1999 Delivered-To: freebsd-smp@freebsd.org Received: from zibbi.mikom.csir.co.za (zibbi.mikom.csir.co.za [146.64.24.58]) by hub.freebsd.org (Postfix) with ESMTP id F2A3A15175 for ; Mon, 28 Jun 1999 12:29:44 -0700 (PDT) (envelope-from jhay@zibbi.mikom.csir.co.za) Received: (from jhay@localhost) by zibbi.mikom.csir.co.za (8.9.3/8.9.3) id VAA16677; Mon, 28 Jun 1999 21:26:56 +0200 (SAT) (envelope-from jhay) From: John Hay Message-Id: <199906281926.VAA16677@zibbi.mikom.csir.co.za> Subject: Re: high-efficiency SMP locks - submission for review In-Reply-To: <19990628171823.3445882@overcee.netplex.com.au> from Peter Wemm at "Jun 29, 1999 01:18:23 am" To: peter@netplex.com.au (Peter Wemm) Date: Mon, 28 Jun 1999 21:26:55 +0200 (SAT) Cc: dillon@apollo.backplane.com (Matthew Dillon), alc@cs.rice.edu (Alan Cox), tlambert@primenet.com (Terry Lambert), bakul@torrentnet.com (Bakul Shah), julian@whistle.com, freebsd-smp@FreeBSD.ORG X-Mailer: ELM [version 2.4ME+ PL54 (25)] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > > The 386, I doubt has it. There have been a couple of suggestions for ending > the support for the 386 as it will simplify some ugly code for emulating > kernel-mode write faults etc, but it's never happened. Apparently the > 386 is common in some areas still. The 386 is still used a lot in embeded systems. (With FreeBSD running on some of them. 
:-) John -- John Hay -- John.Hay@mikom.csir.co.za From owner-freebsd-smp Mon Jun 28 12:47:42 1999 Delivered-To: freebsd-smp@freebsd.org Received: from apollo.backplane.com (apollo.backplane.com [209.157.86.2]) by hub.freebsd.org (Postfix) with ESMTP id C374A14FCB for ; Mon, 28 Jun 1999 12:47:40 -0700 (PDT) (envelope-from dillon@apollo.backplane.com) Received: (from dillon@localhost) by apollo.backplane.com (8.9.3/8.9.1) id MAA24287; Mon, 28 Jun 1999 12:47:25 -0700 (PDT) (envelope-from dillon) Date: Mon, 28 Jun 1999 12:47:25 -0700 (PDT) From: Matthew Dillon Message-Id: <199906281947.MAA24287@apollo.backplane.com> To: John Hay Cc: peter@netplex.com.au (Peter Wemm), alc@cs.rice.edu (Alan Cox), tlambert@primenet.com (Terry Lambert), bakul@torrentnet.com (Bakul Shah), julian@whistle.com, freebsd-smp@FreeBSD.ORG Subject: Re: high-efficiency SMP locks - submission for review References: <199906281926.VAA16677@zibbi.mikom.csir.co.za> Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org :> The 386, I doubt has it. There have been a couple of suggestions for ending :> the support for the 386 as it will simplify some ugly code for emulating :> kernel-mode write faults etc, but it's never happened. Apparently the :> 386 is common in some areas still. : :The 386 is still used a lot in embedded systems. (With FreeBSD running on :some of them. :-) : :John :-- :John Hay -- John.Hay@mikom.csir.co.za Since 386's are UP systems, I think it would be fairly easy to implement the UP version of the compare-and-exchange primitive with an spl wrapper. We should be able to freely use the cmpxchg instruction on SMP systems.
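A minimal sketch of what such a UP fallback might look like: with only one CPU, blocking interrupts around a plain compare and store is enough to make the sequence atomic, so no cmpxchg instruction is needed. The kernel's splhigh()/splx() are not available outside the kernel, so sketch_splhigh() and sketch_splx() below are hypothetical stand-ins that only record the priority level, and up_cmpex() is an invented name, not a function from the posted patch.

```c
#include <stdbool.h>

/* Stand-in for the interrupt priority level; 0 = nothing masked. */
static int sketch_ipl;

/* Stand-in for splhigh(): mask everything, return the previous level. */
static int sketch_splhigh(void)
{
    int old = sketch_ipl;
    sketch_ipl = 1;
    return old;
}

/* Stand-in for splx(): restore a previously saved level. */
static void sketch_splx(int saved)
{
    sketch_ipl = saved;
}

/*
 * UP compare-and-exchange: if *p equals 'expect', store 'newval' and
 * report success.  Atomic on a uniprocessor only because nothing can
 * interrupt the compare and the store while the level is raised.
 */
static bool up_cmpex(volatile int *p, int expect, int newval)
{
    int s = sketch_splhigh();
    bool ok = (*p == expect);
    if (ok)
        *p = newval;
    sketch_splx(s);
    return ok;
}
```

On an SMP kernel the same interface would presumably be backed by the locked cmpxchg instruction instead, so callers would not need to know which variant they are getting.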
-Matt Matthew Dillon To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Mon Jun 28 12:57:27 1999 Delivered-To: freebsd-smp@freebsd.org Received: from zibbi.mikom.csir.co.za (zibbi.mikom.csir.co.za [146.64.24.58]) by hub.freebsd.org (Postfix) with ESMTP id 0D43115423 for ; Mon, 28 Jun 1999 12:55:40 -0700 (PDT) (envelope-from jhay@zibbi.mikom.csir.co.za) Received: (from jhay@localhost) by zibbi.mikom.csir.co.za (8.9.3/8.9.3) id VAA17112; Mon, 28 Jun 1999 21:55:31 +0200 (SAT) (envelope-from jhay) From: John Hay Message-Id: <199906281955.VAA17112@zibbi.mikom.csir.co.za> Subject: Re: high-efficiency SMP locks - submission for review In-Reply-To: <199906281947.MAA24287@apollo.backplane.com> from Matthew Dillon at "Jun 28, 1999 12:47:25 pm" To: dillon@apollo.backplane.com (Matthew Dillon) Date: Mon, 28 Jun 1999 21:55:30 +0200 (SAT) Cc: freebsd-smp@FreeBSD.ORG X-Mailer: ELM [version 2.4ME+ PL54 (25)] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > :> The 386, I doubt has it. There have been a couple of suggestions for ending > :> the support for the 386 as it will simplify some ugly code for emulating > :> kernel-mode write faults etc, but it's never happened. Apparently the > :> 386 is common in some areas still. > : > :The 386 is still used a lot in embeded systems. (With FreeBSD running on > :some of them. :-) > : > > Since 386's are UP systems, I think it would be fairly easy to implement > the UP version of the compare-and-exchange primitive trivially with an > spl wrapper. We should be able to freely use use the cmpxchg instruction > on SMP systems. > Yes, I understand that and my worry wasn't about the lock code, but it sounded like somebody was sharpening an ax for the 386 code and I just wanted to show that it was still used. 
:-) John -- John Hay -- John.Hay@mikom.csir.co.za To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Mon Jun 28 15:21: 1 1999 Delivered-To: freebsd-smp@freebsd.org Received: from noc.demon.net (server.noc.demon.net [193.195.224.4]) by hub.freebsd.org (Postfix) with ESMTP id 3EDDB14CE3 for ; Mon, 28 Jun 1999 15:20:57 -0700 (PDT) (envelope-from fanf@demon.net) Received: by noc.demon.net; id XAA01117; Mon, 28 Jun 1999 23:20:55 +0100 (BST) Received: from fanf.noc.demon.net(195.11.55.83) by inside.noc.demon.net via smap (3.2) id xma001098; Mon, 28 Jun 99 23:20:55 +0100 Received: from fanf by fanf.noc.demon.net with local (Exim 1.73 #2) id 10yjlh-0003xj-00; Mon, 28 Jun 1999 23:20:49 +0100 To: lkoeller@cc.fh-lippe.de From: Tony Finch Cc: smp@freebsd.org Subject: Re: New freeze with 3.2-RELEASE (SMP and audio)!! In-Reply-To: <199906270958.LAA91826@cc.fh-lippe.de> References: <199906262103.XAA00889@cc.fh-lippe.de> Message-Id: Date: Mon, 28 Jun 1999 23:20:49 +0100 Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org Lars =?iso-8859-1?Q?K=F6ller?= wrote: > >I hope I'm not too fast with my conclusions (normally I am :-), but >the instability seems to result from th missing option > >options FFS_ROOT #FFS usable as root device [keep this!] > >which is new in 3.2-RELEASE. Is this possible? No, it's been in -CURRENT / RELENG_3 since January 1998. Tony. 
-- f.a.n.finch dot@dotat.at fanf@demon.net Winner, International Obfuscated C Code Competition 1998 To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Mon Jun 28 15:46:38 1999 Delivered-To: freebsd-smp@freebsd.org Received: from cimlogic.com.au (cimlog.lnk.telstra.net [139.130.51.31]) by hub.freebsd.org (Postfix) with ESMTP id 1F88114F8A for ; Mon, 28 Jun 1999 15:46:21 -0700 (PDT) (envelope-from jb@cimlogic.com.au) Received: (from jb@localhost) by cimlogic.com.au (8.9.1/8.9.1) id IAA10502; Tue, 29 Jun 1999 08:47:51 +1000 (EST) (envelope-from jb) From: John Birrell Message-Id: <199906282247.IAA10502@cimlogic.com.au> Subject: Re: high-efficiency SMP locks - submission for review In-Reply-To: <19990628171823.3445882@overcee.netplex.com.au> from Peter Wemm at "Jun 29, 1999 1:18:23 am" To: peter@netplex.com.au (Peter Wemm) Date: Tue, 29 Jun 1999 08:47:51 +1000 (EST) Cc: dillon@apollo.backplane.com, alc@cs.rice.edu, tlambert@primenet.com, bakul@torrentnet.com, julian@whistle.com, freebsd-smp@FreeBSD.ORG X-Mailer: ELM [version 2.4ME+ PL43 (25)] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org Peter Wemm wrote: > Ahh, right, silly me. I was thinking of the 8-byte version which is signified > by the CX8 bit in cpuid. > > The 386, I doubt has it. There have been a couple of suggestions for ending > the support for the 386 as it will simplify some ugly code for emulating > kernel-mode write faults etc, but it's never happened. Apparently the > 386 is common in some areas still. Some of us use the 386EX as an embedded processor in low power (~2W) memory based applications. Dropping support for 386 (UP) from FreeBSD would fork a new *BSD! 
-- John Birrell - jb@cimlogic.com.au; jb@freebsd.org http://www.cimlogic.com.au/ CIMlogic Pty Ltd, GPO Box 117A, Melbourne Vic 3001, Australia +61 418 353 137 To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Mon Jun 28 16:47: 6 1999 Delivered-To: freebsd-smp@freebsd.org Received: from poboxer.pobox.com (unknown [208.149.16.27]) by hub.freebsd.org (Postfix) with ESMTP id C66AB15483 for ; Mon, 28 Jun 1999 16:46:05 -0700 (PDT) (envelope-from alk@poboxer.pobox.com) Received: (from alk@localhost) by poboxer.pobox.com (8.9.3/8.9.1) id SAA03541; Mon, 28 Jun 1999 18:45:59 -0500 (CDT) (envelope-from alk) From: Anthony Kimball MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Date: Mon, 28 Jun 1999 18:45:58 -0500 (CDT) X-Face: \h9Jg:Cuivl4S*UP-)gO.6O=T]]@ncM*tn4zG);)lk#4|lqEx=*talx?.Gk,dMQU2)ptPC17cpBzm(l'M|H8BUF1&]dDCxZ.c~Wy6-j,^V1E(NtX$FpkkdnJixsJHE95JlhO 5\M3jh'YiO7KPCn0~W`Ro44_TB@&JuuqRqgPL'0/{):7rU-%.*@/>q?1&Ed Reply-To: alk@pobox.com To: freebsd-smp@FreeBSD.ORG Subject: Re: high-efficiency SMP locks - submission for review References: <19990628171823.3445882@overcee.netplex.com.au> <199906282247.IAA10502@cimlogic.com.au> X-Mailer: VM 6.43 under 20.4 "Emerald" XEmacs Lucid Message-ID: <14200.2161.578451.967112@avalon.east> Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org Quoth John Birrell on Tue, 29 June: : : Some of us use the 386EX as an embedded processor in low power (~2W) : memory based applications. Dropping support for 386 (UP) from FreeBSD : would fork a new *BSD! Just to make explicit the obvious: That's why there's a "cpu I386_CPU" config option; there's no reason to retain 386 compatibility if it isn't present. 
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Mon Jun 28 17: 8:49 1999 Delivered-To: freebsd-smp@freebsd.org Received: from cimlogic.com.au (cimlog.lnk.telstra.net [139.130.51.31]) by hub.freebsd.org (Postfix) with ESMTP id 96D2615483 for ; Mon, 28 Jun 1999 17:08:44 -0700 (PDT) (envelope-from jb@cimlogic.com.au) Received: (from jb@localhost) by cimlogic.com.au (8.9.1/8.9.1) id KAA10915; Tue, 29 Jun 1999 10:11:10 +1000 (EST) (envelope-from jb) From: John Birrell Message-Id: <199906290011.KAA10915@cimlogic.com.au> Subject: Re: high-efficiency SMP locks - submission for review In-Reply-To: <14200.2161.578451.967112@avalon.east> from Anthony Kimball at "Jun 28, 1999 6:45:58 pm" To: alk@pobox.com Date: Tue, 29 Jun 1999 10:11:10 +1000 (EST) Cc: freebsd-smp@FreeBSD.ORG X-Mailer: ELM [version 2.4ME+ PL43 (25)] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org Anthony Kimball wrote: > Quoth John Birrell on Tue, 29 June: > : > : Some of us use the 386EX as an embedded processor in low power (~2W) > : memory based applications. Dropping support for 386 (UP) from FreeBSD > : would fork a new *BSD! > > Just to make explicit the obvious: That's why there's a "cpu I386_CPU" > config option; there's no reason to retain 386 compatibility if it > isn't present. And that is (explicitly and exclusively!) used on my 386 target systems. The point I was trying to make, however, is that the 386 compatibility code must continue to exist in any changes to the kernel sources. 
-- John Birrell - jb@cimlogic.com.au; jb@freebsd.org http://www.cimlogic.com.au/ CIMlogic Pty Ltd, GPO Box 117A, Melbourne Vic 3001, Australia +61 418 353 137 To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Mon Jun 28 17:48:24 1999 Delivered-To: freebsd-smp@freebsd.org Received: from smtp04.primenet.com (smtp04.primenet.com [206.165.6.134]) by hub.freebsd.org (Postfix) with ESMTP id 37E4E14D0C for ; Mon, 28 Jun 1999 17:48:21 -0700 (PDT) (envelope-from tlambert@usr05.primenet.com) Received: (from daemon@localhost) by smtp04.primenet.com (8.8.8/8.8.8) id RAA13257; Mon, 28 Jun 1999 17:48:19 -0700 (MST) Received: from usr05.primenet.com(206.165.6.205) via SMTP by smtp04.primenet.com, id smtpd013231; Mon Jun 28 17:48:15 1999 Received: (from tlambert@localhost) by usr05.primenet.com (8.8.5/8.8.5) id RAA07089; Mon, 28 Jun 1999 17:48:12 -0700 (MST) From: Terry Lambert Message-Id: <199906290048.RAA07089@usr05.primenet.com> Subject: Re: high-efficiency SMP locks - submission for review To: phk@critter.freebsd.dk (Poul-Henning Kamp) Date: Tue, 29 Jun 1999 00:48:12 +0000 (GMT) Cc: peter@netplex.com.au, dillon@apollo.backplane.com, alc@cs.rice.edu, tlambert@primenet.com, bakul@torrentnet.com, julian@whistle.com, freebsd-smp@FreeBSD.ORG In-Reply-To: <31902.930591499@critter.freebsd.dk> from "Poul-Henning Kamp" at Jun 28, 99 07:38:19 pm X-Mailer: ELM [version 2.4 PL25] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > >The 386, I doubt has it. There have been a couple of suggestions for ending > >the support for the 386 as it will simplify some ugly code for emulating > >kernel-mode write faults etc, but it's never happened. Apparently the > >386 is common in some areas still. > > People in the embedded business would kill us. No lie. 
486GX macrocells are cheap, but 386 macrocells are cheaper. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Mon Jun 28 17:55:51 1999 Delivered-To: freebsd-smp@freebsd.org Received: from smtp02.primenet.com (smtp02.primenet.com [206.165.6.132]) by hub.freebsd.org (Postfix) with ESMTP id 575A314D9C for ; Mon, 28 Jun 1999 17:55:48 -0700 (PDT) (envelope-from tlambert@usr05.primenet.com) Received: (from daemon@localhost) by smtp02.primenet.com (8.8.8/8.8.8) id RAA28478; Mon, 28 Jun 1999 17:55:47 -0700 (MST) Received: from usr05.primenet.com(206.165.6.205) via SMTP by smtp02.primenet.com, id smtpd028376; Mon Jun 28 17:55:38 1999 Received: (from tlambert@localhost) by usr05.primenet.com (8.8.5/8.8.5) id RAA07451; Mon, 28 Jun 1999 17:55:25 -0700 (MST) From: Terry Lambert Message-Id: <199906290055.RAA07451@usr05.primenet.com> Subject: Re: high-efficiency SMP locks - submission for review To: dillon@apollo.backplane.com (Matthew Dillon) Date: Tue, 29 Jun 1999 00:55:25 +0000 (GMT) Cc: jhay@mikom.csir.co.za, peter@netplex.com.au, alc@cs.rice.edu, tlambert@primenet.com, bakul@torrentnet.com, julian@whistle.com, freebsd-smp@FreeBSD.ORG In-Reply-To: <199906281947.MAA24287@apollo.backplane.com> from "Matthew Dillon" at Jun 28, 99 12:47:25 pm X-Mailer: ELM [version 2.4 PL25] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > Since 386's are UP systems, I think it would be fairly easy to implement > the UP version of the compare-and-exchange primitive trivially with an > spl wrapper. We should be able to freely use use the cmpxchg instruction > on SMP systems. 
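A rough sketch of the two variants described in the quoted text, written as user-space C so it can be compiled and run. The names cmpxchg_up, cmpxchg_smp, sketch_splhigh and sketch_splx are hypothetical and do not come from the FreeBSD sources; the spl stand-ins only record nesting, whereas a real UP kernel would block interrupts there. The SMP variant assumes a GCC-compatible compiler and uses its __sync builtin rather than hand-written assembly.

```c
#include <stdint.h>

/*
 * Hypothetical stand-ins for the kernel's splhigh()/splx(). In a real UP
 * kernel these would block interrupts; here they only track nesting so
 * the sketch can run in user space.
 */
static int spl_depth;
static int  sketch_splhigh(void) { return spl_depth++; }
static void sketch_splx(int s)   { spl_depth = s; }

/* UP variant: a plain compare and store, protected only by the spl wrapper. */
static int
cmpxchg_up(volatile uint32_t *p, uint32_t expect, uint32_t newval)
{
	int s = sketch_splhigh();
	int swapped = (*p == expect);

	if (swapped)
		*p = newval;
	sketch_splx(s);
	return swapped;
}

/*
 * SMP variant: the compiler builtin, which is emitted as a locked cmpxchg
 * on x86 targets that have the instruction.
 */
static int
cmpxchg_smp(volatile uint32_t *p, uint32_t expect, uint32_t newval)
{
	return __sync_bool_compare_and_swap(p, expect, newval);
}
```

Both functions return nonzero when the store happened, so callers can be written once against either variant.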
Unless this was done at runtime, ala the bcopy code, I think that it would be a terrible idea to balkanize the systems that a generic kernel was capable of running on without recompilation. I think the locking mechanics for SMP are just as applicable to kernel preemption (aka one process Real Time or multiprocess "mushy" Real Time), and that that avenue should not be cut off for older systems. This is doubly true for older systems, in fact, which have a much higher tendency to show up in embedded controllers and other applications that require some small RT capability, than, say, 450MHz Xeon processors. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Mon Jun 28 18: 0:14 1999 Delivered-To: freebsd-smp@freebsd.org Received: from smtp04.primenet.com (smtp04.primenet.com [206.165.6.134]) by hub.freebsd.org (Postfix) with ESMTP id 8F43614D9C for ; Mon, 28 Jun 1999 18:00:12 -0700 (PDT) (envelope-from tlambert@usr05.primenet.com) Received: (from daemon@localhost) by smtp04.primenet.com (8.8.8/8.8.8) id RAA17445; Mon, 28 Jun 1999 17:58:05 -0700 (MST) Received: from usr05.primenet.com(206.165.6.205) via SMTP by smtp04.primenet.com, id smtpd017336; Mon Jun 28 17:57:52 1999 Received: (from tlambert@localhost) by usr05.primenet.com (8.8.5/8.8.5) id RAA07616; Mon, 28 Jun 1999 17:57:48 -0700 (MST) From: Terry Lambert Message-Id: <199906290057.RAA07616@usr05.primenet.com> Subject: Re: high-efficiency SMP locks - submission for review To: julian@whistle.com (Julian Elischer) Date: Tue, 29 Jun 1999 00:57:48 +0000 (GMT) Cc: dillon@apollo.backplane.com, peter@netplex.com.au, alc@cs.rice.edu, tlambert@primenet.com, bakul@torrentnet.com, freebsd-smp@freebsd.org In-Reply-To: from "Julian Elischer" at Jun 28, 99 09:16:44 am X-Mailer: ELM [version 2.4 PL25] MIME-Version: 1.0 
Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > I'd say that we probably wouldn't support SMP on 386 and 486 processors.. > and in UP those locks that need atomicity would be optimised away. > > We WILL need locking in UP when we move to kernel threads, but that > doesn't require bus atomicity. No one is currently bothering with anything but the Intel MESI coherency model for SMP, anyway, so I don't understand the relevence of bus coherency to the argument. My only point is that the code needs to degrade gracefully (e.g. without rebuilding your kernel with a magic doohickey flipped on or off for no obvious reason). Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Mon Jun 28 18:18: 5 1999 Delivered-To: freebsd-smp@freebsd.org Received: from smtp03.primenet.com (smtp03.primenet.com [206.165.6.133]) by hub.freebsd.org (Postfix) with ESMTP id 9E23514FCB for ; Mon, 28 Jun 1999 18:16:56 -0700 (PDT) (envelope-from tlambert@usr05.primenet.com) Received: (from daemon@localhost) by smtp03.primenet.com (8.8.8/8.8.8) id SAA04456; Mon, 28 Jun 1999 18:16:55 -0700 (MST) Received: from usr05.primenet.com(206.165.6.205) via SMTP by smtp03.primenet.com, id smtpd004443; Mon Jun 28 18:16:52 1999 Received: (from tlambert@localhost) by usr05.primenet.com (8.8.5/8.8.5) id SAA08367; Mon, 28 Jun 1999 18:16:51 -0700 (MST) From: Terry Lambert Message-Id: <199906290116.SAA08367@usr05.primenet.com> Subject: Re: high-efficiency SMP locks - submission for review To: dillon@apollo.backplane.com (Matthew Dillon) Date: Tue, 29 Jun 1999 01:16:51 +0000 (GMT) Cc: tlambert@primenet.com, freebsd-smp@FreeBSD.ORG In-Reply-To: <199906280742.AAA18132@apollo.backplane.com> from "Matthew Dillon" at 
Jun 28, 99 00:42:37 am X-Mailer: ELM [version 2.4 PL25] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > : In particular, the use of a single wait count means that you > : will not have an ordered list for the equivalent of a "wake > : one". This is a real problem, since it allows for a deadly > : embrace deadlock to occur when two kernel contexts each hold > : one lock and want to acquire the other context's lock. > : > : I believe that resolving this as an EWOULDBLOCK, and then > > Well, I differ with you here. lockmgr() can't even handle that. Well, if you are going to throw out the baby, you might as well throw out the bathwater while you are at it. 8-). > Lock primitives that become too complex can no longer be categorized > as primitives. That is one of the biggest problems with lockmgr(), > in fact. OK. By this definition, I would categorize my suggestions as higher-level locking facilities, which should be built on primitives. The big issue to address in the primitives themselves is support for intention modes. > What should be done instead is to separate the functionality of the > more complex locking functions - such as those that deal with deadlock > situations - because these locking functions have a considerable amount > of overhead compared to lower level primitives. I disagree, but that's more of a where-and-what-you-lock argument than we need to get into right this second. > : Finally, there is insufficient mechanism to avoid competition > : starvation, where a write blocks indefinitely as multiple > : readers pass the resource between each other. > > Yes, this is an issue -- but the solution in lockmgr() has only led > to more esoteric deadlock situations and, I think, harmed performance > as much as it has helped it.
lockmgr() is a hulking behemoth that does what it was intended to do well; unfortunately, it's being abused for all sorts of strange and not-so-wonderful uses, and it's not graceful about boundary conditions or error recovery. > That isn't to say that we can't implement > the same solution in qlocks, but the way I would do it would be by adding > a new function, not modifying existing primitives. For example, > qlock_wr() might not try to hold off other shared locks, but > qlock_hipri_wr() could. OK. My main issue is that if there's going to be rearchitecting of the locking facilities, it should be done in a comprehensive fashion. This doesn't mean that everything has to be implemented, but that it needs to be thought out in order to not _preclude_ everything from being implemented in the future. I think that there is a need for someone to get their head around the whole thing, and write it down, before it degrades into "change for the sake of change" when someone goes in and "optimizes" code that was written to intentionally allow work in future directions. I'm not accusing you of this (obviously; the code isn't even in yet!), but it's something that should be considered in detail before acting. > The biggest problem we face at the moment is that locks are being held > for much too long a period of time. For example, locks are being held > *through* I/O operations as a means of controlling access. This is > precisely the wrong way to use a lock. The lock should be used to > protect the data structure and then released. An I/O operation in > progress should set a "the data is being messed with" flag and then > release its lock, not attempt to hold the lock permanently. Or, as you > mentioned, intention locks can be used to separate the I/O op from other > types of operations. I think that this type of thing could be fixed without introducing the variable of new primitives to contend with at the same time.
This is a song I've sung before, with regard to not introducing intentional VM aliases until the unintentional ones have been eliminated, etc., so I won't sing it longer than this paragraph. > Many of the shared/exclusive problems go away (or at least go into > hiding) when the locks are used properly. A bug hiding is worse than it shoving its cold nose into your butt unexpectedly while you are busy doing something else... I think that anything that might cause something to "go into hiding" is inherently evil. 8-(. > : I believe the following is the minimal set of structures > : required to resolve the blocking operation inheritance and > : deadlock detection: [ ... ] > Holy cow, that is very expensive. If I were to implement that sort > of locking subsystem it would be at a higher level, it would not > be a primitive. Agreed. However, I have experience with something similar from back in 1993/1994, and I can tell you this: 20,000 transactions a second on a 66MHz Pentium. > I tend to dislike complex locking solutions, but only because I > strongly believe that it is possible to avoid the necessity of > such by organizing code and algorithms properly. Me too; however, you are unlikely to be able to shove in the necessary rearchitecture to avoid holding context across function call boundaries. If you manage to somehow do this, I expect a crucifixion or two... > As lockmgr() has > shown, trying to implement complex locking solutions in an SMP > environment can become *very* expensive. So expensive that the > complex locking solution winds up hurting performance more than > a simpler solution would help performance. The lockmgr() is, and always has been, too heavyweight for decent SMP performance. > You are right on the money here. This is precisely the problem > that the SMP implementation currently faces and is attempting > to solve with spl-aware simple locks.
In fact, the way spl*() ops > work currently is very similar to obtaining multiple locks > simultaniously. Again, I really hate having to implement complex > solutions when it may be possible to obtain the same effect by > reorganizing the functions that require the locking in the first place. Complexity is a function of observation. For something like lazy task context creation for blocking, it's not really that complex. It's about the same order as a call conversion scheduler, or the use of multiple client contexts and a select loop. It's a matter of discipline (and documentation). I really have seen very few real situations in which threads were truly advantageous compared to implementing the code with implicit or explicit internal scheduling, for example. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Mon Jun 28 21:26:34 1999 Delivered-To: freebsd-smp@freebsd.org Received: from apollo.backplane.com (apollo.backplane.com [209.157.86.2]) by hub.freebsd.org (Postfix) with ESMTP id 9B2F0151AE for ; Mon, 28 Jun 1999 21:26:31 -0700 (PDT) (envelope-from dillon@apollo.backplane.com) Received: (from dillon@localhost) by apollo.backplane.com (8.9.3/8.9.1) id VAA26171; Mon, 28 Jun 1999 21:26:28 -0700 (PDT) (envelope-from dillon) Date: Mon, 28 Jun 1999 21:26:28 -0700 (PDT) From: Matthew Dillon Message-Id: <199906290426.VAA26171@apollo.backplane.com> To: Terry Lambert Cc: freebsd-smp@FreeBSD.ORG Subject: Re: high-efficiency SMP locks - submission for review References: <199906290116.SAA08367@usr05.primenet.com> Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org I think we are ultimately going to stick to the recursive-shared/ recursive-exclusive model for most of our buffer and VFS/BIO locking. 
This isn't set in stone, but it turns out that if we can turn the exclusive locks used for I/O ops into shared locks, we can get rid of nearly all the access blocking conditions. With exclusive locks relegated to non-blocking critical-path code (e.g. setting up for a write I/O, but not held during the actual write I/O), most of the parallelization and interlock problems magically go away which I think is really cool. There would then be no need for a more sophisticated (and more complex) intention locks, no need to manage locking chains for deadlock detection, and so forth. I think intention locks can still be used in places where the structural complexity warrants it... like vnode operations, for example. The only part of a vnode that the VM system really cares about is the vnode's file size, so that would be one intention lock. inode updates would be another intention lock, synchronization would be another, and create/destroy would be a another. -Matt To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Mon Jun 28 21:29:13 1999 Delivered-To: freebsd-smp@freebsd.org Received: from apollo.backplane.com (apollo.backplane.com [209.157.86.2]) by hub.freebsd.org (Postfix) with ESMTP id B41A71508D for ; Mon, 28 Jun 1999 21:29:11 -0700 (PDT) (envelope-from dillon@apollo.backplane.com) Received: (from dillon@localhost) by apollo.backplane.com (8.9.3/8.9.1) id VAA26192; Mon, 28 Jun 1999 21:29:09 -0700 (PDT) (envelope-from dillon) Date: Mon, 28 Jun 1999 21:29:09 -0700 (PDT) From: Matthew Dillon Message-Id: <199906290429.VAA26192@apollo.backplane.com> To: Terry Lambert Cc: julian@whistle.com (Julian Elischer), peter@netplex.com.au, alc@cs.rice.edu, tlambert@primenet.com, bakul@torrentnet.com, freebsd-smp@freebsd.org Subject: Re: high-efficiency SMP locks - submission for review References: <199906290057.RAA07616@usr05.primenet.com> Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: 
FreeBSD.org :> and in UP those locks that need atomicity would be optimised away. :> :> We WILL need locking in UP when we move to kernel threads, but that :> doesn't require bus atomicity. : :No one is currently bothering with anything but the Intel MESI :coherency model for SMP, anyway, so I don't understand the :relevance of bus coherency to the argument. : :My only point is that the code needs to degrade gracefully (e.g. :without rebuilding your kernel with a magic doohickey flipped on :or off for no obvious reason). : : Terry Lambert : terry@lambert.org I'm pretty sure that we need bus coherency for general RMW instructions such as add, and, or, etc... Any given cpu will not take an interrupt in the middle of an instruction, but in an SMP environment I do not believe those instructions use indivisible cache-coherent bus cycles. Thus the assembly lock prefix is necessary. I am not absolutely sure of that, but I believe that to be the case for Intel. -Matt From owner-freebsd-smp Mon Jun 28 23:40:50 1999 Delivered-To: freebsd-smp@freebsd.org Received: from overcee.netplex.com.au (overcee.netplex.com.au [202.12.86.7]) by hub.freebsd.org (Postfix) with ESMTP id A9C19151C4 for ; Mon, 28 Jun 1999 23:40:22 -0700 (PDT) (envelope-from peter@netplex.com.au) Received: from netplex.com.au (localhost [127.0.0.1]) by overcee.netplex.com.au (Postfix) with ESMTP id 660BE82; Tue, 29 Jun 1999 14:40:11 +0800 (WST) (envelope-from peter@netplex.com.au) X-Mailer: exmh version 2.0.2 2/24/98 To: Matthew Dillon Cc: Terry Lambert , julian@whistle.com (Julian Elischer), alc@cs.rice.edu, bakul@torrentnet.com, freebsd-smp@freebsd.org Subject: Re: high-efficiency SMP locks - submission for review In-reply-to: Your message of "Mon, 28 Jun 1999 21:29:09 MST."
<199906290429.VAA26192@apollo.backplane.com> Date: Tue, 29 Jun 1999 14:40:11 +0800 From: Peter Wemm Message-Id: <19990629064011.660BE82@overcee.netplex.com.au> Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org Matthew Dillon wrote: > > :> and in UP those locks that need atomicity would be optimised away. > :> > :> We WILL need locking in UP when we move to kernel threads, but that > :> doesn't require bus atomicity. > : > :No one is currently bothering with anything but the Intel MESI > :coherency model for SMP, anyway, so I don't understand the > :relevence of bus coherency to the argument. > : > :My only point is that the code needs to degrade gracefully (e.g. > :without rebuilding your kernel with a magic doohickey flipped on > :or off for no obvious reason). > : > : Terry Lambert > : terry@lambert.org > > I'm pretty sure that we need bus coherency for general RMW instructions > such as add, and, or, etc... Any given cpu will not take an interrupt > in the middle of an instruction, but in an SMP environment I do not > believe those instructions use indivisible cache-coherent bus cycles. > Thus the assembly lock prefix is necesary. I am not absolutely sure of > that, but I believe that to be the case for Intel. As I understand it, an Intel cpu won't take an interrupt except on an instruction boundary and the same goes for traps. However, I have memories of other cpus that could (I think) take faults mid-instruction etc. (68k family for example) Under SMP, you are correct. You have to do an explicit lock prefix to get an atomic read-modify-write cycle that no other cpu or bus master will interfere with. Some instructions are implicitly locked, xchg for example, and cannot have a lock prefix. (Remember the F00F bug anyone?) So, yes, you must do a 'lock; addl ....' etc if you want it to be coherent under SMP. 
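A minimal sketch of what the 'lock; addl' form above looks like from C, assuming a GCC-compatible compiler and its extended inline assembly. The helper names atomic_add_u32 and atomic_swap_u32 are hypothetical, not taken from the FreeBSD sources, and the non-x86 branch exists only so the sketch builds elsewhere.

```c
#include <stdint.h>

/*
 * Hypothetical helper: atomically add v to *p. On x86 the lock prefix
 * makes the read-modify-write indivisible with respect to other CPUs;
 * on other targets fall back to the compiler builtin.
 */
static inline void
atomic_add_u32(volatile uint32_t *p, uint32_t v)
{
#if defined(__i386__) || defined(__x86_64__)
	__asm__ __volatile__("lock; addl %1, %0"
	    : "+m" (*p)
	    : "ir" (v)
	    : "cc", "memory");
#else
	(void)__atomic_fetch_add(p, v, __ATOMIC_SEQ_CST);
#endif
}

/*
 * Hypothetical helper: swap v into *p and return the previous value.
 * xchg with a memory operand is locked implicitly, so no prefix is
 * written here.
 */
static inline uint32_t
atomic_swap_u32(volatile uint32_t *p, uint32_t v)
{
#if defined(__i386__) || defined(__x86_64__)
	__asm__ __volatile__("xchgl %0, %1"
	    : "+r" (v), "+m" (*p)
	    :
	    : "memory");
	return v;
#else
	return __atomic_exchange_n(p, v, __ATOMIC_SEQ_CST);
#endif
}
```

Without the lock prefix the addl would still be a single instruction, which is enough against interrupts on one CPU but not against a second processor touching the same word.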
> -Matt > > Cheers, -Peter -- Peter Wemm - peter@FreeBSD.org; peter@yahoo-inc.com; peter@netplex.com.au From owner-freebsd-smp Tue Jun 29 1:19:13 1999 Delivered-To: freebsd-smp@freebsd.org Received: from cygnus.rush.net (cygnus.rush.net [209.45.245.133]) by hub.freebsd.org (Postfix) with ESMTP id 04941151DE for ; Tue, 29 Jun 1999 01:19:09 -0700 (PDT) (envelope-from bright@rush.net) Received: from localhost (bright@localhost) by cygnus.rush.net (8.9.3/8.9.3) with SMTP id EAA08488; Tue, 29 Jun 1999 04:20:19 -0400 (EDT) Date: Tue, 29 Jun 1999 03:20:16 -0500 (EST) From: Alfred Perlstein To: Terry Lambert Cc: Matthew Dillon , jhay@mikom.csir.co.za, peter@netplex.com.au, alc@cs.rice.edu, bakul@torrentnet.com, julian@whistle.com, freebsd-smp@FreeBSD.ORG Subject: Re: high-efficiency SMP locks - submission for review In-Reply-To: <199906290055.RAA07451@usr05.primenet.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Tue, 29 Jun 1999, Terry Lambert wrote: > > Since 386's are UP systems, I think it would be fairly easy to implement > > the UP version of the compare-and-exchange primitive trivially with an > > spl wrapper. We should be able to freely use the cmpxchg instruction > > on SMP systems. > > Unless this was done at runtime, ala the bcopy code, I think that > it would be a terrible idea to balkanize the systems that a generic > kernel was capable of running on without recompilation. We have a nifty loader now, why not have it determine the CPU type and boot the appropriate kernel? It could easily be done in Forth if the loader exported the CPU type in an environment flag... Doesn't Sun do this?
Of course you would also need to have kldload use a sysctl to know where to grab klds from by default. Klds should also be branded to make sure only the appropriate ones were loaded on each step of the intel processor. Another question, why aren't some syscalls implemented like so: getpid would read the contents of a page mapped into the process' address space, ie, the kernel sharing info with processes through shared mappings? -Alfred To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Tue Jun 29 10: 5: 6 1999 Delivered-To: freebsd-smp@freebsd.org Received: from wrath.cs.utah.edu (wrath.cs.utah.edu [155.99.198.100]) by hub.freebsd.org (Postfix) with ESMTP id 5FD1814BEF for ; Tue, 29 Jun 1999 10:05:03 -0700 (PDT) (envelope-from vanmaren@cs.utah.edu) Received: from zane.cs.utah.edu (zane.cs.utah.edu [155.99.212.93]) by wrath.cs.utah.edu (8.8.8/8.8.8) with ESMTP id LAA12220 for ; Tue, 29 Jun 1999 11:05:02 -0600 (MDT) From: Kevin Van maren Received: (from vanmaren@localhost) by zane.cs.utah.edu (8.9.1/8.9.1) id LAA20627 for freebsd-smp@FreeBSD.ORG; Tue, 29 Jun 1999 11:05:02 -0600 (MDT) (envelope-from vanmaren@cs.utah.edu) Date: Tue, 29 Jun 1999 11:05:02 -0600 (MDT) Message-Id: <199906291705.LAA20627@zane.cs.utah.edu> To: freebsd-smp@FreeBSD.ORG Subject: Re: high-efficiency SMP locks - submission for review Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org I'm really glad to see that there is so much activity on the list! Just a quick summary (from my point of view): 1) Multi-threading the kernel will require locking, not just spl(). 2) Locking will likely slow down uni-processor systems. 3) Removing the BGL will allow more parallelism in the kernel for multiple system-bound applications under SMP. 4) It will be a LOT of work to re-write the kernel to be thread-safe. Changing the execution environment will violate a lot of assumptions. 
5) Very few people both know enough about the kernel internals and multiprocessor/multi-threaded locking to do the job "right". 6) The method most likely to succeed will be evolutionary; there is simply too much code to change everything at once and get it all working. SMP will scale with a BGL as long as we minimize the system time. If system time with 4 CPUs is under 10%, the kernel is not the problem. Since it is not always practical to make the kernel fast enough to do that for many applications/workloads, we need to move enough out of the BGL so that we can get the necessary parallelism. Linux lost to NT because of the slow Linux protocol stack tested. The review said that a multi-threaded stack would have made a difference. However, Linux lost on the UP case too, so I doubt that's the only problem. The fact is, if the protocol stack was fast enough, it wouldn't need to be multi-threaded. I would like to see how FreeBSD does on that same test -- it has a much faster TCP/IP stack than Linux (especially using sendfile for the static pages!) Terry Lambert said: > No one is currently bothering with anything but the Intel MESI > coherency model for SMP, anyway, so I don't understand the > relevence of bus coherency to the argument. This is mostly true. Even on the IA64. Section 4.4.6.2 of the manual says that I-caches are not coherent with other I-caches or D-caches. But at least the D-caches are coherent (on my first glance, I thought they weren't either, which really worried me!) On the x86, you do need to lock the bus to guarantee operations are atomic, with the exception of xchg (but not the variants), which is guaranteed to be atomic. They also must be naturally-aligned. To clarify: there DO exist some dual-processor 486 systems. 
They use APICs, and in *theory* can run FreeBSD without too much difficulty (there is no mptable, the processor APICs are at different addresses, so you have to know which processor you are on to access the APIC, and the AP initialization is a little different). I don't think anyone cares enough to implement the support code, and Intel discontinued the parts necessary to build them, so it probably won't be too painful to break possible support for them. Alfred Perlstein said: > getpid would read the contents of a page mapped into the process' > address space, ie, the kernel sharing info with processes through > shared mappings? Because the cost of setting up the mappings, and the wasted page of memory, greatly exceeds the cost of a system call that is used at most (usually) once per process. The only time getpid() is called several times is during (bad) synthetic benchmarks. Having libc cache the value would be a more viable solution; it would have to trap fork() calls, however, to invalidate the stored pid. Compiling and storing 8 kernels for the loader to choose from sounds like a bad idea as well; it may be practical for CD-ROM installation, although I think it is more likely the user will select the right one. Kevin Van Maren To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Tue Jun 29 10:49:20 1999 Delivered-To: freebsd-smp@freebsd.org Received: from uruk.org (unknown [209.180.166.90]) by hub.freebsd.org (Postfix) with ESMTP id CEE8314FBE for ; Tue, 29 Jun 1999 10:49:18 -0700 (PDT) (envelope-from erich@uruk.org) Received: from localhost ([127.0.0.1] helo=uruk.org) by uruk.org with esmtp (Exim 2.05 #1) id 10z21f-0002AZ-00; Tue, 29 Jun 1999 10:50:31 -0700 To: Kevin Van maren Cc: freebsd-smp@FreeBSD.ORG Subject: Re: high-efficiency SMP locks - submission for review In-reply-to: Your message of "Tue, 29 Jun 1999 11:05:02 MDT." 
<199906291705.LAA20627@zane.cs.utah.edu> Date: Tue, 29 Jun 1999 10:50:31 -0700 From: Erich Boleyn Message-Id: Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org Kevin Van maren wrote: > I'm really glad to see that there is so much activity on the list! Yeah... I'm tempted to de-lurk. ;) ... > Terry Lambert said: > > No one is currently bothering with anything but the Intel MESI > > coherency model for SMP, anyway, so I don't understand the > > relevence of bus coherency to the argument. > > This is mostly true. Even on the IA64. Section 4.4.6.2 of the manual > says that I-caches are not coherent with other I-caches or D-caches. > But at least the D-caches are coherent (on my first glance, I thought > they weren't either, which really worried me!) I- and D-caches are coherent on IA32. The instruction stream itself is guaranteed to be synchronized after jumps. Most other processor architectures (including IA64) require the use of special instructions to synchronize the I- and D-sides. > On the x86, you do need to lock the bus to guarantee operations > are atomic, with the exception of xchg (but not the variants), > which is guaranteed to be atomic. They also must be naturally-aligned. This is true for read/modify/write operations, but individual reads and writes are guaranteed to be atomic on IA32 as long as they are naturally aligned. (at least, for existing implementations) > To clarify: there DO exist some dual-processor 486 systems. They > use APICs, and in *theory* can run FreeBSD without too much difficulty > (there is no mptable, the processor APICs are at different addresses, > so you have to know which processor you are on to access the APIC, > and the AP initialization is a little different). I don't think > anyone cares enough to implement the support code, and Intel discontinued > the parts necessary to build them, so it probably won't be too painful > to break possible support for them. 
I'd be really surprised if anyone wanted to support MP 486 systems in FreeBSD as anything but a hobby. They were going out of style 3-4 years ago, and were expensive systems with special OS support. -- Erich Stefan Boleyn \_ Mad but Happy Scientist \__ http://www.uruk.org/ Motto: "I'll live forever or die trying" --------------------------- To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Tue Jun 29 10:59:21 1999 Delivered-To: freebsd-smp@freebsd.org Received: from cygnus.rush.net (cygnus.rush.net [209.45.245.133]) by hub.freebsd.org (Postfix) with ESMTP id F396B14FC8 for ; Tue, 29 Jun 1999 10:59:13 -0700 (PDT) (envelope-from bright@rush.net) Received: from localhost (bright@localhost) by cygnus.rush.net (8.9.3/8.9.3) with SMTP id OAA23822; Tue, 29 Jun 1999 14:03:08 -0400 (EDT) Date: Tue, 29 Jun 1999 13:03:06 -0500 (EST) From: Alfred Perlstein To: Kevin Van maren Cc: freebsd-smp@FreeBSD.ORG Subject: Re: high-efficiency SMP locks - submission for review In-Reply-To: <199906291705.LAA20627@zane.cs.utah.edu> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Tue, 29 Jun 1999, Kevin Van maren wrote: > Alfred Perlstein said: > > getpid would read the contents of a page mapped into the process' > > address space, ie, the kernel sharing info with processes through > > shared mappings? > > Because the cost of setting up the mappings, and the wasted page > of memory, greatly exceeds the cost of a system call that is used > at most (usually) once per process. The only time getpid() is called > several times is during (bad) synthetic benchmarks. Having libc cache > the value would be a more viable solution; it would have to trap fork() > calls, however, to invalidate the stored pid. 
Well, there are other things you could accomplish with this: since
there are user-mode test-and-set operations, you could check for and
manipulate signal stuff.  How much this would help user threads can
really only be gauged by their authors, but I guess it would really
help context switching...

Getpid is a bad example, and the idea should be thought through to see
if it is worth it; I just wanted to bring it up in case someone
thought it would provide a super-fast API to certain kernel structures.
The mappings could also be lazy: only calls to functions that use the
shared API would map them in, on first call.

Just an idea.  Implementing it solely for getpid() would be dumb, but
maybe there are cases where it's a big win?

> Compiling and storing 8 kernels for the loader to choose from sounds
> like a bad idea as well; it may be practical for CD-ROM installation,
> although I think it is more likely the user will select the right one.

That's what I meant: for the first install.  Also the ability to see
whether the motherboard is SMP and automagically boot an SMP kernel, so
some benchmark weenie can't "accidentally" test against a UP kernel on
an SMP system...  It would also help with installs, so that people who
are happy with GENERIC would still get SMP.

There is no need to provide a kernel for each i386 step, just the
following cases:

  386     (slow copyin/copyout)
  486-PII (optimized copyin/copyout)
  SMP     (SMP enabled and Pentium optimizations enabled)

For some reason I think it would be kinda cool if the loader checked
/kernel for some sort of signature (it's ELF); if there is none, it
would attempt to load the file as a boot script that selects the
kernel to boot via variables exported by the loader.
-Alfred Perlstein - [bright@rush.net|bright@wintelcom.net] systems administrator and programmer Win Telecom - http://www.wintelcom.net/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Tue Jun 29 11:24: 9 1999 Delivered-To: freebsd-smp@freebsd.org Received: from par28.ma.ikos.com (par28.ma.ikos.com [137.103.105.228]) by hub.freebsd.org (Postfix) with ESMTP id 583B314CD3 for ; Tue, 29 Jun 1999 11:24:06 -0700 (PDT) (envelope-from tich@par28.ma.ikos.com) Received: from [[UNIX: localhost]] ([[UNIX: localhost]]) by par28.ma.ikos.com (8.8.7/8.8.7) id OAA21260; Tue, 29 Jun 1999 14:23:31 -0400 From: Richard Cownie To: Kevin Van maren , freebsd-smp@FreeBSD.ORG Subject: Re: high-efficiency SMP locks - submission for review Date: Tue, 29 Jun 1999 13:44:36 -0400 X-Mailer: KMail [version 1.1.0] Content-Type: text/plain References: <199906291705.LAA20627@zane.cs.utah.edu> MIME-Version: 1.0 Message-Id: <99062914233100.20670@par28.ma.ikos.com> Content-Transfer-Encoding: 8bit X-KMail-Mark: Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Tue, 29 Jun 1999, Kevin Van maren wrote: > On the x86, you do need to lock the bus to guarantee operations > are atomic, with the exception of xchg (but not the variants), > which is guaranteed to be atomic. They also must be naturally-aligned. No, you can have a non-aligned locked access - there's a bunch of complex and ugly stuff ("split locks", the SPLCK# bus signal) in the P6 bus protocol to support this. But don't do it if you can possibly avoid it - it's inefficient, and since it exercises arcane features of the hardware, it could be buggy. It's also a little inaccurate to talk of "locking the bus" for these instructions. If the memory region is cacheable, the atomic access is implemented by locking the line in the cache until the read-modify-write is completed - this doesn't require any locked transactions on the P6 bus. 
So in general it doesn't have much/any performance penalty (try timing
it if you doubt this).

In general, the hardware implementation is now so complex that you
shouldn't think about it too much; the instructions with a LOCK prefix
(or the XCHG instruction) will give you an atomic read-modify-write,
other instructions won't necessarily be atomic.  Note in particular that
CMPXCHG is not atomic, you need LOCK CMPXCHG - I wasted a couple of weeks
with that bug in my own code.  A non-atomic CMPXCHG seems like a
particularly useless instruction ...

LOCK CMPXCHG is great though, you can use it to synthesize an arbitrarily
complex atomic update like this:

  retry:
    oldval = *lockp;             // normal read of lock variable
    newval = SomeFunc(oldval);   // arbitrarily complex function
    LOCK CMPXCHG                 // if (*lockp == oldval) *lockp = newval;
                                 // else goto retry;

Also the claim that x86 takes interrupts only at instruction boundaries is
only half-true - I believe you can take an interrupt in the middle of a
string instruction, it will leave the registers in a suitable state so that
restarting the string instruction will resume where it left off.

For the OS, the big question is whether you have to look at any special
state or do any special fixup to get back to the correct state when
resuming after an interrupt, and for the x86 the answer is no.  I've
never used the 68000, but I believe it's screwed up in this respect,
hence early 68000 workstations had 2 cpu's, one to run programs and the
other just to handle the page faults, because you couldn't safely resume
after the fault ...
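
Going back to the LOCK CMPXCHG retry loop above: a minimal sketch of the
same pattern in portable C, assuming a compiler that provides C11
<stdatomic.h> (on x86, atomic_compare_exchange_weak() is normally compiled
to LOCK CMPXCHG).  The names some_func() and atomic_update() are made up
for illustration and do not come from any existing code.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the "arbitrarily complex function". */
unsigned some_func(unsigned oldval)
{
    return oldval * 2u + 1u;
}

/*
 * Atomically replace *lockp with some_func(*lockp).
 * Returns the value that was finally stored.
 */
unsigned atomic_update(atomic_uint *lockp)
{
    unsigned oldval = atomic_load(lockp);   /* normal read of lock variable */
    unsigned newval;

    do {
        newval = some_func(oldval);
        /*
         * If *lockp still equals oldval, store newval and stop.
         * Otherwise the call copies the current *lockp into oldval
         * and we recompute, which is the "goto retry" case.
         */
    } while (!atomic_compare_exchange_weak(lockp, &oldval, newval));

    return newval;
}
```

The weak form may fail spuriously on some architectures, which is harmless
here because the loop retries anyway.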
Richard Cownie (tich@ma.ikos.com)

From owner-freebsd-smp Tue Jun 29 12: 8:35 1999
Delivered-To: freebsd-smp@freebsd.org
Received: from fast.cs.utah.edu (fast.cs.utah.edu [155.99.212.1])
    by hub.freebsd.org (Postfix) with ESMTP id 3EC2D14BFC
    for ; Tue, 29 Jun 1999 12:08:29 -0700 (PDT)
    (envelope-from vanmaren@fast.cs.utah.edu)
Received: (from vanmaren@localhost)
    by fast.cs.utah.edu (8.9.1/8.9.1) id NAA08028;
    Tue, 29 Jun 1999 13:08:28 -0600 (MDT)
Date: Tue, 29 Jun 1999 13:08:28 -0600 (MDT)
From: Kevin Van Maren
Message-Id: <199906291908.NAA08028@fast.cs.utah.edu>
To: freebsd-smp@FreeBSD.ORG, tich@ma.ikos.com, vanmaren@cs.utah.edu
Subject: Re: high-efficiency SMP locks - submission for review
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

> No, you can have a non-aligned locked access - there's a bunch of
> complex and ugly stuff ("split locks", the SPLCK# bus signal) in
> the P6 bus protocol to support this.  But don't do it if you can
> possibly avoid it - it's inefficient, and since it exercises arcane
> features of the hardware, it could be buggy.

My understanding is that it is only guaranteed to be atomic for the
processor family if it is naturally-aligned, although current
processors ALSO provide atomic operations for unaligned accesses.
My point is that we should not rely on unaligned accesses being
atomic, as per Intel documentation.

> Also the claim that x86 takes interrupts only at instruction boundaries is
> only half-true - I believe you can take an interrupt in the middle of a
> string instruction, it will leave the registers in a suitable state so that
> restarting the string instruction will resume where it left off.

Yes, because the string instructions are not a single instruction.
It looks like one, but it is really a micro-coded loop.
This can be interrupted after each loop iteration (decrement/increment
and store/copy are done atomically).

This wasn't always the case: I believe on the 8088 it was
non-interruptible; however, they had to fix it to deal with crossing
page boundaries on the i386.

Kevin

From owner-freebsd-smp Tue Jun 29 12: 8:50 1999
Delivered-To: freebsd-smp@freebsd.org
Received: from uruk.org (unknown [209.180.166.90])
    by hub.freebsd.org (Postfix) with ESMTP id 1E40B15456
    for ; Tue, 29 Jun 1999 12:08:43 -0700 (PDT)
    (envelope-from erich@uruk.org)
Received: from localhost ([127.0.0.1] helo=uruk.org)
    by uruk.org with esmtp (Exim 2.05 #1) id 10z3GU-0002Fv-00;
    Tue, 29 Jun 1999 12:09:54 -0700
To: Richard Cownie
Cc: freebsd-smp@FreeBSD.ORG
Subject: Re: high-efficiency SMP locks - submission for review
In-reply-to: Your message of "Tue, 29 Jun 1999 13:44:36 EDT."
    <99062914233100.20670@par28.ma.ikos.com>
Date: Tue, 29 Jun 1999 12:09:53 -0700
From: Erich Boleyn
Message-Id:
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Richard Cownie wrote:

> On Tue, 29 Jun 1999, Kevin Van maren wrote:
> > On the x86, you do need to lock the bus to guarantee operations
> > are atomic, with the exception of xchg (but not the variants),
> > which is guaranteed to be atomic.  They also must be naturally-aligned.
>
> No, you can have a non-aligned locked access - there's a bunch of
> complex and ugly stuff ("split locks", the SPLCK# bus signal) in
> the P6 bus protocol to support this.  But don't do it if you can
> possibly avoid it - it's inefficient, and since it exercises arcane
> features of the hardware, it could be buggy.

Whoops, you're right.  This is sufficiently slow that I avoid it like
the plague in code and tend to forget it's supported architecturally.
But, as mentioned in the other email, the other advantage of natural alignment is that for just reads and writes you don't have to lock at all to be MP-safe. -- Erich Stefan Boleyn \_ Mad but Happy Scientist \__ http://www.uruk.org/ Motto: "I'll live forever or die trying" --------------------------- To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Tue Jun 29 12:15:59 1999 Delivered-To: freebsd-smp@freebsd.org Received: from uruk.org (unknown [209.180.166.90]) by hub.freebsd.org (Postfix) with ESMTP id 256D715236 for ; Tue, 29 Jun 1999 12:15:56 -0700 (PDT) (envelope-from erich@uruk.org) Received: from localhost ([127.0.0.1] helo=uruk.org) by uruk.org with esmtp (Exim 2.05 #1) id 10z3NU-0002Gf-00; Tue, 29 Jun 1999 12:17:08 -0700 To: Kevin Van Maren Cc: freebsd-smp@FreeBSD.ORG Subject: Re: high-efficiency SMP locks - submission for review In-reply-to: Your message of "Tue, 29 Jun 1999 13:08:28 MDT." <199906291908.NAA08028@fast.cs.utah.edu> Date: Tue, 29 Jun 1999 12:17:08 -0700 From: Erich Boleyn Message-Id: Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org Kevin Van Maren wrote: > [from Richard Cownie]: > > No, you can have a non-aligned locked access - there's a bunch of > > complex and ugly stuff ("split locks", the SPLCK# bus signal) in > > the P6 bus protocol to support this. But don't do it if you can > > possibly avoid it - it's inefficient, and since it exercises arcane > > features of the hardware, it could be buggy. > > My understanding is that it is only guaranteed to be atomic for the > processor family if it is naturally-aligned, although current > processors ALSO provide atomic operatings for unaligned accesses. > My point is that we should not rely on unaligned accesses being > atomic, as per intel documentation. It is guaranteed, and is in the Intel documentation. But as mentioned, you would really prefer locking on aligned addresses... 
;) -- Erich Stefan Boleyn \_ Mad but Happy Scientist \__ http://www.uruk.org/ Motto: "I'll live forever or die trying" --------------------------- To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Tue Jun 29 12:50:16 1999 Delivered-To: freebsd-smp@freebsd.org Received: from gatekeeper.ra.pae.osd.mil (gatekeeper.ra.pae.osd.mil [134.152.113.157]) by hub.freebsd.org (Postfix) with ESMTP id 6A0F114BCD for ; Tue, 29 Jun 1999 12:50:11 -0700 (PDT) (envelope-from patton@sysnet.net) Received: (from smtpd@localhost) by gatekeeper.ra.pae.osd.mil (8.9.1a/8.9.2) id PAA10917; Tue, 29 Jun 1999 15:34:14 -0400 (EDT) Received: from monsoon.ra.pae.osd.mil(192.168.100.10), claiming to be "monsoon" via SMTP by services.ra.pae.osd.mil, id smtpdtX5878; Tue Jun 29 15:34:11 1999 Message-Id: <3.0.5.32.19990629155648.009b3750@mail.sysnet.net> X-Sender: patton@mail.sysnet.net X-Mailer: QUALCOMM Windows Eudora Light Version 3.0.5 (32) Date: Tue, 29 Jun 1999 15:56:48 -0400 To: Kevin Van maren From: Matthew Patton Subject: Linux vs Win issue [was: Re: high-efficiency SMP locks - submission for review] Cc: freebsd-smp@freebsd.org In-Reply-To: <199906291705.LAA20627@zane.cs.utah.edu> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org One point that people are missing on the "linux threaded stack" issue is that it's not really the stack at all!! The machines were tested with 2 NICs on the same subnet. The trick NT does is bind each NIC's IRQ to a specific processor. Thus the processing of traffic is more efficient. In the Unix world, putting 2 nics on the same subnet is not exactly kosher and in any event, the Linux kernel doesn't pull the same tricks. Equip both machines with just a SINGLE ethernet card and the results may have been rather different. ===== Need a secure, robust, open-source, multi-platform (9 architectures) OS? 
Try OpenBSD. The others simply can't compete.
Matthew Patton, 1LT USAF
Webmaster, Resource Analysis
PGP Fingerprint: 17D4 98B1 51F1 BCD9 D815 5F3D 3B1C 5C26 762C C9C9
Key ID: 0x762CC9C9 Expires: 7/31/99

From owner-freebsd-smp Tue Jun 29 13:19:51 1999
Delivered-To: freebsd-smp@freebsd.org
Received: from cygnus.rush.net (cygnus.rush.net [209.45.245.133])
    by hub.freebsd.org (Postfix) with ESMTP id D4CFE14D73
    for ; Tue, 29 Jun 1999 13:19:42 -0700 (PDT)
    (envelope-from bright@rush.net)
Received: from localhost (bright@localhost)
    by cygnus.rush.net (8.9.3/8.9.3) with SMTP id QAA26446;
    Tue, 29 Jun 1999 16:23:35 -0400 (EDT)
Date: Tue, 29 Jun 1999 15:23:34 -0500 (EST)
From: Alfred Perlstein
To: Kevin Van Maren
Cc: freebsd-smp@FreeBSD.ORG, tich@ma.ikos.com, vanmaren@cs.utah.edu
Subject: Re: high-efficiency SMP locks - submission for review
In-Reply-To: <199906291908.NAA08028@fast.cs.utah.edu>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Tue, 29 Jun 1999, Kevin Van Maren wrote:

re: rep prefix on x86 string instructions...

> This wasn't always the case: I believe on the 8088, it was
> non-interruptible, however, they had to fix it to deal with crossing
> page boundaries on the i386.

Early x86 (I think the 8088) wouldn't properly save the state of the
string instructions during an interrupt, so a resume wouldn't continue
the op.  Just some useless info I remember from "PC Intern".
-Alfred To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Tue Jun 29 13:24:29 1999 Delivered-To: freebsd-smp@freebsd.org Received: from par28.ma.ikos.com (par28.ma.ikos.com [137.103.105.228]) by hub.freebsd.org (Postfix) with ESMTP id 9B63B15111 for ; Tue, 29 Jun 1999 13:24:12 -0700 (PDT) (envelope-from tich@par28.ma.ikos.com) Received: from [[UNIX: localhost]] ([[UNIX: localhost]]) by par28.ma.ikos.com (8.8.7/8.8.7) id QAA23577; Tue, 29 Jun 1999 16:23:46 -0400 From: Richard Cownie To: Kevin Van Maren , freebsd-smp@FreeBSD.ORG, tich@ma.ikos.com, vanmaren@cs.utah.edu Subject: Re: high-efficiency SMP locks - submission for review Date: Tue, 29 Jun 1999 16:05:57 -0400 X-Mailer: KMail [version 1.1.0] Content-Type: text/plain References: <199906291908.NAA08028@fast.cs.utah.edu> MIME-Version: 1.0 Message-Id: <99062916234600.22336@par28.ma.ikos.com> Content-Transfer-Encoding: 8bit X-KMail-Mark: Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Tue, 29 Jun 1999, Kevin Van Maren wrote: > > No, you can have a non-aligned locked access - there's a bunch of > > complex and ugly stuff ("split locks", the SPLCK# bus signal) in > > the P6 bus protocol to support this. But don't do it if you can > > possibly avoid it - it's inefficient, and since it exercises arcane > > features of the hardware, it could be buggy. > > My understanding is that it is only guaranteed to be atomic for the > processor family if it is naturally-aligned, although current > processors ALSO provide atomic operatings for unaligned accesses. > My point is that we should not rely on unaligned accesses being > atomic, as per intel documentation. I've never really been convinced that there's a rigorous definition of the x86 family architecture, as distinct from what the particular implementations happen to do. Or where there *is* a specification, Intel feels quite free to violate it (e.g. 
the MPS spec says that you can mix different CPU's - but the truth for the P6/PentiumII/PentiumIII is that all the CPU's in an SMP system need to be the exact same stepping). On the other hand, there's so much weird software out there that Intel will probably never allow themselves to remove any feature that might possibly be used in some application somewhere. So I'd bet that future cpu's will continue to support atomic misaligned read-modify-write. However, it doesn't seem that there's any good reason to use this particular feature. It's easy to get stuff aligned, and much more efficient. Richard Cownie To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Tue Jun 29 14:15:59 1999 Delivered-To: freebsd-smp@freebsd.org Received: from smtp01.primenet.com (smtp01.primenet.com [206.165.6.131]) by hub.freebsd.org (Postfix) with ESMTP id B37E814BCD for ; Tue, 29 Jun 1999 14:15:54 -0700 (PDT) (envelope-from tlambert@usr08.primenet.com) Received: (from daemon@localhost) by smtp01.primenet.com (8.8.8/8.8.8) id OAA16292; Tue, 29 Jun 1999 14:15:54 -0700 (MST) Received: from usr08.primenet.com(206.165.6.208) via SMTP by smtp01.primenet.com, id smtpd016266; Tue Jun 29 14:15:48 1999 Received: (from tlambert@localhost) by usr08.primenet.com (8.8.5/8.8.5) id OAA26919; Tue, 29 Jun 1999 14:15:47 -0700 (MST) From: Terry Lambert Message-Id: <199906292115.OAA26919@usr08.primenet.com> Subject: Re: high-efficiency SMP locks - submission for review To: vanmaren@cs.utah.edu (Kevin Van maren) Date: Tue, 29 Jun 1999 21:15:47 +0000 (GMT) Cc: freebsd-smp@FreeBSD.ORG In-Reply-To: <199906291705.LAA20627@zane.cs.utah.edu> from "Kevin Van maren" at Jun 29, 99 11:05:02 am X-Mailer: ELM [version 2.4 PL25] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > I'm really glad to see that there is so much 
activity on the list!
>
> Just a quick summary (from my point of view):
> 1) Multi-threading the kernel will require locking, not just spl().
> 2) Locking will likely slow down uni-processor systems.

Locking in combination with multithreading should actually speed
up uniprocessor systems.

The SVR4.2 (UnixWare 2.0) release was ~15% faster on UP, *with*
locking, due to the fact that a lot of the UP code benefitted
from deserialization of multiple simultaneous kernel operations.

> 3) Removing the BGL will allow more parallelism in the kernel
> for multiple system-bound applications under SMP.

And for UP systems, which may wish to have multiple asynchronous
operations outstanding simultaneously.  Admittedly, the POSIX
implementation of this leaves a lot to be desired (specifically,
POSIX only deigned to "lower" itself to making asynchronous
versions of a subset of file I/O related system calls).

> 4) It will be a LOT of work to re-write the kernel to be thread-safe.
> Changing the execution environment will violate a lot of assumptions.

Yep.

> 5) Very few people both know enough about the kernel internals and
> multiprocessor/multi-threaded locking to do the job "right".

I don't think that's true.  I count at least 5 people in
the FreeBSD camp, not including myself (Simon Shapiro, Bakul Shah,
John Dyson, et al.), just off the top of my head (i.e. if your
name belongs here and I didn't put you here, I'm not snubbing you).

> 6) The method most likely to succeed will be evolutionary; there is
> simply too much code to change everything at once and get it all working.

I doubt this.  It's unlikely to be possible to achieve Solaris
or even NT level SMP performance without a willingness to rewire
the guts of things.

It's possible to break the tasks down by subsystem, but after you
do that, there's a limit to how divisible the tasks are.

> SMP will scale with a BGL as long as we minimize the system time.
> If system time with 4 CPUs is under 10%, the kernel is not the problem.
> Since it is not always practical to make the kernel fast enough to
> do that for many applications/workloads, we need to move enough out
> of the BGL so that we can get the necessary parallelism.

You need to distinguish "system time" from "active system time",
as a first approximation.  Things that are sleeping are accounted
as system time from both areas.  The only one that really matters is
active system time.

I think a good approach would be to divorce the idea of user space
process context (user stack and memory map) and kernel space
process context (kernel stack per kernel entry, user space memory
map, sleep context) as a first run.  Once we have the idea that
the things that run in the kernel aren't the same as the things in
user space, then kernel work can be scheduled separately from user
space work to achieve parallelism wins (e.g. your async read request
is serviced by CPU 2 while your program continues to run on CPU 0).

> Linux lost to NT because of the slow Linux protocol stack tested.

This was their presumption.

> The review said that a multi-threaded stack would have made a difference.
> However, Linux lost on the UP case too, so I doubt that's the only problem.
> The fact is, if the protocol stack was fast enough, it wouldn't need
> to be multi-threaded.  I would like to see how FreeBSD does on
> that same test -- it has a much faster TCP/IP stack than Linux
> (especially using sendfile for the static pages!)

FreeBSD did worse than Linux, both SMP and UP.

> On the x86, you do need to lock the bus to guarantee operations
> are atomic, with the exception of xchg (but not the variants),
> which is guaranteed to be atomic.  They also must be naturally-aligned.

You lock the bus to mutex the soft lock on objects.  In general,
it's a good idea to use xchg for this, and not get too complicated
with the soft lock access.
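
A minimal sketch of such an xchg-based mutex, assuming C11 <stdatomic.h>
(on x86, atomic_exchange on an int is normally compiled to XCHG, which is
implicitly locked).  The names xchg_lock_t, xchg_lock_try() and friends
are made up for illustration; this is not an existing FreeBSD kernel
interface.

```c
#include <stdatomic.h>

typedef struct {
    atomic_int held;            /* 0 = free, 1 = held */
} xchg_lock_t;

void xchg_lock_init(xchg_lock_t *l)
{
    atomic_init(&l->held, 0);
}

/* One exchange: returns 1 if we took the lock, 0 if it was already held. */
int xchg_lock_try(xchg_lock_t *l)
{
    return atomic_exchange_explicit(&l->held, 1, memory_order_acquire) == 0;
}

void xchg_lock_acquire(xchg_lock_t *l)
{
    while (!xchg_lock_try(l)) {
        /* Wait with plain reads so we do not hammer the bus with xchg. */
        while (atomic_load_explicit(&l->held, memory_order_relaxed) != 0)
            ;
    }
}

void xchg_lock_release(xchg_lock_t *l)
{
    atomic_store_explicit(&l->held, 0, memory_order_release);
}
```

The point of keeping it this small is that the exchange only guards access
to the soft lock's own state; anything smarter lives in the soft lock.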
The soft locks themselves can be as complicated as necessary to
support the architecture.  For a mostly unchanged FreeBSD kernel,
this would mean, minimally, intention-mode shared/exclusive
multiple-reader/single-writer locks, with reader draining and
queueing once a writer is pending.

> To clarify: there DO exist some dual-processor 486 systems.  They
> use APICs, and in *theory* can run FreeBSD without too much difficulty
> (there is no mptable, the processor APICs are at different addresses,
> so you have to know which processor you are on to access the APIC,
> and the AP initialization is a little different).  I don't think
> anyone cares enough to implement the support code, and Intel discontinued
> the parts necessary to build them, so it probably won't be too painful
> to break possible support for them.

I don't think Intel discontinued production of external APICs.
They are useful in embedded systems for non-Intel coprocessors.  8-).

The most interesting place is SMP SPARC, but I think that if someone
wants to port FreeBSD to their Sequent box, it should be possible
(i.e. don't architect against it).  Likewise, the BeBox, which uses
an MEI coherency model based on removing the L2 cache chips and
replacing them with specific arbitration logic.

> Because the cost of setting up the mappings, and the wasted page
> of memory, greatly exceeds the cost of a system call that is used
> at most (usually) once per process.  The only time getpid() is called
> several times is during (bad) synthetic benchmarks.  Having libc cache
> the value would be a more viable solution; it would have to trap fork()
> calls, however, to invalidate the stored pid.

Several libc implementations do exactly this...

> Compiling and storing 8 kernels for the loader to choose from sounds
> like a bad idea as well; it may be practical for CD-ROM installation,
> although I think it is more likely the user will select the right one.

I definitely agree.
Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Tue Jun 29 14:23: 1 1999 Delivered-To: freebsd-smp@freebsd.org Received: from ns.mt.sri.com (unknown [206.127.79.91]) by hub.freebsd.org (Postfix) with ESMTP id 4C7A315111 for ; Tue, 29 Jun 1999 14:22:58 -0700 (PDT) (envelope-from nate@mt.sri.com) Received: from mt.sri.com (rocky.mt.sri.com [206.127.76.100]) by ns.mt.sri.com (8.8.8/8.8.8) with SMTP id PAA25199; Tue, 29 Jun 1999 15:22:56 -0600 (MDT) (envelope-from nate@rocky.mt.sri.com) Received: by mt.sri.com (SMI-8.6/SMI-SVR4) id PAA09761; Tue, 29 Jun 1999 15:22:56 -0600 Date: Tue, 29 Jun 1999 15:22:56 -0600 Message-Id: <199906292122.PAA09761@mt.sri.com> From: Nate Williams MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit To: Terry Lambert Cc: vanmaren@cs.utah.edu (Kevin Van maren), freebsd-smp@FreeBSD.ORG Subject: Re: high-efficiency SMP locks - submission for review In-Reply-To: <199906292115.OAA26919@usr08.primenet.com> References: <199906291705.LAA20627@zane.cs.utah.edu> <199906292115.OAA26919@usr08.primenet.com> X-Mailer: VM 6.34 under 19.16 "Lille" XEmacs Lucid Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > > Just a quick summary (from my point of view): > > 1) Multi-threading the kernel will require locking, not just spl(). > > 2) Locking will likely slow down uni-processor systems. > > Locking in combination with multithreading should actually speed > up uniprocessor systems. > > The SVR4.2 (UnixWare 2.0) release was ~15% faster on UP, *with* > locking, due to the fact that a lot of th UP code benefitted > from deserialization of multiple simultaneous kernel operations. 
I seriously doubt that any of this was due to locking; it was
probably due to re-coding some of the less efficient algorithms
and/or better design.  Re-design can cause this, but locking is
*never* an optimization, just a necessary evil.  Re-design your system
around a multi-threaded design if you like, but the 'simple' solution
of adding locking is in no way going to speed up the kernel.

And I don't see FreeBSD doing a completely new kernel design anytime
in the near future; that's what John Dyson's G2 kernel stuff is
supposed to be. :)

Nate

From owner-freebsd-smp Tue Jun 29 14:44:24 1999
Delivered-To: freebsd-smp@freebsd.org
Received: from fast.cs.utah.edu (fast.cs.utah.edu [155.99.212.1])
    by hub.freebsd.org (Postfix) with ESMTP id 7464D15370
    for ; Tue, 29 Jun 1999 14:44:16 -0700 (PDT)
    (envelope-from vanmaren@fast.cs.utah.edu)
Received: (from vanmaren@localhost)
    by fast.cs.utah.edu (8.9.1/8.9.1) id PAA27790;
    Tue, 29 Jun 1999 15:44:16 -0600 (MDT)
Date: Tue, 29 Jun 1999 15:44:16 -0600 (MDT)
From: Kevin Van Maren
Message-Id: <199906292144.PAA27790@fast.cs.utah.edu>
To: tlambert@primenet.com, vanmaren@cs.utah.edu
Subject: Re: high-efficiency SMP locks - submission for review
Cc: freebsd-smp@FreeBSD.ORG
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

> > 5) Very few people both know enough about the kernel internals and
> > multiprocessor/multi-threaded locking to do the job "right".
>
> I don't think that's true.  I count at least 5 people in
> the FreeBSD camp, not including myself (Simon Shapiro, Bakul Shah,
> John Dyson, et al.), just off the top of my head (i.e. if your
> name belongs here and I didn't put you here, I'm not snubbing you).

You are agreeing with me!  I was implying the number was small,
not zero.  There are maybe 5 to 50, not 500-5000.
Perhaps a dozen people are really capable of doing the work at this
point; the question is how many of the handful have the time and
motivation to work on it?  This is the downside of an "open source"
OS: big projects that take a long time are less likely to be fully
staffed.  Sun could afford 10 man-years (or whatever it was) to
multi-thread Solaris/SVR4.  We probably can't...  Still, a few
man-months can get a lot done, if the right person is working on it.

> > 6) The method most likely to succeed will be evolutionary; there is
> > simply too much code to change everything at once and get it all working.
>
> I doubt this.  It's unlikely to be possible to achieve Solaris
> or even NT level SMP performance without a willingness to rewire
> the guts of things.
>
> It's possible to break the tasks down by subsystem, but after you
> do that, there's a limit to how divisible the tasks are.

I agree with all this.  I am saying that starting from where we are
now, and going straight into a fully-preemptable, multi-threaded
kernel with fine-grained locking isn't likely to be the most fruitful
approach in the short term.

I also recall hearing that Solaris was slower on a uniprocessor than
FreeBSD, partly due to the locking/synchronization in the kernel.

> I think a good approach would be to divorce the idea of user space
> process context (user stack and memory map) and kernel space
> process context (kernel stack per kernel entry, user space memory
> map, sleep context) as a first run.  Once we have the idea that
> the things that run in the kernel aren't the same as the things in
> user space, then kernel work can be scheduled separately from user
> space work to achieve parallelism wins (e.g. your async read request
> is serviced by CPU 2 while your program continues to run on CPU 0).

That is an interesting idea.  Add multiple kernel stacks per user
process, and you have threading ;-)

> FreeBSD did worse than Linux, both SMP and UP.

I didn't see any FreeBSD numbers; I'll have to go look again.

It wasn't that long ago FreeBSD was beating the pants off Linux.
I guess we've been standing too still for too long.

Kevin

From owner-freebsd-smp Tue Jun 29 15:19:43 1999
From: Terry Lambert
Message-Id: <199906292219.PAA00305@usr08.primenet.com>
Subject: Re: high-efficiency SMP locks - submission for review
To: vanmaren@fast.cs.utah.edu (Kevin Van Maren)
Date: Tue, 29 Jun 1999 22:19:22 +0000 (GMT)
Cc: tlambert@primenet.com, vanmaren@cs.utah.edu, freebsd-smp@FreeBSD.ORG
In-Reply-To: <199906292144.PAA27790@fast.cs.utah.edu> from "Kevin Van Maren" at Jun 29, 99 03:44:16 pm

> I also recall hearing that Solaris was slower on a uniprocessor
> than FreeBSD, partly due to the locking/synchronization in the kernel.

I credit this to their VFS reentrancy model, which I think is
The Wrong Way To Do It.

I think that locking objects instead of locking entrancy to the
small bits of code that modify such objects is probably the
culprit.
When you lock objects with fine granularity, you aren't saying
enough about how long you hold the locks.  The object lock cost is
much higher than necessary or desirable, IMO.

> > FreeBSD did worse than Linux, both SMP and UP.
>
> I didn't see any FreeBSD numbers; I'll have to go look again.

The FreeBSD numbers are from Ziff-Davis via Mike Smith.

> It wasn't that long ago FreeBSD was beating the pants off Linux.
> I guess we've been standing too still for too long.

Time for FreeBSD to wake up...


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

From owner-freebsd-smp Tue Jun 29 18:40:43 1999
To: Alfred Perlstein
Cc: freebsd-smp@FreeBSD.ORG
From: "Gary Palmer"
Subject: Re: high-efficiency SMP locks - submission for review
In-reply-to: Your message of "Tue, 29 Jun 1999 03:20:16 CDT."
Date: Tue, 29 Jun 1999 21:41:01 -0400
Message-ID: <38599.930706861@noop.colo.erols.net>

[ CC Trimmed ]

Alfred Perlstein wrote in message ID :
> We have a nifty loader now, why not have it determine the CPU
> type and boot the appropriate kernel?  It could easily be done
> in Forth if the loader exported the CPU type in an environment
> flag...
>
> Doesn't Sun do this?

Nope.
Sun's base `kernel' image (/kernel/genunix, from memory) has all
the threading and locking primitives in it, as well as some syscall
handlers (I forget which).  Everything dynamically loaded after that
is the same across most platforms, with only machine-dependent drivers
really changing between systems.  The boot loader differs between
hardware platforms, but that's mostly an instruction set issue I
believe.

From owner-freebsd-smp Tue Jun 29 21:41:53 1999
Date: Tue, 29 Jun 1999 21:41:40 -0700 (PDT)
From: Matthew Dillon
Message-Id: <199906300441.VAA34206@apollo.backplane.com>
To: Kevin Van maren
Cc: freebsd-smp@FreeBSD.ORG
Subject: Re: high-efficiency SMP locks - submission for review
References: <199906291705.LAA20627@zane.cs.utah.edu>

:I'm really glad to see that there is so much activity on the list!
:
:Just a quick summary (from my point of view):
:1) Multi-threading the kernel will require locking, not just spl().
:2) Locking will likely slow down uni-processor systems.
:3) Removing the BGL will allow more parallelism in the kernel
:for multiple system-bound applications under SMP.
:4) It will be a LOT of work to re-write the kernel to be thread-safe.
:Changing the execution environment will violate a lot of assumptions.
:5) Very few people both know enough about the kernel internals and
:multiprocessor/multi-threaded locking to do the job "right".
:6) The method most likely to succeed will be evolutionary; there is
:simply too much code to change everything at once and get it all working.
:
:SMP will scale with a BGL as long as we minimize the system time.
:If system time with 4 CPUs is under 10%, the kernel is not the problem.
:...
:Kevin Van Maren

I think we have a real good chance of moving the kernel onto an SMP
track.  BGL creates serious limitations on performance.  Linux is able
to outperform FreeBSD in critical areas because they have moved their
tcp stack and buffer copying code outside their BGL.  We can do much
better.

The work done by the BSDi folks has shown that moving interrupts into
kernel threads is considerably less difficult than we had previously
thought.  John Dyson has shown that it is possible to release the BGL
in certain areas of the code, such as in uiomove(), though at the
moment it doesn't help performance.  Warner Losh intends to integrate
a kernel thread implementation from NetBSD, which will give us the
basis to build upon.  FreeBSD's VM subsystem is already very close to
being able to operate in a kernel thread environment.  VFS/BIO is a
big stumbling block but there is hope there too.

A lot of things are starting to come together.  I think by the end of
the year the most critical pieces of the kernel will be threaded.

					-Matt
					Matthew Dillon

From owner-freebsd-smp Tue Jun 29 22: 6:23 1999
Date: Tue, 29 Jun 1999 23:06:11 -0600 (MDT)
From: Kevin Van Maren
Message-Id: <199906300506.XAA03405@fast.cs.utah.edu>
To: dillon@apollo.backplane.com, vanmaren@cs.utah.edu
Subject: Re: high-efficiency SMP locks - submission for review
Cc: freebsd-smp@FreeBSD.ORG

> A lot of things are starting to come together.  I think by the end of
> the year the most critical pieces of the kernel will be threaded.

That's great news, indeed!

Kevin

From owner-freebsd-smp Wed Jun 30 8:43:49 1999
Date: Wed, 30 Jun 1999 10:47:54 -0500 (EST)
From: Alfred Perlstein
To: smp@freebsd.org
Cc: tlambert@primenet.com
Subject: async call gates

Mr Lambert, I noticed you've been pushing for async call gates
functionality in FreeBSD.
I have several assumptions about how this would work that make it
expensive to accomplish; perhaps you or someone on this list can
clarify by explanation and/or a URL to some paper or a book reference.

Ok, here we go.

I assume that an async call gate refers to having kernel threads.
When a process performs a system call, once it enters the kernel it
spinlocks on a queue then adds a "work node" to it describing what
work it needs done.  After the work is queued, a kernel thread that
just finished on another process' "work node" may pick it up and run
the appropriate code to accomplish the work detailed in the node.

The work node that is built would also describe certain things for
the kernel thread:
1) process ID
2) needs a wakeup after completion
3) operation to perform
4) operation args
5) pointer to process statistics (notably u-area and such)

The way I think that you are anticipating this improving things is
that it's easier for these kernel threads to sync with each other than
if all processes were able to enter deep into the kernel; it would
also reduce lock contention, and as you said "process issues an async
read, after the syscall the process finds itself on CPU0 while the
kernel thread processing this on CPU1"

While I agree that kernel threads would work to make aio_* work better
there's the problem of many additional context switches to get work
done per syscall, but that depends somewhat on the implementation.

1) process issues system call (context switch)
2) process must sleep waiting for kernel thread to return with work
3) process is woken up after work is completed
4) process goes back to usermode

right now switches 2 and 3 don't happen, can this methodology
really save so much in the future?

*this has been a test of the emergency assumption test*
*if the writer really understood what you were talking*
*about he woulda kept quiet*

Just an ISBN number, or URL to some paper explaining this would
be really great.

Thanks.
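
For readers trying to picture the "work node" and its spinlocked queue
as assumed above, here is a minimal user-space sketch in C.  It is not
FreeBSD code: every name in it (work_node, work_queue, wq_enqueue,
wq_dequeue) is hypothetical, the field layout simply mirrors the five
items listed above, and a C11 test-and-set flag stands in for whatever
spinlock a real kernel would use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical work node: one queued system call request. */
struct work_node {
    int               pid;          /* 1) process ID of the caller */
    int               need_wakeup;  /* 2) wake the caller on completion? */
    int               op;           /* 3) operation to perform */
    intptr_t          args[4];      /* 4) operation arguments */
    void             *stats;        /* 5) per-process statistics (u-area etc.) */
    struct work_node *next;         /* queue linkage */
};

/* Hypothetical FIFO of pending work, guarded by a simple spinlock. */
struct work_queue {
    atomic_flag       lock;
    struct work_node *head;
    struct work_node *tail;
};

static void wq_init(struct work_queue *q)
{
    atomic_flag_clear(&q->lock);
    q->head = q->tail = NULL;
}

static void wq_lock(struct work_queue *q)
{
    /* Spin until the flag was previously clear, i.e. we now own it. */
    while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
        ;
}

static void wq_unlock(struct work_queue *q)
{
    atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

/* Syscall entry side: append a request to the tail of the queue. */
static void wq_enqueue(struct work_queue *q, struct work_node *n)
{
    n->next = NULL;
    wq_lock(q);
    if (q->tail != NULL)
        q->tail->next = n;
    else
        q->head = n;
    q->tail = n;
    wq_unlock(q);
}

/* Kernel thread side: take the oldest request, or NULL if none. */
static struct work_node *wq_dequeue(struct work_queue *q)
{
    wq_lock(q);
    struct work_node *n = q->head;
    if (n != NULL) {
        q->head = n->next;
        if (q->head == NULL)
            q->tail = NULL;
    }
    wq_unlock(q);
    return n;
}
```

In this sketch a caller would fill in a work_node and pass it to
wq_enqueue(); a worker loops on wq_dequeue() and, when need_wakeup is
set, wakes the originating process after the operation completes.  The
sleep and wakeup steps themselves (switches 2 and 3 in the list above)
are deliberately left out.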

-Alfred Perlstein - [bright@rush.net|bright@wintelcom.net]
systems administrator and programmer
    Win Telecom - http://www.wintelcom.net/

From owner-freebsd-smp Wed Jun 30 9: 5: 8 1999
To: Alfred Perlstein
Cc: smp@freebsd.org, tlambert@primenet.com
Subject: Re: async call gates
In-reply-to: Your message of "Wed, 30 Jun 1999 10:47:54 EST."
Date: Thu, 01 Jul 1999 00:04:59 +0800
From: Peter Wemm
Message-Id: <19990630160459.569AD79@overcee.netplex.com.au>

Alfred Perlstein wrote:
>
> Mr Lambert, I noticed you've been pushing for async call gates functionality
> in FreeBSD.  I have several assumptions about how this would work that
> makes it expensive to accomplish, perhaps you or someone on this list
> can clarify by explanation and/or a URL to some paper or a book reference.
[..]
> While I agree that kernel threads would work to make aio_*
> work better there's the problem of many additional context
> switches to get work done per syscall, but that depends somewhat
> on the implementation.
>
> 1) process issues system call (context switch)
> 2) process must sleep waiting for kernel thread to return with work
> 3) process is woken up after work is completed
> 4) process goes back to usermode
>
> right now switches 2 and 3 don't happen, can this methodology
> really save so much in the future?
[..]

Well, the trick is to not do #2 and #3 unless the kernel thread parts
block or are preempted and *really* need to sleep.  You switch stack
and registers, but otherwise keep running in the calling process
context until you need otherwise.

If the "lightweight" kthread blocks or sleeps, instead of saving the
thread's context, you save the original process's context, update the
registers in the kthread context, and put both to sleep.  The same
kind of thing goes for interrupts too.

This is (of course) an oversimplification, but should be enough to
give a general idea.  The beauty is that you don't have to mess with
run queues, priorities, etc etc in the majority of cases - just
registers, including a new stack pointer.

(BSDI have this for interrupts already)

Cheers,
-Peter

From owner-freebsd-smp Wed Jun 30 17:35:34 1999
Date: Wed, 30 Jun 1999 17:35:19 -0700
From: Arun Sharma
To: freebsd-smp@freebsd.org
Subject: BSDI Lazy threading
Message-ID: <19990630173519.A23653@home.com>

Can someone post a definitive reference to the BSDI work ?

Julian Elischer wrote earlier:

> BSDI have also done some of this.  Their approach has been very
> interesting: e.g. re: interrupts, they define a thread for each
> interrupt source (e.g. irq6, irq7, etc.).  When the interrupt occurs they
> save regs and transfer to the stack associated with that thread.  However
> all extra thread context switching is delayed (in the hope that it won't
> have to be done).  If a lock is encountered, the rest of the context
> switch is done, and the thread sleeps (and control is passed back to
> the holder of the lock (if they are runnable) or the original process).

Are interrupt handlers allowed to sleep in the kernel ?

	-Arun

From owner-freebsd-smp Wed Jun 30 17:51:17 1999
Date: Wed, 30 Jun 1999 17:51:06 -0700 (PDT)
From: Matthew Dillon
Message-Id: <199907010051.RAA41944@apollo.backplane.com>
To: Arun Sharma
Cc: freebsd-smp@FreeBSD.ORG
Subject: Re: BSDI Lazy threading
References: <19990630173519.A23653@home.com>

:> all extra thread context switching is delayed (in the hope that it won't
:> have to be done).  If a lock is encountered, the rest of the context
:> switch is done, and the thread sleeps (and control is passed back to
:> the holder of the lock (if they are runnable) or the original process).
:
:Are interrupt handlers allowed to sleep in the kernel ?
:
:	-Arun

Currently: No, but sometimes they do anyway.

Proposed (w/ kernel threads):

Yes, they will be allowed to sleep but the idea is for it to only
occur in special (and not oft-occurring) situations.

What we do currently is attempt to disable interrupts in mainline code
to avoid contention.  Theoretically such disablements are not supposed
to occur for long periods of time but the reality is that they often
do, especially for network-related things.  One advantage of moving
interrupts to kernel threads is that the latency issue can be more
easily managed.  Rather than disable an interrupt entirely we instead
allow the interrupt thread to preempt the kernel and then block if
necessary.

The manageability here is that the interrupt thread can now explicitly
check to see if it would block and, if it is a really critical
interrupt, can do something about it.  I see this as a big advantage
because it would allow us to run the most critical interrupts without
any real latency at all(1).  The serial and keyboard interrupts come
to mind.

note(1): caveat: interrupts must often be truly disabled when I/O to
non-DMA IDE drives is occurring due to bugs in many IDE controllers.

The other advantage of running interrupts in threads is that you can
run several interrupts simultaneously in an SMP system.  For example,
a gigabit ethernet device interrupt would be able to run concurrently
with the TCP stack.
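
The "explicitly check to see if it would block" step described above
can be pictured with a try-lock.  The following is only a user-space
analogy in C, under assumptions of my own: res_lock, handle_interrupt
and the three result codes are hypothetical names, and "deferring" is
reduced to bumping a counter.  It is not the BSDI or FreeBSD mechanism.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock protecting a resource shared with mainline code. */
struct res_lock {
    atomic_flag held;
};

enum intr_result {
    INTR_DONE,          /* got the lock, serviced the device */
    INTR_DEFERRED,      /* critical interrupt: did not block, queued work */
    INTR_WOULD_BLOCK    /* ordinary interrupt: thread would have to sleep */
};

/* Returns true if the lock was free and is now owned by the caller. */
static bool res_trylock(struct res_lock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

static void res_unlock(struct res_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/*
 * Body of a hypothetical interrupt thread.  It first checks whether it
 * would block.  A critical interrupt never sleeps: it records the work
 * in *pending and returns.  An ordinary interrupt reports that the
 * thread would have to finish its context switch and sleep.
 */
static enum intr_result
handle_interrupt(struct res_lock *l, bool critical, int *pending)
{
    if (res_trylock(l)) {
        /* ... service the device here ... */
        res_unlock(l);
        return INTR_DONE;
    }
    if (critical) {
        (*pending)++;
        return INTR_DEFERRED;
    }
    return INTR_WOULD_BLOCK;
}
```

The point of the sketch is only the control flow: the common case takes
the lock and finishes without a sleep, and the handler itself decides
what to do in the uncommon contended case.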

					-Matt
					Matthew Dillon

From owner-freebsd-smp Wed Jun 30 18: 0:49 1999
From: Terry Lambert
Message-Id: <199907010100.SAA13352@usr09.primenet.com>
Subject: Re: async call gates
To: bright@rush.net (Alfred Perlstein)
Date: Thu, 1 Jul 1999 01:00:36 +0000 (GMT)
Cc: smp@freebsd.org, tlambert@primenet.com
In-Reply-To: from "Alfred Perlstein" at Jun 30, 99 10:47:54 am

> Mr Lambert, I noticed you've been pushing for async call gates functionality
> in FreeBSD.  I have several assumptions about how this would work that
> makes it expensive to accomplish, perhaps you or someone on this list
> can clarify by explanation and/or a URL to some paper or a book reference.

It's my design, from around 1993.  I wanted to think of CPU's as
resources which crawled over code, like spiders over a web.  It
steals liberally from the VMS idea of asynchronous system traps.

Peter's explanation is pretty much spot-on.  The assumption that
the call contexts (you called them 'threads') have to exist for
the calls to be made in the first place is a wrong turn.

At the simplest level, there are three kinds of system calls:

1)	Calls that will never block
2)	Calls that will always block
3)	Calls that may or may not block

It would be trivial to add a flag to the sysent[] structure for
each system call to indicate what category it was in, if you had
a mind to optimize behaviour for type #1 calls.

Now for each system call, there is some context:

1)	The calling process' address space
2)	Certain aspects of the calling process, unrelated to the
	address space, per se.  This is a subset of the contents
	of the proc structure: credentials, PID, etc.
3)	A kernel stack
4)	A list of system resources, accumulated as they are held

This arrangement is symmetric between the kernel and user space,
as we will see below.

Now for the process model in user space.

Let's assume that all existing system calls which are of type
"will always block" or of type "may or may not block" can now
be executed asynchronously, and that this is not an attribute
of a particular system call, but is an attribute of the call
mechanism itself.  We might also place in this category some of
the "will never block" calls, if they take "a long time" to
complete.

This, in a nutshell, is the "async call gate".

This provides us with two interesting concurrency models which
we can use in user space:

1)	We can have a process with multiple outstanding
	asynchronous system calls pending simultaneously.

	This is the "aio" model, except we are not limited to the
	POSIX subset "aio_read" and "aio_write".  We can, instead,
	take "takes a long time" operations, such as "open", "sync",
	"getdents", "connect", "syslog", and so on, and also make
	*them* asynchronous.

2)	We can implement on top of the asynchronous (non-blocking)
	calls, a set of primitives that emulate blocking calls.

	This emulation is done by making an async call and then
	putting the caller to sleep until the async call returns.
	Rather than giving up our quantum, however, we change to
	another set of registers, another program counter, and
	another stack.

	This implementation is called "user space threading using
	a call conversion scheduler".

The advantage of this threads model over a kernel threads model
is that, so long as we have work pending, we can utilize the full
measure of our CPU quantum without taking a full context switch
overhead.  This was the original promise that the threads model
made to us when it first appeared, and then reneged upon when
threads moved into the kernel.

The main problem with kernel threads is that kernel threads are
not really threads, per se, they are KSE's -- Kernel Schedulable
Entities.  We call them KSE's to make no distinction between the
work-to-do for a thread vs. a traditional process.  The result is
that kernel threads compete for quantum as processes, both with
other threads in the same thread group ("multithreaded process"),
and with other threads in other thread groups (a thread group with
but a single member is a traditional UNIX process).

The main benefit of kernel threads is SMP scalability for programmer
parallelized processes: processes which have been intentionally
multithreaded (as opposed to using asynchronous calls) in order to
increase their concurrency.

There is really no benefit (and in fact, a large amount of cache
busting and context switch overhead) to using kernel threads as
opposed to asynchronous calls in order to achieve SMP scaling.  It
is a lot of work to try and get the scheduler to intentionally
minimize address space changes.  You can do opportunistic state
change avoidance when you are switching between one thread in a
process group and another thread in the same process group, but
that really buys you very little.  More complex solutions lead to
starvation deadlocks and other bad behaviour.

This leaves one area where kernel threads still have better SMP
scalability: when most of the cycles are spent in user space code.

Right now, in SMP FreeBSD, each CPU can be in user space at the
same time; in fact, given the Big Giant Lock(tm), multiple CPU's
can only be in the kernel simultaneously under some rigidly
controlled circumstances.  BSDI gets around the reentrancy issue
for interrupts by moving the interrupt handling to kernel threads;
that's one way to do it, but it has some real bad behaviour
compared to "top" and "bottom" drivers, like in NT and Solaris'
SMP implementations.

What kernel threads promise, and what makes people willing to put
up with their obvious drawbacks, is the ability for multiple CPU's
to be in user space _in the same process_ at the same time.  This
really impresses some people, when it probably shouldn't.

This seems to really be a feat of legerdemain; in fact, it's not
that hard, and you don't need kernel threads to get there, only
the ability to queue scheduler reservations for user space code
already in a ready-to-run state.  This is best accomplished with
a user space hook into a scheduler activation mechanism.

We can see this rather handily by asking ourselves "What is a
user space process?"; the answer is:

1)	A set of registers, including:

	o	A stack
	o	A program counter

2)	An address space

In other words, the same mechanism that is used by the call
conversion to switch between one "blocked" user space thread
and another "ready to run" user space thread can be used to
dispatch a second (or third or fourth or Nth) CPU into the
user space code as well.

All we are missing is a tiny bit of glue in the user space
call conversion scheduler.  This glue says:

1)	When you get a chance, "return" a quantum to user
	space to this address, which happens to be my user
	space scheduler dispatcher.

2)	I have an aversion to this request being enqueued
	on any of the ready-to-run lists for CPU's where
	I already have one of these requests enqueued, or
	which are currently running in my user space at
	my behest.

The first is trivial to code up: it's called "vfork".
One might make certain optimizations based on the system call
architecture divorcing the kernel stack and other context from
the process structure itself; I would advise this.  The result is
significantly lighter weight than the current vfork, since the
entirety of the context is actually just the user space context.
The only real "trick" is to ensure that your scheduler dispatcher
has a stack available for every one of these which may return to
user space; you wouldn't want to use a handy user space thread's
stack for this, since you want to eliminate quantum bias.

The second takes a bit more work.  A very rough implementation is
trivial, however: a 32 bit bitmask associated with the process
context, with one bit per available processor.

What's the result?

The result is a highly SMP scalable multiple CPU utilizing user
space threading system, which neatly sidesteps the context switch
address space thrashing and N:M kernel:user space thread mapping
problems.

The kernel reentrancy issues for fault/call/interrupt can be
handled separately, without a lot of overly complicated (by
kernel threads) effort.

Ideally, the BSDI approach would *NOT* be used; it's dated.
Here's a reference from 1991:

	Lazy Task Creation: A Technique for Increasing the
	Granularity of Parallel Programs
	E. Mohr, D.A. Kranz, and R.H. Halstead, Jr.
	IEEE Transactions on Parallel and Distributed Systems,
	July 1991, pages 264-280

And there are other references (which I don't have immediately at
hand) showing this approach to be more than a decade and a half
behind the current state of the art.  Some people have already
referenced more recent work on the FreeBSD SMP list.  Late-binding
resources is a useful technique, but it's not the only technique,
and for the particular implementation (interrupts -- yes, I know
that I'm suggesting lazy binding of blocking contexts when I
suggest an async call gate), it's probably not the best technique.
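
As an illustration of the "flag in the sysent[] structure" idea from
earlier in this message, here is a small C sketch.  It makes several
assumptions that are not in the post: the structure sysent_sketch, the
table, and the functions sy_lookup and sy_choose are hypothetical, the
real FreeBSD struct sysent looks different, and the classification of
each call in the table is only an illustrative guess.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The three categories of system call named in the text. */
enum sy_block {
    SY_NEVER_BLOCKS,
    SY_ALWAYS_BLOCKS,
    SY_MAY_BLOCK
};

/* What the call gate does with a request. */
enum sy_dispatch {
    SY_RUN_INLINE,   /* finish before returning to the caller */
    SY_RUN_ASYNC     /* return at once; completion is reported later */
};

/* Hypothetical, cut-down table entry carrying the extra flag. */
struct sysent_sketch {
    const char   *sy_name;
    enum sy_block sy_class;
};

/* Illustrative classification only; not taken from any real kernel. */
static const struct sysent_sketch sketch_table[] = {
    { "getpid",     SY_NEVER_BLOCKS  },
    { "sigsuspend", SY_ALWAYS_BLOCKS },
    { "read",       SY_MAY_BLOCK     },
    { "open",       SY_MAY_BLOCK     },
};

/* Find a table entry by name, or NULL if it is not listed. */
static const struct sysent_sketch *sy_lookup(const char *name)
{
    size_t n = sizeof(sketch_table) / sizeof(sketch_table[0]);

    for (size_t i = 0; i < n; i++) {
        if (strcmp(sketch_table[i].sy_name, name) == 0)
            return &sketch_table[i];
    }
    return NULL;
}

/*
 * Type #1 calls are the optimization case: they need no saved context
 * and simply run to completion.  Everything else goes through the
 * asynchronous path.
 */
static enum sy_dispatch sy_choose(const struct sysent_sketch *se)
{
    return se->sy_class == SY_NEVER_BLOCKS ? SY_RUN_INLINE : SY_RUN_ASYNC;
}
```

A blocking read() in this model would be built in user space: the call
conversion scheduler issues the asynchronous form and switches to
another ready user thread until the completion arrives.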


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

From owner-freebsd-smp Wed Jun 30 18:33:31 1999
From: Terry Lambert
Message-Id: <199907010133.SAA14411@usr09.primenet.com>
Subject: Re: BSDI Lazy threading
To: dillon@apollo.backplane.com (Matthew Dillon)
Date: Thu, 1 Jul 1999 01:33:12 +0000 (GMT)
Cc: adsharma@home.com, freebsd-smp@FreeBSD.ORG
In-Reply-To: <199907010051.RAA41944@apollo.backplane.com> from "Matthew Dillon" at Jun 30, 99 05:51:06 pm

> Proposed (w/ kernel threads):
>
> Yes, they will be allowed to sleep but the idea is for it to only
> occur in special (and not oft-occurring) situations.

I would strongly recommend _not_ implementing the BSDI solution,
at least not without further investigation of alternatives.

I'm actually rather taken with the NT approach to achieving better
network performance; those guys are sneaky.  What they do is to,
with multiple CPU's and multiple Ethernet cards, bind the interrupts
for each card to a different processor.

> What we do currently is attempt to disable interrupts in mainline code
> to avoid contention.  Theoretically such disablements are not supposed
> to occur for long periods of time but the reality is that they often do,
> especially for network-related things.  One advantage of moving interrupts
> to kernel threads is that the latency issue can be more easily managed.
> Rather than disable an interrupt entirely we instead allow the interrupt
> thread to preempt the kernel and then block if necessary.

The advantage to this approach, which makes it appear attractive,
is the ability to have multiple simultaneous interrupt stacks;
however, I think that it's possible to have an implementation
that has this capability, _without_ needing kernel threads to
implement it.

I think that what you are wanting to fix is actually the NETISR
concurrency and latency issues.  I think the threads approach
addresses concurrency, but fails to address latency.

Again, per my previous posting in the async call gates thread, one
could imagine a trivial implementation using a 32 bit service
mask, with one bit per CPU, in order to hand out interrupts.  A
per CPU interrupt stack would be enough to deal with the issue,
without needing kernel threads.

One real problem is loss of determinism that occurs with kernel
thread scheduling, unless you greatly complicate the implementation
of the scheduler.

I think kernel threads are a really bad idea.  People use threads
in user space programs, either because they lack the asynchronous
primitives necessary to get concurrency, or because they are lazy
thinkers, and don't want to deal with the issue of organizing their
data structures so that there can be multiple instances of them
(i.e. migrating their context out of global variables and off the
stack).  I think that there is sufficient brain-power available
that it's not necessary to do this to the kernel to achieve the
performance goals.
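
The 32 bit service mask mentioned above is not specified any further
in this posting, so the following C sketch fills in the details with
assumptions of its own: a set bit means "this CPU is currently
servicing an interrupt", an interrupt is handed to the lowest-numbered
idle CPU, and the names svc_pick_cpu, svc_claim and svc_release are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SVC_NO_CPU (-1)   /* every present CPU is already busy */

/*
 * Pick a CPU to take the next interrupt.  'present' has one bit set
 * per installed CPU; 'busy' has one bit set per CPU that is already
 * servicing an interrupt.  Returns the lowest idle CPU number.
 */
static int svc_pick_cpu(uint32_t present, uint32_t busy)
{
    uint32_t idle = present & ~busy;

    for (int cpu = 0; cpu < 32; cpu++) {
        if (idle & (UINT32_C(1) << cpu))
            return cpu;
    }
    return SVC_NO_CPU;
}

/* Mark a CPU as servicing an interrupt; returns the updated mask. */
static uint32_t svc_claim(uint32_t busy, int cpu)
{
    return busy | (UINT32_C(1) << cpu);
}

/* Mark a CPU as idle again; returns the updated mask. */
static uint32_t svc_release(uint32_t busy, int cpu)
{
    return busy & ~(UINT32_C(1) << cpu);
}
```

With a per-CPU interrupt stack, claiming a bit is all the bookkeeping
a dispatch needs in this sketch; on a real SMP system the mask updates
would of course have to be atomic.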

In the limit, multiple simultaneous asynchronous operations beat
kernel threads, due to allocation, deallocation, context, and
scheduling overhead that exist with threads, but not with
asynchronous operations.

In this case, because you *are* the kernel, there's not the same
SMP scalability issue -- that issue only exists when migrating
CPUs from kernel to user space. Given fault or interrupt based
kernel entry, these issues don't exist for kernel code.

> The manageability here is that the interrupt thread can now
> explicitly check to see if it would block and, if it is a
> really critical interrupt, can do something about it. I see
> this as a big advantage because it would allow us to run the
> most critical interrupts without any real latency at all(1).
> The serial and keyboard interrupts come to mind.
>
> note(1): caveat: interrupts must often be truly disabled when I/O to
> non-DMA IDE drives is occurring due to bugs in many IDE controllers.
>
> The other advantage of running interrupts in threads is that you can
> run several interrupts simultaneously in an SMP system. For example,
> a gigabit ethernet device interrupt would be able to run concurrently
> with the TCP stack.

You should be able to do this anyway, without resorting to threads.
The only real issue that has prevented the (trivial) code necessary
to implement this is the potential for inter-handler resource
contention, which is a pending problem with the more complicated
threads approach as well. This is mostly a driver architecture
problem for individual drivers servicing multiple cards from a
common (instead of separately allocated) driver-global context.

I also _seriously_ dislike the idea that it's permissible to block
in an interrupt handler under any circumstances, especially if it
isn't a panic condition to block before ack'ing the interrupt that
got you there. 8-(.

You need to ask Julian about the Cyrix 55xx chips, and what you
have to do to make sure you don't lose an interrupt under some
perverse circumstances. Blocking prior to ack'ing the interrupt
would _greatly_ exacerbate the problem.

I don't know about the Alpha (I haven't cracked my architecture
manual for a year), but long experience with the VAX tells me that
it would be Bad(tm) to leave a high priority interrupt un-ack'ed
on the theory that you could service a lower priority interrupt
(i.e. you're not going to get one).

The BSDI approach precludes operation on some architectures (I
assume that you will want to use the same implementation in UP
kernels for the same concurrency reasons; I think a better approach
to UP concurrency issues is to -- finally -- resolve the kernel
preemption problem, once and for all).

Terry Lambert
terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-smp" in the body of the message

From owner-freebsd-smp Wed Jun 30 18:54:47 1999
Delivered-To: freebsd-smp@freebsd.org
Received: from apollo.backplane.com (apollo.backplane.com [209.157.86.2]) by hub.freebsd.org (Postfix) with ESMTP id AC2E714D62 for ; Wed, 30 Jun 1999 18:54:42 -0700 (PDT) (envelope-from dillon@apollo.backplane.com)
Received: (from dillon@localhost) by apollo.backplane.com (8.9.3/8.9.1) id SAA42283; Wed, 30 Jun 1999 18:54:39 -0700 (PDT) (envelope-from dillon)
Date: Wed, 30 Jun 1999 18:54:39 -0700 (PDT)
From: Matthew Dillon
Message-Id: <199907010154.SAA42283@apollo.backplane.com>
To: Terry Lambert
Cc: bright@rush.net (Alfred Perlstein), smp@FreeBSD.ORG, tlambert@primenet.com
Subject: Re: async call gates
References: <199907010100.SAA13352@usr09.primenet.com>
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

In one of my embedded operating systems I came up with a great
idea for how to handle system calls involving
I/O. Actually, the idea has its roots in the Amiga's I/O subsystem.

Basically, instead of making a system call per se, you build a message
and send it, then wait for a reply.

Now one might think that would be rather slow for system calls that
can run synchronously. This is where the idea from the Amiga's I/O
subsystem comes into play. Simply put, the Amiga had the notion of
userland being able to request asynchronous or synchronous operation,
and then having deviceland be able to run a request either
asynchronously or synchronously, separate from what the userland
requested.

In other words, the type of operation is decoupled. When you make the
I/O call you ask it to run either synchronously or asynchronously.
The device takes the request into account and then decides how to
actually run it, depending on whether the device thinks it will need
to block or not. In the Amiga, the device's decision was *independent*
of the request.

So in my embedded OS I had a domsg() and a sendmsg() API. The user
process constructs the message and calls one or the other entry point.
If the user requested a synchronous call AND the device happens to run
the request synchronously, the OS doesn't bother even queueing the
message up to the device. It calls the device entry, sees that the
device ran the message synchronously, and returns. The result is an
operation that runs just as quickly as a normal system call might since
the added complexity of queueing and dequeueing the message just doesn't
happen.

If the API code sees that the device has done something different, it
makes the appropriate adjustments to the message to make it appear to
the user process that the device did it the way the user process
requested. So, for example, if the user process calls the domsg()
entry point and the device entry indicates that it is running the
request asynchronously, the domsg() entry point will then block
waiting for the request to complete.
If the user process calls the sendmsg() entry point and the device
entry indicates that it ran the request synchronously, the sendmsg()
entry point will then queue the message on the return list which is
what the user process is requesting.

So:

    int
    domsg(msg)
    {
        int r;

        ...
        msg->flags |= MSGF_SYNCHRONOUS;
        device->entry(msg);
        if ((msg->flags & MSGF_SYNCHRONOUS) == 0)
            waitmsg(msg);
        return(msg->result);
    }

    void
    sendmsg(msg, replyport)
    {
        int r;

        ...
        msg->flags &= ~(MSGF_RETURNED | MSGF_SYNCHRONOUS);
        msg->rport = replyport;
        device->entry(msg);
        if (msg->flags & MSGF_SYNCHRONOUS) {
            replymsg(msg, msg->result);
            /* note: clears MSGF_SYNCHRONOUS */
        }
    }

    int
    waitmsg(msg)
    {
        if ((msg->flags & MSGF_SYNCHRONOUS) == 0) {
            while ((msg->flags & MSGF_RETURNED) == 0)
                ... block on message ...
        }
        return(msg->result);
    }

    void
    replymsg(msg, result)
    {
        msg->result = result;
        if (msg->flags & MSGF_SYNCHRONOUS) {
            msg->rport = NULL;
            msg->flags &= ~MSGF_SYNCHRONOUS;
        }
        if (msg->rport) {
            switch(msg->rport->type) {
            case PTYPE_NORMAL:
                addlist(&msg->rport->queue, &msg->qnode);
                wakeup(msg->rport);
                break;
            case PTYPE_INTERRUPT:
                ... queue interrupt thread ...
                break;
            case PTYPE_USER_VECTORED_INTERRUPT:
                ... queue task software interrupt ...
                break;
            case PTYPE_RETCALL:
                msg->rport->retfunc(msg);
                break;
            ...
            }
        }
        msg->flags |= MSGF_RETURNED;
    }

Pretty simple, eh? Yet it completely decouples the requester from the
requestee. The requester can even request that upon completion the
message be returned to a completion port, or generate a software
interrupt, or even run a supervisory function synchronously. The
requestee doesn't care what the requestor wants, it is all
encapsulated in the API.
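
[Illustration, not part of the original posting.] The decoupling described above can be tried out in user space with a small single-threaded toy. This is a sketch under loud assumptions, not the embedded OS code: every `toy_` name, the `will_block` knob, and the result values 42 and 7 are invented for the example; "blocking" is simulated by polling a fake device; and the reply path follows the prose (a synchronously completed sendmsg still lands on the reply port) rather than the pseudocode line by line.

```c
#include <assert.h>
#include <stddef.h>

#define MSGF_SYNCHRONOUS 0x1  /* request ran (or is to run) inline */
#define MSGF_RETURNED    0x2  /* reply has been delivered */

struct msg;
struct port { struct msg *head, *tail; };  /* FIFO of returned messages */
struct msg {
    struct msg  *next;
    struct port *rport;
    int          flags;
    long         result;
    int          will_block;  /* toy knob: must the device wait? */
};

static struct msg *pending;   /* the toy device's one in-flight request */

/* Deliver a reply, queueing it on the reply port if there is one. */
static void
toy_replymsg(struct msg *m, long result)
{
    m->result = result;
    if (m->rport != NULL) {
        m->next = NULL;
        if (m->rport->tail != NULL)
            m->rport->tail->next = m;
        else
            m->rport->head = m;
        m->rport->tail = m;
    }
    m->flags |= MSGF_RETURNED;
}

/* Device entry: decides by itself whether to run inline or defer. */
static void
toy_device_entry(struct msg *m)
{
    if (m->will_block) {
        m->flags &= ~MSGF_SYNCHRONOUS;  /* running asynchronously */
        assert(pending == NULL);
        pending = m;
    } else {
        m->flags |= MSGF_SYNCHRONOUS;   /* ran inline */
        m->result = 42;
    }
}

/* Stand-in for the hardware finishing later (e.g. an interrupt). */
static void
toy_device_complete(void)
{
    struct msg *m = pending;

    if (m != NULL) {
        pending = NULL;
        toy_replymsg(m, 7);
    }
}

/* Synchronous call: returns only once the result is known. */
static long
toy_domsg(struct msg *m)
{
    m->flags = MSGF_SYNCHRONOUS;
    m->rport = NULL;
    toy_device_entry(m);
    if ((m->flags & MSGF_SYNCHRONOUS) == 0) {
        while ((m->flags & MSGF_RETURNED) == 0)
            toy_device_complete();  /* a real system would sleep here */
    }
    return m->result;
}

/* Asynchronous call: the reply always ends up on 'rport'. */
static void
toy_sendmsg(struct msg *m, struct port *rport)
{
    m->flags = 0;
    m->rport = rport;
    toy_device_entry(m);
    if (m->flags & MSGF_SYNCHRONOUS)
        toy_replymsg(m, m->result); /* device ran inline; queue it now */
}
```

The point of the toy is that the caller picks `toy_domsg()` or `toy_sendmsg()` without knowing `will_block`, and the device picks inline or deferred without knowing which entry point was used; the fast path (`toy_domsg()` on a non-blocking device) never touches a queue.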

    struct port {
        struct list queue;
        int type;
        int pri;
        void (*func)(struct msg *msg);
    };

    struct msg {
        struct node qnode;
        struct port *rport;
        int flags;
        off_t result;
        int cmd;
        void *data;
        int len;
        off_t arg1;
        off_t arg2;
        int arg3;
    };

-Matt

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-smp" in the body of the message

From owner-freebsd-smp Wed Jun 30 18:56:22 1999
Delivered-To: freebsd-smp@freebsd.org
Received: from smtp11.bellglobal.com (smtp11.bellglobal.com [204.101.251.53]) by hub.freebsd.org (Postfix) with ESMTP id 51C77154CF for ; Wed, 30 Jun 1999 18:56:14 -0700 (PDT) (envelope-from hoek@FreeBSD.org)
Received: from localhost.nowhere (ppp18415.on.bellglobal.com [206.172.130.95]) by smtp11.bellglobal.com (8.8.5/8.8.5) with ESMTP id VAA20219; Wed, 30 Jun 1999 21:59:14 -0400 (EDT)
Received: (from tim@localhost) by localhost.nowhere (8.9.3/8.9.1) id VAA07429; Wed, 30 Jun 1999 21:56:58 -0400 (EDT) (envelope-from tim)
Date: Wed, 30 Jun 1999 21:56:57 -0400
From: Tim Vanderhoek
To: Terry Lambert
Cc: Alfred Perlstein , smp@FreeBSD.org
Subject: Re: async call gates
Message-ID: <19990630215657.C7269@mad>
References: <199907010100.SAA13352@usr09.primenet.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.95i
In-Reply-To: <199907010100.SAA13352@usr09.primenet.com>; from Terry Lambert on Thu, Jul 01, 1999 at 01:00:36AM +0000
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Thu, Jul 01, 1999 at 01:00:36AM +0000, Terry Lambert wrote:
>
> 2) We can implement on top of the asynchronous (non-blocking)
[...]
> Rather than giving up our quantum, however, we change to
> another set of registers, another program counter, and

Is this the famous "It's my damn quantum" debate?

--
This is my .signature which gets appended to the end of my messages.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-smp" in the body of the message

From owner-freebsd-smp Wed Jun 30 19:19:23 1999
Delivered-To: freebsd-smp@freebsd.org
Received: from apollo.backplane.com (apollo.backplane.com [209.157.86.2]) by hub.freebsd.org (Postfix) with ESMTP id AB19F14F22 for ; Wed, 30 Jun 1999 19:19:21 -0700 (PDT) (envelope-from dillon@apollo.backplane.com)
Received: (from dillon@localhost) by apollo.backplane.com (8.9.3/8.9.1) id TAA42502; Wed, 30 Jun 1999 19:19:18 -0700 (PDT) (envelope-from dillon)
Date: Wed, 30 Jun 1999 19:19:18 -0700 (PDT)
From: Matthew Dillon
Message-Id: <199907010219.TAA42502@apollo.backplane.com>
To: Matthew Dillon
Cc: Terry Lambert , bright@rush.net (Alfred Perlstein), smp@FreeBSD.ORG, tlambert@primenet.com
Subject: Re: async call gates
References: <199907010100.SAA13352@usr09.primenet.com> <199907010154.SAA42283@apollo.backplane.com>
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

: So in my embedded OS I had a domsg() and a sendmsg() API. The user
: process constructs the message and calls one or the other entry point.
: If the user requested a synchronous call AND the device happens to run
: the request synchronously, the OS doesn't bother even queueing the
: message up to the device. It calls the device entry, sees that the
: device ran the message synchronously, and returns. The result is an
: operation that runs just as quickly as a normal system call might since
: the added complexity of queueing and dequeueing the message just doesn't
: happen.

Plus it would be fairly easy to make the message-entry itself
SMP-capable (i.e. allow several threads to attempt to sendmsg() the
same message simultaneously, in which case one succeeds and the
others fail due to the message being in-progress. This would be
useful for dispatching software interrupts).
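
[Illustration, not part of the original posting.] One plausible way to get the "one succeeds, the others fail" behaviour is an atomic test-and-set on an in-progress flag. The sketch below uses C11 `<stdatomic.h>`, which did not exist in 1999, so it is purely a modern illustration; `MSGF_INPROGRESS`, `struct smp_msg`, `msg_try_claim()` and `msg_release()` are hypothetical names, not anything from the embedded OS being described.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MSGF_INPROGRESS 0x1u  /* hypothetical flag: a sender owns the message */

struct smp_msg {
    atomic_uint flags;
};

/*
 * Try to claim the message for sending.  atomic_fetch_or() returns the
 * previous flag word, so exactly one of several racing callers sees the
 * bit clear and gets 'true'; everyone else gets 'false'.
 */
static bool
msg_try_claim(struct smp_msg *m)
{
    unsigned old = atomic_fetch_or(&m->flags, MSGF_INPROGRESS);

    return (old & MSGF_INPROGRESS) == 0;
}

/* Give the message back once its reply has been consumed. */
static void
msg_release(struct smp_msg *m)
{
    unsigned old = atomic_fetch_and(&m->flags, ~MSGF_INPROGRESS);

    assert(old & MSGF_INPROGRESS);  /* releasing an unclaimed message is a bug */
    (void)old;                      /* silence unused warning under NDEBUG */
}
```

A sendmsg()-style entry point would call `msg_try_claim()` first and return an "already in progress" error on failure, which is what makes re-posting the same software-interrupt message from several CPUs harmless.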
-Matt To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Wed Jun 30 22:30:10 1999 Delivered-To: freebsd-smp@freebsd.org Received: from soda.CSUA.Berkeley.EDU (soda.CSUA.Berkeley.EDU [128.32.43.52]) by hub.freebsd.org (Postfix) with ESMTP id 95DDA14D46 for ; Wed, 30 Jun 1999 22:30:06 -0700 (PDT) (envelope-from jwm@CSUA.Berkeley.EDU) Received: from soda.CSUA.Berkeley.EDU (localhost [127.0.0.1]) by soda.CSUA.Berkeley.EDU (8.8.8/) via ESMTP id WAA07072; Wed, 30 Jun 1999 22:27:17 -0700 (PDT) env-from (jwm@CSUA.Berkeley.EDU) Message-Id: <199907010527.WAA07072@soda.CSUA.Berkeley.EDU> To: Terry Lambert Cc: smp@FreeBSD.ORG Subject: Re: async call gates In-reply-to: Message from Terry Lambert of "Thu, 01 Jul 1999 01:00:36 -0000." <199907010100.SAA13352@usr09.primenet.com> Date: Wed, 30 Jun 1999 22:27:17 -0700 From: John Milford Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org Terry Lambert wrote: > > At the simplest level, there are three kinds of system calls: > > 1) Calls that will never block > 2) Calls that will always block > 3) Calls that may or may not block > > It would be trivial to add a flag to the sysent[] structure for > each system call to indicate what category it was in, if you had > a mind to optimize behaviour for type #1 calls. > > > Now for each system call, there is some context: > > 1) The calling process' address space > 2) Certain aspects of the calling process, unrelated to the > address space, per se. This is a subset of the contents > of the proc structure: credentials, PID, etc. > 3) A kernel stack > 4) A list of system resources, accumulated as they are held > > > This arrangement is symmetric between the kernel and user space, > as we will see below. > > > Now for the process model in user space. 
>
> Let's assume that all existing system calls which are of type "will
> always block" or of type "may or may not block" can now be executed
> asynchronously, and that this is not an attribute of a particular
> system call, but is an attribute of the call mechanism itself. We
> might also place in this category some of the "will never block"
> calls, if they take "a long time" to complete.
>
> This, in a nutshell, is the "async call gate".
> . . .
>
> The advantage of this threads model over a kernel threads model
> is that, so long as we have work pending, we can utilize the
> full measure of our CPU quantum without taking a full context
> switch overhead. This was the original promise that the threads
> model made to us when it first appeared, and then reneged upon
> when threads moved into the kernel.
>

I have been working recently on an async syscall mechanism that
sounds somewhat like what you are discussing here. Part of the reason
I would like to have it is that I am unsure of exactly how much it
will buy performance-wise.

The implementation I am working on does what I call "lazy async",
meaning that the syscall proceeds until it is about to block, and at
this time it does a customized fork, the parent returns to user space,
and the child proceeds to wait, and finish the call. This involves
adding an extra parameter to all the syscalls for notification, but
that is an implementation detail.

When I first started on this I thought the performance increase in
using this mechanism with an appropriate userland thread package over
using KSEs would be substantial. But now I'm not so sure, the reason
being that this still creates a lot of processes/kernel-threads
(assuming a sizable percentage of syscalls are going to block at some
point). It may, however, save us from making as many VM switches.
The only other optimizations I could come up with were to not save FP state when context switching from kernel async worker threads (and conversely not restore it when switching back), and more importantly to do lazy VM switches for the async workers meaning that instead of switching the VM space when switching to an async worker, the switch could be done when/if a copyin/copyout is done. Does this sound like a reasonable approach? Has anyone tried this before and gotten positive results? Is there a known better approach to this? --John To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Thu Jul 1 0: 3:14 1999 Delivered-To: freebsd-smp@freebsd.org Received: from shell.futuresouth.com (shell.futuresouth.com [198.78.58.28]) by hub.freebsd.org (Postfix) with ESMTP id C5B9614C83 for ; Thu, 1 Jul 1999 00:03:09 -0700 (PDT) (envelope-from tim@futuresouth.com) Received: (from tim@localhost) by shell.futuresouth.com (8.9.3/8.9.3) id CAA25721; Thu, 1 Jul 1999 02:02:52 -0500 (CDT) Date: Thu, 1 Jul 1999 02:02:52 -0500 From: Tim Tsai To: Matthew Dillon Cc: Terry Lambert , Alfred Perlstein , smp@FreeBSD.ORG Subject: Re: async call gates Message-ID: <19990701020252.A24971@futuresouth.com> References: <199907010100.SAA13352@usr09.primenet.com> <199907010154.SAA42283@apollo.backplane.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.95.3i In-Reply-To: <199907010154.SAA42283@apollo.backplane.com>; from Matthew Dillon on Wed, Jun 30, 1999 at 06:54:39PM -0700 Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > Basically instead of making a system call per-say, you build a message > and send it, then wait for a reply. Have you looked at the QNX design? http://www.qnx.com Basically it's a message passing microkernel that is POSIX compliant. Most BSD programs port easily to it. 
It has too many cool features for me to list here, including trivial device driver API (each device driver is a separate process), transparent distributed computing/networking and a lightweight GUI. Tim To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Thu Jul 1 14: 9:13 1999 Delivered-To: freebsd-smp@freebsd.org Received: from smtp02.primenet.com (smtp02.primenet.com [206.165.6.132]) by hub.freebsd.org (Postfix) with ESMTP id A3D9614C32 for ; Thu, 1 Jul 1999 14:09:06 -0700 (PDT) (envelope-from tlambert@usr06.primenet.com) Received: (from daemon@localhost) by smtp02.primenet.com (8.8.8/8.8.8) id LAA25015; Thu, 1 Jul 1999 11:33:16 -0700 (MST) Received: from usr06.primenet.com(206.165.6.206) via SMTP by smtp02.primenet.com, id smtpd022462; Thu Jul 1 11:29:10 1999 Received: (from tlambert@localhost) by usr06.primenet.com (8.8.5/8.8.5) id LAA07360; Thu, 1 Jul 1999 11:27:10 -0700 (MST) From: Terry Lambert Message-Id: <199907011827.LAA07360@usr06.primenet.com> Subject: Re: async call gates To: vanderh@ecf.utoronto.ca (Tim Vanderhoek) Date: Thu, 1 Jul 1999 18:27:10 +0000 (GMT) Cc: tlambert@primenet.com, bright@rush.net, smp@FreeBSD.org In-Reply-To: <19990630215657.C7269@mad> from "Tim Vanderhoek" at Jun 30, 99 09:56:57 pm X-Mailer: ELM [version 2.4 PL25] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > On Thu, Jul 01, 1999 at 01:00:36AM +0000, Terry Lambert wrote: > > > > 2) We can implement on top of the asynchronous (non-blocking) > [...] > > Rather than giving up our quantum, however, we change to > > another set of registers, another program counter, and > > Is this the famous "It's my damn quantum" debate? Yes. With appologies to the USMC: The Creed of the UNIX Process This is my quantum. There are many like it, but this one is mine. My quantum is my best friend. 
It is my life. I must master it as I master my life. My quantum, without me is useless. Without my quantum, I am useless. I must utilize my quantum true. I must preempt more frequently than the other processes on the system who are trying to starve me. I must preeempt them before they preempt me. I will... My quantum and my address space know that what counts in execution is not the number of system calls we make, the count of our instructions, nor the context switches we are involved in. We know that it is the elapsed wall time to completion. We will complete... ... Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Fri Jul 2 0: 7: 8 1999 Delivered-To: freebsd-smp@freebsd.org Received: from smtp5.jps.net (smtp5.jps.net [209.63.224.55]) by hub.freebsd.org (Postfix) with ESMTP id E749A1502A for ; Fri, 2 Jul 1999 00:07:06 -0700 (PDT) (envelope-from ulairi@jps.net) Received: from ulairi (208-237-196-86.irv.jps.net [208.237.196.86]) by smtp5.jps.net (8.9.0/8.8.5) with SMTP id AAA01094 for ; Fri, 2 Jul 1999 00:07:02 -0700 (PDT) From: "Ulairi" To: "Smp" Subject: Got a curious question from someone about re-entrant vs non-re-entrant kernels Date: Fri, 2 Jul 1999 00:06:52 -0700 Message-ID: <007601bec459$765b9740$56c4edd0@ulairi> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0 Importance: Normal X-MimeOLE: Produced By Microsoft MimeOLE V4.72.2106.4 Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 | -----Original Message----- | From: Len Huppe [mailto:huppe@execpc.com] | Sent: Thursday, July 01, 1999 20:02 | To: Ulairi | Subject: Re: reentrant kernel | | 
| If I were to go out and buld an SMP system tomorrow, what kind of | performance increase should I realistically expect from a | fully reentrant | kernel vs. a non-reentrant kernel? My own experience in that area is too limited to give a decent answer. Perhaps you guys could tell Len (and myself please :) ) General Purpose Computer Geek California State University, Northridge College of Engineering and Computer Science 18111 Nordhoff St, Post Stop 8295 Northridge, CA 91330 ulairi@jps.net ulairi@ecs.csun.edu ntadmin@ecs.csun.edu secadmin@ecs.csun.edu -----BEGIN PGP SIGNATURE----- Version: PGPfreeware 6.0.2i iQA/AwUBN3xjVlR8Yh25VFLEEQKKJgCgkhWurxNiazYrM4CETwJNjyO1xrgAn2Xt z0tU1qzQ0FY5tSMRrZg5EWUs =yvj6 -----END PGP SIGNATURE----- To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Sat Jul 3 0:57: 1 1999 Delivered-To: freebsd-smp@freebsd.org Received: from wwdg.com (mail.wwdg.com [209.181.65.217]) by hub.freebsd.org (Postfix) with ESMTP id 51E9D15290 for ; Sat, 3 Jul 1999 00:56:57 -0700 (PDT) (envelope-from dvwd@wwdg.com) Received: (from web@localhost) by wwdg.com (8.8.5/8.8.0) id BAA20875; Sat, 3 Jul 1999 01:49:22 -0600 Date: Sat, 3 Jul 1999 01:49:22 -0600 Message-Id: <199907030749.BAA20875@wwdg.com> From: dvwd@wwdg.com To: freebsd-smp@freebsd.org, wjwen@engr.ucdavis.edu Full-Name: Dave Wood Subject: Another SMP motherboard Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org Just thought I'd let you know, that I have the Supermicro P6DBE (Dual PII/II 440BX chipset) working with FreeBSD 3.2-Release SMP. All I needed to do was to turn on SMP and APIC_IO. Worked first time :). 
Dave To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Sat Jul 3 1: 4: 5 1999 Delivered-To: freebsd-smp@freebsd.org Received: from wwdg.com (mail.wwdg.com [209.181.65.217]) by hub.freebsd.org (Postfix) with ESMTP id 7838214D41 for ; Sat, 3 Jul 1999 01:04:01 -0700 (PDT) (envelope-from dvwd@wwdg.com) Received: (from web@localhost) by wwdg.com (8.8.5/8.8.0) id BAA21036 for freebsd-smp@freebsd.org; Sat, 3 Jul 1999 01:56:26 -0600 Date: Sat, 3 Jul 1999 01:56:26 -0600 Message-Id: <199907030756.BAA21036@wwdg.com> From: dvwd@wwdg.com To: freebsd-smp@freebsd.org Full-Name: Dave Wood Subject: Mptable output hosed Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org Hello, I have SMP running on my SuperMicro P6DBE mother board, however when I run mptable, I only get some of the normal output. It ends with: MP Config Extended Table Entries: Extended Table HOSED! I am running FreeBSD 3.2-RELEASE. When I rebuilt the kernel, I turned on SMP, and forced memory size to 256megs. Any ideas? 

Dave

--------
Full output of mptable:

===============================================================================

MPTable, version 2.0.15

-------------------------------------------------------------------------------

MP Floating Pointer Structure:

  location:                     BIOS
  physical address:             0x000fb4f0
  signature:                    '_MP_'
  length:                       16 bytes
  version:                      1.4
  checksum:                     0xdd
  mode:                         Virtual Wire

-------------------------------------------------------------------------------

MP Config Table Header:

  physical address:             0x000f2490
  signature:                    'PCMP'
  base table length:            268
  version:                      1.4
  checksum:                     0xd7
  OEM ID:                       'INTEL '
  Product ID:                   '440BX '
  OEM table pointer:            0x00000000
  OEM table size:               0
  entry count:                  25
  local APIC address:           0xfee00000
  extended table length:        16
  extended table checksum:      234

-------------------------------------------------------------------------------

MP Config Base Table Entries:

--
Processors:     APIC ID Version State           Family  Model   Step    Flags
                 0       0x11    BSP, usable     6       5       2       0x183fbff
                 1       0x11    AP, usable      6       5       2       0x183fbff
--
Bus:            Bus ID  Type
                 0       PCI
                 1       PCI
                 2       ISA
--
I/O APICs:      APIC ID Version State           Address
                 2       0x11    usable          0xfec00000
--
I/O Ints:       Type    Polarity    Trigger     Bus ID   IRQ    APIC ID PIN#
                ExtINT  conforms    conforms         2     0          2    0
                INT     conforms    conforms         2     1          2    1
                INT     conforms    conforms         2     0          2    2
                INT     conforms    conforms         2     3          2    3
                INT     conforms    conforms         2     4          2    4
                INT     conforms    conforms         2     5          2    5
                INT     conforms    conforms         2     6          2    6
                INT     conforms    conforms         2     7          2    7
                INT     active-hi       edge         2     8          2    8
                INT     conforms    conforms         2     9          2    9
                INT     conforms    conforms         2    10          2   10
                INT     conforms    conforms         2    12          2   12
                INT     conforms    conforms         2    13          2   13
                INT     conforms    conforms         2    14          2   14
                INT     conforms    conforms         2    15          2   15
                INT     active-lo      level         2    11          2   18
                SMI     conforms    conforms         2     0          2   23
--
Local Ints:     Type    Polarity    Trigger     Bus ID   IRQ    APIC ID PIN#
                ExtINT  conforms    conforms         0   0:A        255    0
                NMI     conforms    conforms         0   0:A        255    1

-------------------------------------------------------------------------------

MP Config Extended Table Entries:

Extended Table HOSED!

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-smp" in the body of the message

From owner-freebsd-smp Sat Jul 3 9:43:47 1999
Delivered-To: freebsd-smp@freebsd.org
Received: from gera.nix.nns.ru (ns.nns.ru [194.135.102.10]) by hub.freebsd.org (Postfix) with ESMTP id 40F1B14C02 for ; Sat, 3 Jul 1999 09:43:38 -0700 (PDT) (envelope-from dflit@nns.ru)
Received: (from dflit@localhost) by gera.nix.nns.ru (8.9.1a/8.7.3) id UAA08458 for freebsd-smp@freebsd.org; Sat, 3 Jul 1999 20:43:37 +0400 (MSD)
To: freebsd-smp@freebsd.org
Message-ID:
Organization: National Electronic Library
Date: Sat, 3 Jul 1999 20:43:37 +0400 (MSD)
X-Mailer: Mail/@ [v2.45 FreeBSD]
From: Dmitry Flitmann
Reply-To: dflit@nns.ru
Error-to: dflit@nns.ru
Subject: Intel SC450NX hangs under high disk/memory load
Lines: 41
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Hi there!

We've got a "fast" computer for our database: Intel SC450NX,
2xXeon/500MHz/512K cache, 1G RAM (4x256 50ns ECC EDO Buffered DRAM
from Samsung), SymBios U2W SCSI onboard, 2xPCI, 3x18G Seagate Cheetah,
OS - FreeBSD 3.2-STABLE - also tried 3.1, 3.2-RELEASE, 4.0-CURRENT.

At first, we had to patch the NCR driver - then it worked fine for
some time.

Under high disk/memory load (copying a large directory tree from one
disk to another - ~200Mb, ~150K files) a problem appears - after ~15
minutes of hard work the system hangs - it does not create any new
processes anymore. When we try ktrace, it shows the last operation as
"namei" (while opening a file for reading).

3.2-RELEASE & -STABLE & 4.0-CURRENT die silently; 3.1 reports "Page
fault while in kernel mode". The fault virtual address differs; once
it was 0x0.

Our first idea was that the problem is in the patched ncr driver, so
we replaced the SymBios with an Adaptec 2940U2W, but the effect
persists.
CPU load is not very high, there are not a lot of processes, and no one keeps a lot of files open simultaneously. MAXUSERS is 512 (or 256) We tried both SMP and single-processor kernels. sorry for poor English. Sincerely, Dmitry Flitman National News Service/National Electronic Library. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message