From owner-freebsd-hackers Tue May 21 10:27:15 1996
Return-Path: owner-hackers
Received: (from root@localhost) by freefall.freebsd.org (8.7.3/8.7.3)
	id KAA12056 for hackers-outgoing; Tue, 21 May 1996 10:27:15 -0700 (PDT)
Received: from phaeton.artisoft.com (phaeton.Artisoft.COM [198.17.250.211])
	by freefall.freebsd.org (8.7.3/8.7.3) with SMTP id KAA12050;
	Tue, 21 May 1996 10:27:13 -0700 (PDT)
Received: (from terry@localhost) by phaeton.artisoft.com (8.6.11/8.6.9)
	id KAA01411; Tue, 21 May 1996 10:22:17 -0700
From: Terry Lambert
Message-Id: <199605211722.KAA01411@phaeton.artisoft.com>
Subject: Re: Congrats on CURRENT 5/1 SNAP...
To: davem@caip.rutgers.edu (David S. Miller)
Date: Tue, 21 May 1996 10:22:17 -0700 (MST)
Cc: terry@lambert.org, jehamby@lightside.com, jkh@time.cdrom.com,
	current@freebsd.org, hackers@freebsd.org
In-Reply-To: <199605210823.EAA07997@huahaga.rutgers.edu> from "David S. Miller" at May 21, 96 04:23:57 am
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-hackers@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

> > The SunOS LWP's are pretty easy.
>
> Actually SunOS does do LWP scheduling where it checks for ASTs etc.,
> although I don't know how relevant that is to what's being discussed.

Yes.  It uses aioread/aiowrite/aiowait/aiocancel; these are closer to
an event flag cluster than to ASTs.  (I've appended a usage sketch,
from memory, below my .sig.)

> Furthermore, the way Solaris does threads in the kernel has been
> proven to be a lose (pre-emption, a billion mutexes in the kernel,
> another thousand reader/writer locks), so expect the industry to move
> in "another" direction.  Computer science has proven that current SMP
> technology (read as: what SVR4.2MP based kernels do right now) cannot
> scale past 32 CPUs without an exponential loss in performance.

This is an artifact of their VM implementation, and the number is
generally acknowledged to be 8 processors.

It's possible to get a modified NUMA for an SMP environment using
per-processor page allocation pools.  You're free to put SLAB
allocators on top of those pages.  This means that the allocation
mutex need only be held when the per-processor page pool is refilled
from, or released back to, the general page pool (see the second
sketch below).

Using a hierarchical lock manager and computing the transitive closure
over the lock hierarchy (treating it as a directed graph), coupled
with intention mode locking, should give a significant decrease in
bus overhead (see the third sketch below).  I *don't* think you'd
want a non-symmetric implementation.

It should be noted that multithreading UFS in SVR4 (UnixWare) resulted
in a 160% performance improvement -- even after the performance loss
from using mutexes for locking was subtracted out.

> Clustering is the answer and can scale to more CPUs than you can
> count in an unsigned char.  ;-)

So can the scheme described above.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
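
[Sketch 1: SunOS async I/O usage]

This is from memory of the SunOS 4.x manual pages, so treat the
signatures as approximate rather than authoritative; the point is the
completion model, where you wait on the set of outstanding requests
rather than on any single one:

	#include <sys/types.h>
	#include <sys/asynch.h>	/* aioread/aiowait, aio_result_t */
	#include <fcntl.h>
	#include <unistd.h>
	#include <stdio.h>

	int
	main(void)
	{
		char		buf[8192];
		aio_result_t	res;
		aio_result_t	*donep;
		int		fd;

		if ((fd = open("/etc/motd", O_RDONLY)) < 0)
			return (1);

		/* Queue the read; this returns immediately. */
		if (aioread(fd, buf, sizeof(buf), 0L, SEEK_SET, &res) < 0)
			return (1);

		/* ... do other useful work while the I/O proceeds ... */

		/*
		 * Block until *some* outstanding request completes (a
		 * NULL timeout means wait forever).  This is what makes
		 * it feel like an event flag cluster: the wait is on
		 * the cluster of requests, not on one of them.
		 */
		donep = aiowait((struct timeval *)0);
		if (donep == &res && res.aio_return >= 0)
			printf("read %d bytes\n", res.aio_return);

		return (0);
	}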
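
[Sketch 2: per-processor page pools]

All the names here are mine, made up for illustration -- this isn't
code out of any real kernel -- but it shows where the locking goes:
the global free-page mutex is touched only on a batch refill, never
in the common-case allocation path.  (Protection against preemption
and interrupts on the local CPU is omitted for brevity.)

	#define POOL_BATCH	32	/* pages moved per refill */

	struct page {
		struct page	*pg_next;
		/* ... */
	};

	struct cpu_pool {
		struct page	*pp_free;	/* private free list */
		int		pp_count;
	};

	/* Hypothetical primitives, standing in for kernel services. */
	extern struct cpu_pool	cpu_pools[];	/* one per processor */
	extern struct mutex	global_pool_lock;
	extern int		curcpu(void);
	extern void		mutex_enter(struct mutex *);
	extern void		mutex_exit(struct mutex *);
	extern struct page	*global_grab_batch(int); /* lock held */

	struct page *
	page_alloc(void)
	{
		struct cpu_pool	*pp = &cpu_pools[curcpu()];
		struct page	*pg;

		if (pp->pp_count == 0) {
			/*
			 * The only point of contention with the other
			 * processors: pull a batch off the general
			 * pool under the global mutex.
			 */
			mutex_enter(&global_pool_lock);
			pp->pp_free = global_grab_batch(POOL_BATCH);
			mutex_exit(&global_pool_lock);
			pp->pp_count = POOL_BATCH;
		}

		/* Common case: no bus traffic for locks at all. */
		pg = pp->pp_free;
		pp->pp_free = pg->pg_next;
		pp->pp_count--;
		return (pg);
	}

The SLAB allocators sit on top of this: they carve objects out of
pages obtained from page_alloc(), so ordinary object allocation never
sees the global lock either.  The release path is symmetric -- drain
a batch back to the general pool when the private list grows too big.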
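
[Sketch 3: intention mode lock compatibility]

This is the standard multigranularity locking scheme (IS/IX/S/SIX/X,
after Gray et al.), not anybody's shipping lock manager.  A node in
the hierarchy is tagged with an "intention" mode before any of its
descendants is locked, so most conflicts are detected with a single
table lookup near the root instead of a walk over the whole graph:

	/*
	 * IS and IX announce an intention to take S or X locks
	 * further down the hierarchy; SIX is S plus IX.
	 */
	enum lkmode { LK_IS, LK_IX, LK_S, LK_SIX, LK_X };

	/*
	 * May a lock in mode `want' coexist with one already held
	 * in mode `held' on the same node?
	 */
	static const int lk_compat[5][5] = {
		/* want:	 IS  IX   S  SIX   X */
		/* held IS  */	{  1,  1,  1,  1,  0 },
		/* held IX  */	{  1,  1,  0,  0,  0 },
		/* held S   */	{  1,  0,  1,  0,  0 },
		/* held SIX */	{  1,  0,  0,  0,  0 },
		/* held X   */	{  0,  0,  0,  0,  0 },
	};

	int
	lk_compatible(enum lkmode held, enum lkmode want)
	{
		return (lk_compat[held][want]);
	}

The rule is: to lock a node in S or IS, first hold IS or stronger on
all of its ancestors; to lock it in X, IX, or SIX, hold IX or SIX on
them.  The ancestor set is exactly the transitive closure I mentioned
above, which is why you want it precomputed over the lock graph.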