From owner-freebsd-smp Mon Sep  9 03:54:50 1996
Return-Path: owner-smp
Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id DAA14851 for smp-outgoing; Mon, 9 Sep 1996 03:54:50 -0700 (PDT)
Received: from spinner.DIALix.COM (spinner.DIALix.COM [192.203.228.67]) by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id DAA14846 for ; Mon, 9 Sep 1996 03:54:44 -0700 (PDT)
Received: from spinner.DIALix.COM (localhost.DIALix.oz.au [127.0.0.1]) by spinner.DIALix.COM (8.7.5/8.7.3) with ESMTP id SAA08111; Mon, 9 Sep 1996 18:43:58 +0800 (WST)
Message-Id: <199609091043.SAA08111@spinner.DIALix.COM>
X-Mailer: exmh version 1.6.7 5/3/96
To: rv@groa.uct.ac.za (Russell Vincent)
cc: freebsd-smp@freebsd.org
Subject: Re: Intel XXpress - some SMP benchmarks
In-reply-to: Your message of "Sat, 09 Sep 1996 11:25:31 +0200."
Date: Mon, 09 Sep 1996 18:43:58 +0800
From: Peter Wemm
Sender: owner-smp@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

Russell Vincent wrote:
> 'lmbench 1.0' results for:

Ahem.. Enough said. :-)

But regardless of the accuracy issue, it certainly gives an indication of
the various bottlenecks.

> o Option (3), although not that good in the benchmarks, certainly
>   appears faster in interactive use. That could just be my imagination,
>   though. :-)

Several things to consider:

- the second cpu is never pre-empted while running. This is bad
(obviously :-) since a process that does a while(1); will run on that cpu
forever unless it gets killed or paged. And on that note, we don't make
any allowance for the page tables being changed while one cpu is in user
mode. (we flush during the context switch, but that doesn't help if a
page is stolen). I've been trying to decipher some of the more obscure
parts of the apic docs, and it appears that we can sort-of simulate a
round-robin approach on certain interrupts. It won't be terribly
reliable, but it's better than nothing, I think. (I have in mind setting
all the cpu "priorities" the same, and letting the apics use their
internal tie-breaking weighting. I've not read enough on it yet, but I
think it's possible... see the PPS below for roughly what I mean.)

- the smp_idleloop is currently killing the performance when one process
is running, because the idleloop is constantly bouncing back and forth
between the two idle procs. ie: _whichidqs is always true, so it's
constantly locking and unlocking, causing extreme congestion on that
lock. There has got to be a better way to do the locking (I have ideas;
see the PS below). When one process leaves kernel mode, it's got a fight
on its hands to get back in: it has to get the MESI cache line into a
favourable state before it can even try the lock. I'm surprised this
hasn't turned up before, now that I think about it. I would expect the
system would not do too well under heavy paging load... :-(

- several major subsystems run a fair bit of code without spl protection
(I'm thinking of VFS and VM). If we could ever figure out how to clean
the trap/exception/interrupt handling up enough to cleanly enter and exit
a "locked" state, we could probably do wonders, like having some parts of
the kernel reentrant on both cpus. Unfortunately, the trap code is
extremely optimised for the single-processor case (and I do mean
extreme.. :-), and is quite difficult to follow. We had to introduce
reference counting on the kernel mutex lock some time ago simply because
parts of the kernel are reentered via the trap code from within the
kernel. A rethink needs to happen here to figure out how we can cut down
the locking overheads without penalising the uniprocessor case much.
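The sort of reference counting I mean looks roughly like this (a sketch
off the top of my head, not the actual code, and the names are made up):

    /*
     * Reference-counted ("recursive") kernel lock.  The owning cpu id
     * lives in the top 8 bits, the nesting count in the low 24 bits.
     */
    #define MPLOCK_FREE 0xffffffff

    static volatile unsigned int mp_lock = MPLOCK_FREE;

    /* compare-and-exchange via lock cmpxchg; returns 1 on success */
    static inline int
    atomic_cmpset(volatile unsigned int *p, unsigned int old, unsigned int new)
    {
            unsigned char ok;

            __asm__ __volatile__(
                    "lock; cmpxchgl %3,%1; sete %0"
                    : "=q" (ok), "+m" (*p), "+a" (old)
                    : "r" (new)
                    : "cc", "memory");
            return (ok);
    }

    void
    get_mplock(unsigned int cpuid)
    {
            unsigned int v;

            for (;;) {
                    v = mp_lock;
                    if ((v >> 24) == cpuid) {
                            /* we already own it; only the owner writes
                               while it's held, so a plain store is safe */
                            mp_lock = v + 1;
                            return;
                    }
                    /* try to take it: free -> owned by us, count 1 */
                    if (v == MPLOCK_FREE &&
                        atomic_cmpset(&mp_lock, MPLOCK_FREE,
                            (cpuid << 24) | 1))
                            return;
                    /* spin (see the PS for how to spin politely) */
            }
    }

    void
    rel_mplock(void)
    {
            if ((mp_lock & 0x00ffffff) == 1)
                    mp_lock = MPLOCK_FREE;  /* last reference: release */
            else
                    mp_lock--;              /* just pop one level */
    }

This way the trap code can re-enter the kernel on the same cpu without
deadlocking against itself, and re-entry only costs an increment and a
decrement rather than another bus-locked operation.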
That may mean having a separate lock for the trap layer and the kernel,
where only one cpu can be within the trap layer (with a simple,
non-stacking lock), and the "kernel proper" lock is reference counted.
The "kernel proper" lock could probably then have the vfs and perhaps vm
split off into separate locks or locking strategies. (and if somebody
starts spouting jargon from his graph-theory book, that I for one don't
understand a word of, I'll scream. :-)

- "less debug code".. Have you looked very closely at the implications
of your chipset bios settings? Is it possible that some of the speedups
are deferring cpu cache writebacks too long, so that one cpu reads stale
data from RAM while the other cpu's write is still sitting in a "write
buffer" somewhere? (ie: the cache thinks the data has been written back,
but it's not in RAM yet, so the MESI protocol is defeated.) I have no
idea if this is possible or not.. just a wild guess. If "lock cmpxchg"
is truly atomic, then the problem you see should not be happening... I
presume you have tried the motherboard on "maximum pessimistic settings"?

Anyway, I've got a deadline in a few hours, and I've already spent way
too long on this.. :-]

Cheers,
-Peter
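PS: on the idle loop lock bouncing: the idea I have in mind is the old
"test-and-test-and-set" trick: spin on a plain read until the lock looks
free, and only then attempt the atomic (bus-locked) operation. The
read-only spin leaves the cache line in the shared MESI state, so the
waiting cpu isn't constantly yanking it away from the owner. A rough,
untested sketch (again, made-up names):

    /* e.g. the lock guarding _whichidqs */
    static volatile unsigned int idle_lock = 0;

    /* atomic swap; xchg with a memory operand is implicitly locked */
    static inline unsigned int
    atomic_swap(volatile unsigned int *p, unsigned int v)
    {
            __asm__ __volatile__("xchgl %0,%1"
                    : "+r" (v), "+m" (*p)
                    :
                    : "memory");
            return (v);
    }

    void
    spin_lock(volatile unsigned int *lk)
    {
            for (;;) {
                    /* read-only spin: no bus locking, line stays Shared */
                    while (*lk != 0)
                            ;
                    /* looks free; now try the expensive atomic grab */
                    if (atomic_swap(lk, 1) == 0)
                            return;
                    /* lost the race; back to the polite spin */
            }
    }

    void
    spin_unlock(volatile unsigned int *lk)
    {
            *lk = 0;        /* aligned 32-bit store; atomic on its own */
    }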
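PPS: on the apic round-robin hand-waving above: from what I've read so
far (and I may well have it wrong), the local apic is memory-mapped at
0xfee00000 and the task priority register lives at offset 0x80. If the
io apic redirection entries are programmed for "lowest priority"
delivery, and every cpu sets the same task priority, then there's no
unique winner and the apics' internal arbitration has to break the tie
itself. Something like this on each cpu, perhaps:

    #define LAPIC_BASE      0xfee00000UL    /* default local apic base */
    #define LAPIC_TPR       0x80            /* task priority register */

    static void
    lapic_set_tpr(unsigned int pri)
    {
            volatile unsigned int *tpr =
                (volatile unsigned int *)(LAPIC_BASE + LAPIC_TPR);

            *tpr = pri & 0xff;      /* same value on every cpu */
    }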