From owner-cvs-all@FreeBSD.ORG Tue Nov 9 18:58:03 2004
From: Peter Wemm
To: Stephan Uphoff
Cc: src-committers@freebsd.org, John Baldwin, Alan Cox, cvs-src@freebsd.org, Mike Silbersack, cvs-all@freebsd.org, Robert Watson, Julian Elischer
Date: Tue, 9 Nov 2004 10:57:54 -0800
Subject: Re: cvs commit: src/sys/i386/i386 pmap.c
Message-Id: <200411091057.54867.peter@wemm.org>
References: <4191062A.6090009@elischer.org> <1100024464.29384.30.camel@palm.tree.com>
In-Reply-To: <1100024464.29384.30.camel@palm.tree.com>
List-Id: CVS commit messages for the entire tree

On Tuesday 09 November 2004 10:21 am, Stephan Uphoff wrote:
> On Tue, 2004-11-09 at 13:02, Julian Elischer wrote:
> > Robert Watson wrote:
> > >This change made a large difference, and eliminates the
> > > unexplained costs.  Here's a revised table as compared to the
> > > above:
> > >
> > >       sleep mutex   crit section  spin mutex    new spin mutex
> > >       UP     SMP    UP     SMP    UP     SMP    UP     SMP
> > >PIII   21     81     83     81     112    141    95     141
> > >P4     39     260    120    119    274    342    132    231
> > >
> > >So it basically cut 140 cycles off the P4 UP spin lock, 15 off the
> > > PIII UP spin lock, and 110 cycles off the P4 SMP spin lock.  The
> > > PIII SMP spin lock looks the same.  Keep in mind that all of
> > > these measurements have a standard deviation of between 0 and 3
> > > cycles, most in the 1 range.  Also keep in mind that these are
> > > entirely uncontended measurements.
> > >
> > >Assuming that these changes are correct, and pass whatever tests
> > > people have in mind, this would be a very strong merge candidate
> > > for performance reasons.  The difference is visible in packet
> > > send tests from user space as a percentage or two improvement on
> > > UP on my P4, although it's a little hard to tell due to the noise.
> >
> > Can you explain why a spin mutex is more expensive than a sleep
> > mutex (I assume this is uncontested)?
>
> cli() and sti() used for the critical section are expensive.

... on INTEL cpus!

Don't make the mistake of assuming that all x86 cpus are as slow as
Intel's P4 family on this stuff.  Other cpus don't have the same massive
microcode penalty.  My recollection is that athlon (and athlon64 cpus in
32 bit mode) take about 8-12 clocks to do a cli or sti, compared to 300+
for a P4 cpu.  And things like 50-90 clocks for an invlpg vs 1200-1600
clocks for a P4.
Please don't accidentally penalize those of us with cpus that were
designed for good all-round performance.  The P4 family was designed for
games and 3d graphics, not all-round performance.

(This isn't aimed at anybody in particular.  I just wanted to remind
people that the P4 code is a particularly pathological case (and the
writing is on the wall for that core).  Other cpus, including Intel's
newer non-P4 cores, don't have the same pathological problems.)

-- 
Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com
"All of this is for nothing if we don't go to the stars" - JMS/B5
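
For readers of the archive: the savings quoted from Robert Watson's table ("140 cycles off the P4 UP spin lock", and so on) can be re-derived from the numbers themselves. The following is only a minimal sketch of that arithmetic; the `cycles` dictionary layout and the `spin_savings` helper are hypothetical names introduced for illustration and are not part of the original thread or of any FreeBSD code.

```python
# Cycle counts copied from the quoted table (uncontended measurements).
# Layout (an assumption made for this sketch): CPU -> lock type -> (UP, SMP).
cycles = {
    "PIII": {"sleep mutex": (21, 81), "crit section": (83, 81),
             "spin mutex": (112, 141), "new spin mutex": (95, 141)},
    "P4":   {"sleep mutex": (39, 260), "crit section": (120, 119),
             "spin mutex": (274, 342), "new spin mutex": (132, 231)},
}


def spin_savings(cpu):
    """Return (UP, SMP) cycles saved by the new spin mutex on the given CPU."""
    old_up, old_smp = cycles[cpu]["spin mutex"]
    new_up, new_smp = cycles[cpu]["new spin mutex"]
    return (old_up - new_up, old_smp - new_smp)


for cpu in cycles:
    up, smp = spin_savings(cpu)
    print(f"{cpu}: UP saves {up} cycles, SMP saves {smp} cycles")
```

The exact differences (142 and 111 cycles on the P4, 17 and 0 on the PIII) agree with the rounded figures in the quoted text.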