From: Stephan Uphoff <ups@tree.com>
To: Robert Watson
Cc: John Baldwin, Alan Cox, Mike Silbersack, src-committers@FreeBSD.org, cvs-src@FreeBSD.org, cvs-all@FreeBSD.org
Date: Tue, 09 Nov 2004 14:11:58 -0500
Subject: Re: cvs commit: src/sys/i386/i386 pmap.c

On Tue, 2004-11-09 at 08:03, Robert Watson wrote:
> On Tue, 9 Nov 2004, Robert Watson wrote:
>
> > > I've tried changing the store_rel() to just do a simple store, since
> > > writes are ordered on x86, but benchmarks on SMP showed that it
> > > actually hurt. However, it would probably be good to at least do that
> > > for UP. The current patch to do it for all kernels is:
>
> Interestingly, I've now run through some more "macro" benchmarks. I saw a
> couple of percent improvement on UP from the change, but indeed, I saw a
> slight decrease in performance for the rapid packet send benchmark on SMP.
>
> So I guess my recommendation is to get this in the tree for UP, and see if
> we can figure out why it's having the slow-down effect on SMP.

We are probably looking at cache line effects here.
My guess is that we should:

1) Make sure that important spin mutexes are alone in a cache line.
2) Take care not to dirty the cache line unnecessarily.

For 2), I think we need to change the spin mutex slightly (for SMP) so that
it never issues LOCK cmpxchgl before a simple load has found
m->mtx_lock == MTX_UNOWNED, since LOCK cmpxchgl always seems to dirty the
cache line, even when the compare fails. (A sketch of this
test-and-test-and-set approach follows below.)

I have a dual Xeon (P4) where I can run some tests. Please let me know if
there are any tests you can recommend - I don't want to reinvent the wheel
here.

Interestingly enough, the Linux spinlock implementation mentions some PPro
errata that seem to require a locked operation. I guess that means we should
take a look at the errata of all SMP-capable processors out there :-(
Intel also recommends a locked operation (or SFENCE) for future processors.
I guess this means either non-optimal code, lots of compile options, or
self-modifying code.

	Stephan
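
P.S. To make the idea concrete, here is a minimal test-and-test-and-set
sketch using C11 atomics. This is NOT the actual sys/mutex.h code; the
names spin_mtx, spin_lock and spin_unlock, and the 64-byte line size, are
made up for illustration only:

/*
 * Hypothetical test-and-test-and-set spin mutex sketch (C11 atomics).
 * The point: waiters spin on a plain load, and only issue the locked
 * read-modify-write (LOCK cmpxchg on x86) once the lock looks free,
 * so contention does not keep dirtying the lock's cache line.
 */
#include <stdatomic.h>

#define MTX_UNOWNED 0UL

struct spin_mtx {
	/* Point 1): align so the lock word sits alone in its cache line. */
	_Alignas(64) _Atomic unsigned long mtx_lock;
};

static void
spin_lock(struct spin_mtx *m, unsigned long tid)
{
	for (;;) {
		/*
		 * Point 2): spin with a read-only load; the line stays
		 * in shared state across all waiting CPUs.
		 */
		while (atomic_load_explicit(&m->mtx_lock,
		    memory_order_relaxed) != MTX_UNOWNED)
			;	/* a PAUSE hint would go here on x86 */

		/* Only now attempt the locked RMW that dirties the line. */
		unsigned long expected = MTX_UNOWNED;
		if (atomic_compare_exchange_weak_explicit(&m->mtx_lock,
		    &expected, tid,
		    memory_order_acquire, memory_order_relaxed))
			return;
	}
}

static void
spin_unlock(struct spin_mtx *m)
{
	/*
	 * Release store; whether this can be a plain store on x86
	 * (given its ordered writes) is exactly the store_rel()
	 * question discussed above.
	 */
	atomic_store_explicit(&m->mtx_lock, MTX_UNOWNED,
	    memory_order_release);
}

With this shape, contended CPUs spin on a locally cached copy of the line
and only one locked operation has to win the line per lock handoff.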