Date:      Tue, 09 Nov 2004 14:11:58 -0500
From:      Stephan Uphoff <ups@tree.com>
To:        Robert Watson <rwatson@FreeBSD.org>
Cc:        cvs-all@FreeBSD.org
Subject:   Re: cvs commit: src/sys/i386/i386 pmap.c
Message-ID:  <1100027518.29384.87.camel@palm.tree.com>
In-Reply-To: <Pine.NEB.3.96L.1041109130229.73102V-100000@fledge.watson.org>


On Tue, 2004-11-09 at 08:03, Robert Watson wrote:
> On Tue, 9 Nov 2004, Robert Watson wrote:
> 
> > > I've tried changing the store_rel() to just do a simple store since writes are 
> > > ordered on x86, but benchmarks on SMP showed that it actually hurt.  However, 
> > > it would probably be good to at least do that for UP.  The current patch to 
> > > do it for all kernels is:
> 
> Interestingly, I've now run through some more "macro" benchmarks.  I saw a
> couple of percent improvement on UP from the change, but indeed, I saw a
> slight decrease in performance for the rapid packet send benchmark on SMP. 
> 
> So I guess my recommendation is to get this in the tree for UP, and see if
> we can figure out why it's having the slow-down effect on SMP.

We are probably talking cache line effects here.
My guess is that we should:

1) Make sure that important spin mutexes are alone in a cache line.
2) Take care not to dirty the cache line unnecessarily.
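For 1), a minimal sketch of what "alone in a cache line" could look like in C11. The struct name padded_spin and the 64-byte CACHE_LINE_SIZE are assumptions made for illustration, not the kernel's actual definitions, and the right line size depends on the CPU.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size in bytes; this value is hypothetical and CPU-dependent. */
#define CACHE_LINE_SIZE 64

/*
 * Hypothetical lock container: the lock word starts on a cache line
 * boundary and the rest of the line is padded out, so no unrelated
 * data can share (and dirty) the line that holds the lock.
 */
struct padded_spin {
	alignas(CACHE_LINE_SIZE) volatile uintptr_t lock_word;
	char pad[CACHE_LINE_SIZE - sizeof(uintptr_t)];
};

/* Compile-time checks that the layout is what we intended. */
static_assert(sizeof(struct padded_spin) == CACHE_LINE_SIZE,
    "lock must occupy exactly one cache line");
static_assert(alignof(struct padded_spin) == CACHE_LINE_SIZE,
    "lock must start on a cache line boundary");
```

The cost is one full line per lock, which is why this would only make sense for the important spin mutexes.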

For 2), I think we need to change the spin mutex slightly (for SMP) so
that it never issues LOCK cmpxchgl until a simple load has found
m->mtx_lock == MTX_UNOWNED, since LOCK cmpxchgl always seems to dirty
the cache line, even when the compare fails.
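A rough sketch of the load-before-cmpxchg idea, written with C11 atomics rather than the kernel's own atomic primitives. All names here (sketch_spin, SKETCH_UNOWNED, the sketch_spin_* functions) are made up for illustration and stand in for the real mutex code; on x86 the compare-exchange would typically compile to LOCK cmpxchg.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for MTX_UNOWNED. */
#define SKETCH_UNOWNED ((uintptr_t)0)

/* Hypothetical stand-in for the mutex; lock_word plays the role of m->mtx_lock. */
struct sketch_spin {
	_Atomic uintptr_t lock_word;
};

static void
sketch_spin_init(struct sketch_spin *m)
{
	atomic_init(&m->lock_word, SKETCH_UNOWNED);
}

/* One acquisition attempt; tid must be non-zero. Returns 1 on success. */
static int
sketch_spin_try(struct sketch_spin *m, uintptr_t tid)
{
	uintptr_t expected = SKETCH_UNOWNED;

	/*
	 * Plain load first: while the lock is held we only read the
	 * line, so it can stay shared in our cache instead of being
	 * pulled exclusive and dirtied by a locked operation.
	 */
	if (atomic_load_explicit(&m->lock_word, memory_order_relaxed) !=
	    SKETCH_UNOWNED)
		return (0);

	/* The lock looked free, so now pay for the locked compare-exchange. */
	return (atomic_compare_exchange_strong_explicit(&m->lock_word,
	    &expected, tid, memory_order_acquire, memory_order_relaxed));
}

static void
sketch_spin_lock(struct sketch_spin *m, uintptr_t tid)
{
	while (!sketch_spin_try(m, tid))
		;	/* spin; a real implementation would pause/back off here */
}

static void
sketch_spin_unlock(struct sketch_spin *m)
{
	atomic_store_explicit(&m->lock_word, SKETCH_UNOWNED,
	    memory_order_release);
}
```

Whether this actually helps on SMP is exactly what would need to be benchmarked.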

I have a dual Xeon (P4) where I can run some tests. Please let me know
if there are any tests that you can recommend - I don't want to reinvent
the wheel here.

Interestingly enough, the Linux spin lock implementation mentions
some PPro errata that seem to require a locked operation.
I guess that means we should take a look at the errata of all
SMP-capable processors out there :-(
Intel also recommends a locked operation (or SFENCE) for future
processors.
I guess this means either non-optimal code, lots of compile options, or
self-modifying code.

	Stephan





