Date:      Tue, 09 Nov 2004 14:11:58 -0500
From:      Stephan Uphoff <ups@tree.com>
To:        Robert Watson <rwatson@FreeBSD.org>
Cc:        cvs-all@FreeBSD.org
Subject:   Re: cvs commit: src/sys/i386/i386 pmap.c
Message-ID:  <1100027518.29384.87.camel@palm.tree.com>
In-Reply-To: <Pine.NEB.3.96L.1041109130229.73102V-100000@fledge.watson.org>
References:  <Pine.NEB.3.96L.1041109130229.73102V-100000@fledge.watson.org>

On Tue, 2004-11-09 at 08:03, Robert Watson wrote:
> On Tue, 9 Nov 2004, Robert Watson wrote:
> 
> > > I've tried changing the store_rel() to just do a simple store since writes are 
> > > ordered on x86, but benchmarks on SMP showed that it actually hurt.  However, 
> > > it would probably be good to at least do that for UP.  The current patch to 
> > > do it for all kernels is:
> 
> Interestingly, I've now run through some more "macro" benchmarks.  I saw a
> couple of percent improvement on UP from the change, but indeed, I saw a
> slight decrease in performance for the rapid packet send benchmark on SMP. 
> 
> So I guess my recommendation is to get this in the tree for UP, and see if
> we can figure out why it's having the slow-down effect on SMP.

We are probably talking about cache line effects here.
My guess is that we should:

1) Make sure that important spin mutexes are alone in a cache line.
2) Take care not to dirty the cache line unnecessarily.
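
A minimal sketch of point 1, assuming a 64-byte cache line. The
names padded_spin_lock and SKETCH_CACHE_LINE are made up for this
illustration and are not the kernel's mutex layout; the real line
size depends on the CPU.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte lines. Adjust for the target CPU. */
#define SKETCH_CACHE_LINE 64

/*
 * Hypothetical lock layout: the lock word starts on a cache line
 * boundary and the padding fills the rest of the line, so no
 * unrelated data can share (and dirty) the line.
 */
struct padded_spin_lock {
	alignas(SKETCH_CACHE_LINE) volatile uintptr_t lock_word;
	char pad[SKETCH_CACHE_LINE - sizeof(uintptr_t)];
};

/* Compile-time checks that the lock owns exactly one line. */
_Static_assert(sizeof(struct padded_spin_lock) == SKETCH_CACHE_LINE,
    "lock must fill one cache line");
_Static_assert(alignof(struct padded_spin_lock) == SKETCH_CACHE_LINE,
    "lock must start on a cache line boundary");
```

The cost is memory: every lock laid out this way takes a full line,
which is why it only makes sense for the important spin mutexes.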

I think for 2 we need to change the spin mutex slightly (for SMP) so
that it never issues LOCK cmpxchgl until a simple load has found
m->mtx_lock == MTX_UNOWNED, since LOCK cmpxchgl always seems to dirty
the cache line.
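
A rough sketch of that load-before-cmpxchg idea in portable C11
atomics rather than the kernel's own primitives. The sketch_* names
and SKETCH_UNOWNED are invented stand-ins for the mutex code and
MTX_UNOWNED, not the actual implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for MTX_UNOWNED. */
#define SKETCH_UNOWNED ((uintptr_t)0)

struct sketch_spin {
	_Atomic uintptr_t lock_word;	/* stand-in for m->mtx_lock */
};

static inline void
sketch_spin_init(struct sketch_spin *m)
{
	atomic_init(&m->lock_word, SKETCH_UNOWNED);
}

/* One acquisition attempt: returns true if the lock was taken. */
static inline bool
sketch_spin_try(struct sketch_spin *m, uintptr_t owner)
{
	uintptr_t expected = SKETCH_UNOWNED;

	/*
	 * Plain load first. While the lock is held this only reads
	 * the line, so waiters do not keep pulling it exclusive.
	 */
	if (atomic_load_explicit(&m->lock_word, memory_order_relaxed) !=
	    SKETCH_UNOWNED)
		return (false);

	/* Only now attempt the locked compare-and-swap. */
	return (atomic_compare_exchange_strong_explicit(&m->lock_word,
	    &expected, owner, memory_order_acquire, memory_order_relaxed));
}

/* Spin until the lock is acquired. */
static inline void
sketch_spin_acquire(struct sketch_spin *m, uintptr_t owner)
{
	while (!sketch_spin_try(m, owner))
		;	/* keep re-reading until the lock looks free */
}

/* Release with a plain store that has release ordering. */
static inline void
sketch_spin_release(struct sketch_spin *m)
{
	atomic_store_explicit(&m->lock_word, SKETCH_UNOWNED,
	    memory_order_release);
}
```

Whether the release can really be a plain store on every SMP-capable
x86 part is a separate question from the acquire path shown here.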

I have a dual Xeon (p4) where I can run some tests. Please let me know
if there are any tests that you can recommend - I don't want to reinvent
the wheel here.

Interestingly enough, the Linux spin lock implementation mentions
some PPro errata that seem to require a locked operation.
I guess that means we should take a look at the errata of all
SMP-capable processors out there :-(
Intel also recommends a locked operation (or SFENCE) for future
processors.
I guess this means either non-optimal code, lots of compile options, or
self-modifying code.

	Stephan





