Date:      Wed, 5 May 2004 17:23:30 -0400 (EDT)
From:      Andrew Gallatin <gallatin@cs.duke.edu>
To:        Bruce Evans <bde@zeta.org.au>
Cc:        Gerrit Nagelhout <gnagelhout@sandvine.com>
Subject:   RE: 4.7 vs 5.2.1 SMP/UP bridging performance
Message-ID:  <16537.23378.375946.857908@grasshopper.cs.duke.edu>
In-Reply-To: <20040505222636.H15444@gamplex.bde.org>
References:  <FE045D4D9F7AED4CBFF1B3B813C85337021AB377@mail.sandvine.com> <20040505222636.H15444@gamplex.bde.org>



Bruce Evans writes:

 > 
 > Athlon XP2600 UP system:  !SMP case: 22 cycles   SMP case: 37 cycles
 > Celeron 366 SMP system:              35                    48
 > 
 > The extra cycles for the SMP case are just the extra cost of one lock
 > instruction.  Note that SMP should cost twice as much extra, but the
 > non-SMP atomic_store_rel_int(&slock, 0) is pessimized by using xchgl
 > which always locks the bus.  After fixing this:
 > 
 > Athlon XP2600 UP system:  !SMP case:  6 cycles   SMP case: 37 cycles
 > Celeron 366 SMP system:              10                    48
 > 
 > Mutexes take longer than simple locks, but not much longer unless the
 > lock is contested.  In particular, they don't lock the bus any more
 > and the extra cycles for locking dominate (even in the !SMP case due
 > to the pessimization).
 > 
 > So there seems to be something wrong with your benchmark.  Locking the
 > bus for the SMP case always costs about 20+ cycles, but this hasn't
 > changed since RELENG_4 and mutexes can't be made much faster in the
 > uncontested case since their overhead is dominated by the bus lock
 > time.
 > 

Actually, I think his tests are accurate and bus-locked instructions
take an eternity on the P4.  See
http://www.uwsg.iu.edu/hypermail/linux/kernel/0109.3/0687.html

For example, with your test above, I see 212 cycles for the UP case on
a 2.53GHz P4.  Replacing the atomic_store_rel_int(&slock, 0) with a
simple slock = 0; reduces that count to 18 cycles.
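
The comparison can be approximated from user space. The following is only a rough sketch under stated assumptions, not the kernel test quoted above: it uses C11 atomics instead of atomic_store_rel_int(), it reports wall-clock nanoseconds rather than cycles, and the names slock, now_ns, time_release_xchg and time_release_store are made up for illustration. On x86, compilers typically emit xchg for the exchange and a plain mov for the release store.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the slock used in the quoted test. */
static atomic_int slock;

/* Wall-clock time in nanoseconds (C11 timespec_get; not a cycle counter). */
static uint64_t now_ns(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Release the lock with an atomic exchange. On x86 this is typically
 * compiled to xchg, which is implicitly bus-locked.
 */
static uint64_t time_release_xchg(unsigned iters)
{
    assert(iters > 0);
    uint64_t start = now_ns();
    for (unsigned i = 0; i < iters; i++) {
        atomic_store_explicit(&slock, 1, memory_order_relaxed);
        (void)atomic_exchange_explicit(&slock, 0, memory_order_release);
    }
    return now_ns() - start;
}

/*
 * Release the lock with a plain release store. On x86 this is typically
 * an ordinary mov with no lock prefix.
 */
static uint64_t time_release_store(unsigned iters)
{
    assert(iters > 0);
    uint64_t start = now_ns();
    for (unsigned i = 0; i < iters; i++) {
        atomic_store_explicit(&slock, 1, memory_order_relaxed);
        atomic_store_explicit(&slock, 0, memory_order_release);
    }
    return now_ns() - start;
}
```

Calling both functions with the same iteration count and dividing each result by that count gives a per-release cost to compare. The absolute numbers depend on the CPU, compiler and optimization level, so they should not be expected to match the cycle counts quoted in this thread.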

If it's really safe to remove the xchg* from non-SMP atomic_store_rel*,
then I think you should do it.  Of course, that still leaves mutexes
as very expensive on SMP (253 cycles on the 2.53GHz from above).

Drew


