Date: Thu, 6 May 2004 14:11:27 -0400
From: John Baldwin <jhb@FreeBSD.org>
To: freebsd-current@FreeBSD.org
Cc: 'Andrew Gallatin' <gallatin@cs.duke.edu>
Subject: Re: 4.7 vs 5.2.1 SMP/UP bridging performance
Message-ID: <200405061411.27216.jhb@FreeBSD.org>
In-Reply-To: <20040506184749.R19447@gamplex.bde.org>
References: <FE045D4D9F7AED4CBFF1B3B813C85337021AB38C@mail.sandvine.com> <20040506184749.R19447@gamplex.bde.org>
On Thursday 06 May 2004 06:18 am, Bruce Evans wrote:
> On Wed, 5 May 2004, Gerrit Nagelhout wrote:
> > Andrew Gallatin wrote:
> > > If it's really safe to remove the xchg* from non-SMP atomic_store_rel*,
> > > then I think you should do it.  Of course, that still leaves mutexes
> > > as very expensive on SMP (253 cycles on the 2.53GHz from above).
>
> See my other reply [1 memory barrier but not 2 seems to be needed for
> each lock/unlock pair in the !SMP case, and the xchgl accidentally (?)
> provides it; perhaps [lms]fence would give a faster memory barrier].
>
> More ideas on this:
> - compilers should probably now generate memory barrier instructions for
>   volatile variables (so volatile variables would be even slower :-).  I
>   haven't seen gcc on i386's do this.
> - jhb once tried changing mtx_lock_spin(mtx)/mtx_unlock_spin(mtx) to
>   critical_enter()/critical_exit().  This didn't work because it broke
>   mtx_assert().  It might also not work because it removes the memory
>   barrier.  critical_enter() only has the very weak memory barrier in
>   disable_intr() on i386's.

That was only for the UP case, in which case you don't need the membars.
A single CPU always consistently sees what it has written.  The only case
when it doesn't is for memory that can be written to by device DMA, and
that doesn't apply to kernel data structures, especially not to the ones
used for scheduling, etc.  I actually have (untested) patches in the smpng
branch to remove the one use of mtx_owned() on sched_lock (the TSS munging
code); mtx_assert() is not as big of a deal, since that one can work fine
by checking td_critnest.

The problem with the [lms]fence instructions is that sfence is only on
PIII+, and lfence is only on PIV+.  I don't recall when mfence first
appeared... perhaps PII?  If the lock prefix is really what is expensive,
then perhaps we could make atomic_cmpset() et al. actual functions (ugh)
rather than inlines, and have them branch to use the appropriate fence on
PIV rather than the default locked instruction.  The branches would suck,
but they might be faster than the lock.  Of course, this would greatly
pessimize non-PIV.

--
John Baldwin <jhb@FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org
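For concreteness, here is roughly what the two variants under discussion
look like.  This is a sketch only, not the actual code in
sys/i386/include/atomic.h, and the _smp/_up suffixes are invented for
the example:

    #include <sys/types.h>

    /*
     * SMP variant: xchgl with a memory operand is implicitly locked
     * on i386, so it is both the store and a full memory barrier.
     */
    static __inline void
    atomic_store_rel_int_smp(volatile u_int *p, u_int v)
    {
            __asm __volatile("xchgl %1,%0"
                : "+m" (*p), "+r" (v)
                :
                : "memory");
    }

    /*
     * UP variant: a compiler-only barrier followed by a plain store.
     * A single CPU always sees its own writes in program order, so no
     * barrier instruction is needed; the "memory" clobber just keeps
     * gcc from reordering other accesses across the store.
     */
    static __inline void
    atomic_store_rel_int_up(volatile u_int *p, u_int v)
    {
            __asm __volatile("" : : : "memory");
            *p = v;
    }

The implicit lock on xchgl is where both the barrier semantics and the
cost come from, which is why dropping it in the UP case is attractive.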
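A sketch of the function-with-a-branch idea as well, shown here for
atomic_store_rel_int() rather than atomic_cmpset() because the store is
the simpler case.  The cpu_has_mfence flag is invented for the example
(it would really be derived from the CPUID feature bits at boot), and
whether a fence plus a plain store is a strong enough barrier for the
mutex unlock path is exactly the open question, so treat this as a
strawman rather than a patch:

    #include <sys/types.h>

    /* Invented for the example; would be set once from CPUID at boot. */
    static int cpu_has_mfence;

    void
    atomic_store_rel_int(volatile u_int *p, u_int v)
    {
            if (cpu_has_mfence) {
                    /*
                     * Fence first so that prior accesses are ordered
                     * before the store, then do a plain, cheap store.
                     */
                    __asm __volatile("mfence" : : : "memory");
                    *p = v;
            } else {
                    /* Default: the implicitly locked xchgl, as today. */
                    __asm __volatile("xchgl %1,%0"
                        : "+m" (*p), "+r" (v)
                        :
                        : "memory");
            }
    }

Since the flag never changes after boot, the branch should predict
perfectly; the real cost is turning the inline into a function call,
which is the "ugh" above.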