Date: Mon, 17 Apr 2006 01:14:40 -0400
From: Kris Kennaway <kris@obsecurity.org>
To: current@FreeBSD.org, smp@FreeBSD.org
Subject: Anomalous performance increase from mutex profiling
Message-ID: <20060417051440.GB60956@xor.obsecurity.org>
I have been measuring performance with the supersmack benchmark on a
12-CPU sparc64 E4500.  This is in the context of a patch I made to do
exponential backoff during mutex spins (a rough sketch of the idea
appears after the readelf output below), which has a large positive
effect (~17%) in this case.

However, even with CVS sources I found something very odd: when
MUTEX_PROFILING is compiled in but not enabled (which usually still
makes things slower), I see a clear 38% +/- 3% performance *increase*
on this benchmark.

The effects of MUTEX_PROFILING are (as far as I can see):

- Increases the size of struct mtx
- Enables MUTEX_NOINLINE
- Enables an unconditional atomic_add_int at kern_mutex.c:484 (this may
  be unintentional; I think it should be conditional on
  mutex_prof_enable == 1; see the P.S. at the end of this message)
- Adds extra reads/writes of the struct mtx in the mutex lock/unlock ops
- Adds a chunk of code to _mtx_unlock_flags()

But:

- Removing the atomic_add_int from kern_mutex.c:484 seems to make it
  about 5% *slower*.  I haven't measured this carefully yet, though.

When MUTEX_PROFILING is disabled:

- MUTEX_NOINLINE doesn't help performance noticeably
- Unconditionally changing the size of struct mtx doesn't help
- Compiling with -O2 instead of -O doesn't help
- Gutting the bulk of the mutex profiling code and leaving only the
  extra struct mtx accesses did not help

- Here is the readelf output of the function table in kern_mutex.o in
  the non-mutex-profiling case:

   Num:    Value          Size Type    Bind   Vis      Ndx Name
     9: 0000000000000900   404 FUNC    LOCAL  DEFAULT    1 db_show_mtx
    18: 0000000000000000    68 FUNC    GLOBAL DEFAULT    1 _mtx_lock_flags
    19: 0000000000000220   420 FUNC    GLOBAL DEFAULT    1 _mtx_lock_sleep
    20: 0000000000000060    64 FUNC    GLOBAL DEFAULT    1 _mtx_unlock_flags
    21: 0000000000000500   508 FUNC    GLOBAL DEFAULT    1 _mtx_unlock_sleep
    22: 00000000000000a0   112 FUNC    GLOBAL DEFAULT    1 _mtx_lock_spin_flags
    24: 00000000000003e0   236 FUNC    GLOBAL DEFAULT    1 _mtx_lock_spin
    25: 0000000000000120    92 FUNC    GLOBAL DEFAULT    1 _mtx_unlock_spin_flags
    27: 0000000000000180   132 FUNC    GLOBAL DEFAULT    1 _mtx_trylock
    30: 00000000000004e0     8 FUNC    GLOBAL DEFAULT    1 do_nothing
    42: 0000000000000700    36 FUNC    GLOBAL DEFAULT    1 mtx_sysinit
    43: 0000000000000740   140 FUNC    GLOBAL DEFAULT    1 mtx_init
    45: 00000000000007e0    96 FUNC    GLOBAL DEFAULT    1 mtx_destroy
    48: 0000000000000840   188 FUNC    GLOBAL DEFAULT    1 mutex_init

  and in the mutex-profiling case:

     9: 0000000000000ec0   404 FUNC    LOCAL  DEFAULT    1 db_show_mtx
    44: 0000000000000000    40 FUNC    LOCAL  DEFAULT    1 nanoseconds
    46: 0000000000000040   584 FUNC    LOCAL  DEFAULT    1 dump_mutex_prof_stats
    51: 00000000000002a0   192 FUNC    LOCAL  DEFAULT    1 reset_mutex_prof_stats
    68: 0000000000000ce0    36 FUNC    GLOBAL DEFAULT    1 mtx_sysinit
    69: 0000000000000dc0    96 FUNC    GLOBAL DEFAULT    1 mtx_destroy
    71: 0000000000000740    92 FUNC    GLOBAL DEFAULT    1 _mtx_unlock_spin_flags
    75: 00000000000006c0   112 FUNC    GLOBAL DEFAULT    1 _mtx_lock_spin_flags
    81: 0000000000000360   132 FUNC    GLOBAL DEFAULT    1 _mtx_lock_flags
    82: 0000000000000840   484 FUNC    GLOBAL DEFAULT    1 _mtx_lock_sleep
    83: 0000000000000400   684 FUNC    GLOBAL DEFAULT    1 _mtx_unlock_flags
    86: 0000000000000b60   360 FUNC    GLOBAL DEFAULT    1 _mtx_unlock_sleep
    88: 0000000000000a40   236 FUNC    GLOBAL DEFAULT    1 _mtx_lock_spin
    90: 00000000000007a0   132 FUNC    GLOBAL DEFAULT    1 _mtx_trylock
    93: 0000000000000b40     8 FUNC    GLOBAL DEFAULT    1 do_nothing
   105: 0000000000000d20   160 FUNC    GLOBAL DEFAULT    1 mtx_init
   109: 0000000000000e20   160 FUNC    GLOBAL DEFAULT    1 mutex_init
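As an aside, here is a rough userland sketch of the backoff idea
mentioned at the top (C11 atomics; the names and constants are
illustrative only, this is not the actual kernel patch).  Between
failed lock attempts each waiter spins away from the lock word for a
doubling, capped number of iterations, so the contending CPUs stop
hammering the same cache line in lockstep:

#include <stdatomic.h>

#define BACKOFF_MIN     1
#define BACKOFF_MAX     1024    /* cap the delay so waiters stay responsive */

static void
spin_lock_backoff(atomic_flag *l)
{
        unsigned int backoff = BACKOFF_MIN;

        while (atomic_flag_test_and_set_explicit(l, memory_order_acquire)) {
                /* Spin away from the lock word before retrying... */
                for (unsigned int i = 0; i < backoff; i++)
                        __asm__ __volatile__("" ::: "memory");
                /* ...and double the delay after each failed attempt. */
                if (backoff < BACKOFF_MAX)
                        backoff <<= 1;
        }
}

static void
spin_unlock_backoff(atomic_flag *l)
{
        atomic_flag_clear_explicit(l, memory_order_release);
}

The cap matters: without it an unlucky waiter can end up backing off
for much longer than the lock hold time.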
Most mutex operations grow a bit with MUTEX_PROFILING (except
_mtx_unlock_sleep, which shrinks because of MUTEX_NOINLINE, see above),
but the biggest growth is from the extra blob of unused (in this case)
code in _mtx_unlock_flags().  This function is 0x100-aligned in the
mutex profiling case.

* I tried forcing 0x100 alignment on kern_mutex.o in the
  non-MUTEX_PROFILING case, which did not help (also, _mtx_unlock_flags()
  should not be taking much CPU time; the bulk of it should be in the
  mutex spins, since there are some very heavily contended mutexes).

* Kernel profiling is apparently not supported on sparc, so I can't
  easily work out where the time is being spent.

* Our best guess is that mutex profiling is doing something that reduces
  contention on this very heavily contended mutex (unp), but I'd like to
  know precisely what is happening so I can maybe make use of it.

Can anyone think of what may be happening that I've missed?

Kris
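P.S. To be concrete about the kern_mutex.c:484 item above: what I have
in mind is just guarding the existing counter bump with the enable
flag, roughly as below (sketch only; the counter name here is a
placeholder, not the actual variable):

        if (mutex_prof_enable)
                atomic_add_int(&mtx_prof_counter, 1);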