From owner-freebsd-smp@FreeBSD.ORG Mon Apr 17 05:16:12 2006 Return-Path: X-Original-To: smp@freebsd.org Delivered-To: freebsd-smp@FreeBSD.ORG Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id BD5F716A400; Mon, 17 Apr 2006 05:16:12 +0000 (UTC) (envelope-from matusita@violin.ocn.ne.jp) Received: from mv-osn-hcb003.ocn.ad.jp (mv-osn-hcb003.ocn.ad.jp [60.37.51.8]) by mx1.FreeBSD.org (Postfix) with ESMTP id 8584F43D46; Mon, 17 Apr 2006 05:16:11 +0000 (GMT) (envelope-from matusita@violin.ocn.ne.jp) Received: from vcviolin.ocn.ne.jp (mv-osn-hcb003 [60.37.51.8]) by mv-osn-hcb003.ocn.ad.jp (Postfix) with ESMTP id B1B42341AD; Mon, 17 Apr 2006 14:16:10 +0900 (JST) Received: from violin.ocn.ne.jp (p1178-ipad68osakakita.osaka.ocn.ne.jp [221.185.124.178]) by vcviolin.ocn.ne.jp (Postfix) with ESMTP; Mon, 17 Apr 2006 14:16:10 +0900 (JST) Received: from syvalion.matatabi.or.jp (3k0jfth5rci9069j@localhost [127.0.0.1]) by syvalion.matatabi.or.jp (8.13.4/8.13.4) with ESMTP id k3H5G9hO026061 for ; Mon, 17 Apr 2006 14:16:09 +0900 (JST) (envelope-from owner-freebsd-current@freebsd.org) Received: (from cyrus@localhost) by syvalion.matatabi.or.jp (8.13.4/8.13.4/Submit) id k3H5G9Bj026060 for makoto.matusita.home@gmail.com; Mon, 17 Apr 2006 14:16:09 +0900 (JST) (envelope-from owner-freebsd-current@freebsd.org) X-Authentication-Warning: galtvalion.matatabi.or.jp: cyrus set sender to owner-freebsd-current@freebsd.org using -f Received: from galtvalion.matatabi.or.jp ([unix socket]) by galtvalion.matatabi.or.jp (Cyrus v2.3.1) with LMTPA; Mon, 17 Apr 2006 14:16:09 +0900 X-Sieve: CMU Sieve 2.3 Received: from localhost [::1] by galtvalion.matatabi.or.jp with POP3 (fetchmail-6.3.2) for (single-drop); Mon, 17 Apr 2006 14:16:09 +0900 (JST) Received: from castle.jp.FreeBSD.org (castle.jp.FreeBSD.org [210.226.20.15]) by ns.matatabi.or.jp (8.12.11.20060308/3.7W/MATATABI-1.0v9-NS1.2) with ESMTP id k3H5FLsg068105 for ; Mon, 17 Apr 2006 14:15:22 +0900 (JST) (envelope-from owner-freebsd-current@freebsd.org) Received: from mx2.freebsd.org (mx2.freebsd.org [216.136.204.119]) by castle.jp.FreeBSD.org (8.11.6p2+3.4W/8.11.3) with ESMTP/inet id k3H5FKQ21819 for ; Mon, 17 Apr 2006 14:15:20 +0900 (JST) (envelope-from owner-freebsd-current@freebsd.org) Received: from hub.freebsd.org (hub.freebsd.org [216.136.204.18]) by mx2.freebsd.org (Postfix) with ESMTP id 8AC5C583CC for ; Mon, 17 Apr 2006 05:14:53 +0000 (GMT) (envelope-from owner-freebsd-current@freebsd.org) Received: from hub.freebsd.org (localhost [127.0.0.1]) by hub.freebsd.org (Postfix) with ESMTP id 98F5816A445; Mon, 17 Apr 2006 05:14:52 +0000 (UTC) (envelope-from owner-freebsd-current@freebsd.org) X-Original-To: current@FreeBSD.org Delivered-To: freebsd-current@FreeBSD.ORG Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 9189416A408; Mon, 17 Apr 2006 05:14:42 +0000 (UTC) (envelope-from kris@obsecurity.org) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.FreeBSD.org (Postfix) with ESMTP id 6C57843D4C; Mon, 17 Apr 2006 05:14:41 +0000 (GMT) (envelope-from kris@obsecurity.org) Received: from obsecurity.dyndns.org (elvis.mu.org [192.203.228.196]) by elvis.mu.org (Postfix) with ESMTP id 2BB801A4E30; Sun, 16 Apr 2006 22:14:41 -0700 (PDT) Received: by obsecurity.dyndns.org (Postfix, from userid 1000) id 720795169B; Mon, 17 Apr 2006 01:14:40 -0400 (EDT) Date: Mon, 17 Apr 2006 01:14:40 -0400 From: Kris Kennaway To: current@freebsd.org, smp@freebsd.org Message-ID: <20060417051440.GB60956@xor.obsecurity.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="s/l3CgOIzMHHjg/5" Content-Disposition: inline User-Agent: Mutt/1.4.2.1i X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Sender: owner-freebsd-current@freebsd.org Errors-To: owner-freebsd-current@freebsd.org X-UIDL: [X@!!PWh"!-#p"!2/%"! X-Text-Classification: ok X-POPFile-Link: http://syvalion.matatabi.or.jp:11080/jump_to_message?view=475783 Cc: Subject: Anomalous performance increase from mutex profiling X-BeenThere: freebsd-smp@freebsd.org List-Id: FreeBSD SMP implementation group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Apr 2006 05:16:12 -0000 --s/l3CgOIzMHHjg/5 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline I have been measuring performance with the supersmack benchmark on a 12-cpu sparc64 E4500. This is in the context of a patch I made to do exponential backoff during mutex spins, which has a large positive effect (~17%) in this case. However, even with CVS sources I found something very odd: when MUTEX_PROFILING is compiled in but not enabled (which usually still makes things slower), I see a clear 38% +/- 3% performance *increase* on this benchmark. The effects of MUTEX_PROFILING are (as far as I can see): - Increases size of struct mtx - Enables MUTEX_NOINLINE - Enables unconditional atomic_add_int at kern_mutex.c:484 (this may be unintentional; I think it should be conditional on mutex_prof_enable = 1) - Adds extra reads/writes of the struct mtx in the mutex lock/unlock ops - Adds a chunk of code to _mtx_unlock_flags But: - Removing the atomic_add_int from kern_mutex.c:484 seems to make it about 5% *slower*. I didn't measure this carefully yet though. When MUTEX_PROFILING is disabled: - MUTEX_NOINLINE doesn't help performance noticeably - Unconditionally changing size of struct mtx doesn't help - Compiling -O2 instead of -O doesn't help - Gutting the bulk of the mutex profiling code and leaving only the extra struct mtx accesses did not help - Here is the readelf output of the function table in kern_mutex.o in the non-mutex profiling case: Num: Value Size Type Bind Vis Ndx Name 9: 0000000000000900 404 FUNC LOCAL DEFAULT 1 db_show_mtx 18: 0000000000000000 68 FUNC GLOBAL DEFAULT 1 _mtx_lock_flags 19: 0000000000000220 420 FUNC GLOBAL DEFAULT 1 _mtx_lock_sleep 20: 0000000000000060 64 FUNC GLOBAL DEFAULT 1 _mtx_unlock_flags 21: 0000000000000500 508 FUNC GLOBAL DEFAULT 1 _mtx_unlock_sleep 22: 00000000000000a0 112 FUNC GLOBAL DEFAULT 1 _mtx_lock_spin_flags 24: 00000000000003e0 236 FUNC GLOBAL DEFAULT 1 _mtx_lock_spin 25: 0000000000000120 92 FUNC GLOBAL DEFAULT 1 _mtx_unlock_spin_flags 27: 0000000000000180 132 FUNC GLOBAL DEFAULT 1 _mtx_trylock 30: 00000000000004e0 8 FUNC GLOBAL DEFAULT 1 do_nothing 42: 0000000000000700 36 FUNC GLOBAL DEFAULT 1 mtx_sysinit 43: 0000000000000740 140 FUNC GLOBAL DEFAULT 1 mtx_init 45: 00000000000007e0 96 FUNC GLOBAL DEFAULT 1 mtx_destroy 48: 0000000000000840 188 FUNC GLOBAL DEFAULT 1 mutex_init and in the mutex profiling case: 9: 0000000000000ec0 404 FUNC LOCAL DEFAULT 1 db_show_mtx 44: 0000000000000000 40 FUNC LOCAL DEFAULT 1 nanoseconds 46: 0000000000000040 584 FUNC LOCAL DEFAULT 1 dump_mutex_prof_stats 51: 00000000000002a0 192 FUNC LOCAL DEFAULT 1 reset_mutex_prof_stats 68: 0000000000000ce0 36 FUNC GLOBAL DEFAULT 1 mtx_sysinit 69: 0000000000000dc0 96 FUNC GLOBAL DEFAULT 1 mtx_destroy 71: 0000000000000740 92 FUNC GLOBAL DEFAULT 1 _mtx_unlock_spin_flags 75: 00000000000006c0 112 FUNC GLOBAL DEFAULT 1 _mtx_lock_spin_flags 81: 0000000000000360 132 FUNC GLOBAL DEFAULT 1 _mtx_lock_flags 82: 0000000000000840 484 FUNC GLOBAL DEFAULT 1 _mtx_lock_sleep 83: 0000000000000400 684 FUNC GLOBAL DEFAULT 1 _mtx_unlock_flags 86: 0000000000000b60 360 FUNC GLOBAL DEFAULT 1 _mtx_unlock_sleep 88: 0000000000000a40 236 FUNC GLOBAL DEFAULT 1 _mtx_lock_spin 90: 00000000000007a0 132 FUNC GLOBAL DEFAULT 1 _mtx_trylock 93: 0000000000000b40 8 FUNC GLOBAL DEFAULT 1 do_nothing 105: 0000000000000d20 160 FUNC GLOBAL DEFAULT 1 mtx_init 109: 0000000000000e20 160 FUNC GLOBAL DEFAULT 1 mutex_init Most mutex operations grow a bit with MUTEX_PROFILING (except _mtx_unlock_sleep which shrinks because of the MUTEX_NOINLINE, see above), but the biggest growth is from the extra blob of unused (in this case) code in _mtx_unlock_flags(). This function is 0x100 aligned in the mutex profiling case. * I tried forcing 0x100 alignment on kern_mutex.o in the !mutex profiling case, which did not help (also _mtx_unlock_flags() should not be taking much CPU time; the bulk of it should be in the mutex spins since there are some very heavily contended mutexes) * Kernel profiling is apparently not supported on sparc, so I can't easily work out where it is spending time. * Our best guess is that mutex profiling is doing something that reduces contention on this very heavily contended mutex (unp), but I'd like to know what is happening precisely so I can maybe make use of it. Can anyone think of what may be happening that I've missed? Kris --s/l3CgOIzMHHjg/5 Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.3 (FreeBSD) iD8DBQFEQyQ/Wry0BWjoQKURAv24AKC37QSA9t8kT53QwZAUj9oeFjwl5gCgodOY oDwnnMAMxSnVY3Nb+dYOdGE= =l38l -----END PGP SIGNATURE----- --s/l3CgOIzMHHjg/5--