Date: Wed, 23 May 2007 19:53:50 -0400
From: Kris Kennaway <kris@obsecurity.org>
To: Jeff Roberson <jroberson@chesapeake.net>
Cc: arch@freebsd.org
Subject: Re: sched_lock && thread_lock()
Message-ID: <20070523235349.GA66762@xor.obsecurity.org>
In-Reply-To: <20070523155236.U9443@10.0.0.1>
References: <20070520155103.K632@10.0.0.1> <20070523155236.U9443@10.0.0.1>
On Wed, May 23, 2007 at 03:56:35PM -0700, Jeff Roberson wrote:
> Resuming the original intent of this thread:
>
> http://www.chesapeake.net/~jroberson/threadlock.diff
>
> I have updated this patch to the most recent current. I have included a
> scheduler called sched_smp.c that is a copy of ULE using per-cpu
> scheduler spinlocks. There are also changes to be slightly more
> aggressive with updating the td_lock pointer when it has been blocked.

I have not yet found an application benchmark that really demonstrates
this (e.g. the SQL benchmark is now entirely bottlenecked by the global
select lock), but on a microbenchmark designed specifically to test
scheduler performance (sysbench --test=threads) this gives dramatic
results on an 8-core Opteron.

--
This test mode was written to benchmark scheduler performance, more
specifically the cases when a scheduler has a large number of threads
competing for some set of mutexes.

SysBench creates a specified number of threads and a specified number of
mutexes. Then each thread starts running requests consisting of locking
the mutex, yielding the CPU so that the thread is placed in the run queue
by the scheduler, then unlocking the mutex when the thread is rescheduled
back to execution. For each request, the above actions are run several
times in a loop, so the more iterations are performed, the more
concurrency is placed on each mutex.
--

With the threadlock.diff changes and sched_smp there is a factor of 3.8
performance improvement compared to sched_ule. 4BSD actually performs 30%
better than ULE on this microbenchmark (it has been much slower on all
the application benchmarks I've done on this system), but is still a
factor of 2.8 slower than sched_smp.

Indeed, profiling confirms that with ULE and 4BSD the global sched_lock
is the only relevant lock, and is heavily contended. This contention
largely goes away with the per-cpu scheduler locks in sched_smp (but
there is still some contention). Profiling indicates there might be
further scope to as much as double the performance of this benchmark by
improving the load balancing and making other architectural changes (the
system is still about 50% idle).

I am hoping to see some real application benchmark improvements on sun4v
when Kip gets it up and running again (should be soon), since last time
we looked the global sched_lock was a dominant effect there.

Kris
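
For reference, here is a rough userland mock-up of the per-cpu scheduler
lock idea discussed above. This is not the threadlock.diff code; the
names (struct runq, sched_migrate, NCPU) and the exact handoff protocol
are only illustrative assumptions about the design. The point it tries to
show is that each thread carries a pointer (td_lock) to whichever spinlock
currently protects it -- normally its CPU's run-queue lock -- and that
thread_lock() must re-check the pointer after acquiring, since the thread
may have migrated while the caller was spinning.

    /* Illustrative sketch only -- not FreeBSD kernel code. */
    #include <pthread.h>
    #include <stdatomic.h>

    #define NCPU 8

    struct runq {
            pthread_spinlock_t lock;    /* per-CPU scheduler lock */
            /* run-queue bookkeeping would live here */
    };

    struct thread {
            _Atomic(pthread_spinlock_t *) td_lock; /* lock protecting this thread */
            int td_cpu;                            /* CPU whose run queue owns it */
    };

    static struct runq runqs[NCPU];

    /* Acquire whatever lock currently protects the thread. */
    static void
    thread_lock(struct thread *td)
    {
            for (;;) {
                    pthread_spinlock_t *lock = atomic_load(&td->td_lock);
                    pthread_spin_lock(lock);
                    if (atomic_load(&td->td_lock) == lock)
                            return;             /* pointer stable: we own the thread */
                    pthread_spin_unlock(lock);  /* it migrated while we spun; retry */
            }
    }

    static void
    thread_unlock(struct thread *td)
    {
            pthread_spin_unlock(atomic_load(&td->td_lock));
    }

    /*
     * Retarget the thread to another CPU's run queue.  Caller holds the
     * thread's current lock; on return it holds the new one instead
     * (take-new-then-drop-old keeps the handoff race-free).
     */
    static void
    sched_migrate(struct thread *td, int newcpu)
    {
            pthread_spinlock_t *oldlock = atomic_load(&td->td_lock);
            pthread_spinlock_t *newlock = &runqs[newcpu].lock;

            pthread_spin_lock(newlock);
            td->td_cpu = newcpu;
            atomic_store(&td->td_lock, newlock);
            pthread_spin_unlock(oldlock);
    }

    int
    main(void)
    {
            struct thread td;

            for (int i = 0; i < NCPU; i++)
                    pthread_spin_init(&runqs[i].lock, PTHREAD_PROCESS_PRIVATE);
            td.td_cpu = 0;
            atomic_init(&td.td_lock, &runqs[0].lock);

            thread_lock(&td);       /* acquires CPU 0's run-queue lock */
            sched_migrate(&td, 1);  /* now we hold CPU 1's lock instead */
            thread_unlock(&td);     /* releases CPU 1's lock */
            return 0;
    }

Built with cc -pthread. Because each CPU's run queue has its own lock in
this scheme, contention like the global sched_lock contention described
above only arises when threads cross CPUs, which is the intuition behind
the sched_smp numbers.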
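
And a minimal re-creation of the benchmark pattern that the quoted
sysbench documentation describes for --test=threads: a pool of threads
repeatedly locks one of a small set of mutexes, yields the CPU so the
scheduler has to requeue the thread, then unlocks. This is not sysbench
source, and the thread, mutex, and iteration counts are made-up defaults
for illustration only.

    /* Illustrative sketch of the lock/yield/unlock pattern, not sysbench. */
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>

    #define NTHREADS  64
    #define NMUTEXES  8
    #define NREQUESTS 1000
    #define NLOOPS    50    /* lock/yield/unlock iterations per request */

    static pthread_mutex_t mutexes[NMUTEXES];

    static void *
    worker(void *arg)
    {
            long id = (long)arg;

            for (int r = 0; r < NREQUESTS; r++) {
                    pthread_mutex_t *m = &mutexes[(id + r) % NMUTEXES];
                    for (int i = 0; i < NLOOPS; i++) {
                            pthread_mutex_lock(m);
                            sched_yield();  /* force a trip through the run queue */
                            pthread_mutex_unlock(m);
                    }
            }
            return NULL;
    }

    int
    main(void)
    {
            pthread_t tids[NTHREADS];

            for (int i = 0; i < NMUTEXES; i++)
                    pthread_mutex_init(&mutexes[i], NULL);
            for (long i = 0; i < NTHREADS; i++)
                    pthread_create(&tids[i], NULL, worker, (void *)i);
            for (int i = 0; i < NTHREADS; i++)
                    pthread_join(tids[i], NULL);
            printf("done\n");
            return 0;
    }

Built with cc -pthread. Since every yield is a pass through the
scheduler, a workload like this spends nearly all of its time under the
scheduler lock(s), which is why it isolates sched_lock contention so much
more sharply than the application benchmarks mentioned above.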