Date: Sat, 11 Nov 2006 08:53:41 +0000 (GMT)
From: Robert Watson <rwatson@FreeBSD.org>
To: Kip Macy <kip.macy@gmail.com>
Cc: freebsd-current@freebsd.org
Subject: Re: MUTEX_PROFILING option has been removed ...
Message-ID: <20061111085252.N63959@fledge.watson.org>
In-Reply-To: <b1fa29170611101947o24f802d8k7654153c8286ed23@mail.gmail.com>
References: <b1fa29170611101947o24f802d8k7654153c8286ed23@mail.gmail.com>

On Fri, 10 Nov 2006, Kip Macy wrote:

> and replaced by LOCK_PROFILING.
>
> - When LOCK_PROFILING is compiled in and enabled the kernel will now
>   profile hold times for all locks (spin mutex, blocking mutex, rwlock, sx
>   lock, and lockmgr).
> - We now track the wait-to-acquire time, which I believe to be a more
>   useful metric of contention than hold time or number of times contested.
> - The overhead of having LOCK_PROFILING compiled in but not enabled has
>   been reduced by moving large chunks of code out of line - on the T1 the
>   measured overhead is < 1%.
> - There is no longer a single mutex for serializing updates to the
>   profiling hash - reducing the locking contention of measuring lock
>   contention.

This sounds like really great work -- thanks for working on this!

Robert N M Watson
Computer Laboratory
University of Cambridge

>
> Thanks to DES for the MUTEX_PROFILING implementation and Kris Kennaway for
> many of the optimizations that made their way into this patch.
>
> Please report to me any issues caused by this change. I give some examples
> of its immediate utility below:
>
>
> I'm running a buildworld that isn't using all the system threads. I sorted
> on the third column (maximum total wait) - the first is due to the idle
> threads constantly trying to get work. The third and fourth are from make
> using select. Looking at kern_select - one sees that it is clearly fairly
> single-threaded. Oddly enough, make's Job.c already has support for kqueue,
> but it isn't the default. I defined USE_KQUEUE and select went away as a
> point of contention during builds. We see here that the page queue mutex is
> a major point of contention.
>
>     max     total wait_total   count  avg wait_avg cnt_hold cnt_lock name
>      24   3566322 1311264691 1358360    2      965  6800925        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/kern/kern_idle.c:121 (sched lock)
>      29   1218447  414601055  172116    7     2408   533196        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/kern/subr_sleepqueue.c:529 (sched lock)
>       2      3013  413907132    8359    0    49516    14603        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/kern/sys_generic.c:812 (sched lock)
>    1027    242236  413829518   14365   16    28808     4462        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/kern/sys_generic.c:776 (sellck)
> 1894753 787273038   55823553  726605 1083       76        0        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/kern/vfs_default.c:263 (nfs)
>     253    104799    8583672  204689    0       41    39672        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/vm/vm_fault.c:844 (vm page queue mutex)
>     153    264890    3935024  227674    1       17    46885        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/vm/vm_fault.c:902 (vm page queue mutex)
>     316   3238931    2650089  227674   14       11   113827        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/sun4v/sun4v/pmap.c:956 (vm page queue mutex)
>      35    101146    1916077   82252    1       23    16275        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/vm/vm_fault.c:342 (vm page queue mutex)
>       4    106600    1665429  285490    0        5   475928        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/kern/subr_sleepqueue.c:318 (sched lock)
>
>
>
> Here we do a make -j32 of the kernel, so all cpu threads are in use (thus no
> issues with the idle threads). The turnstile lock contention is likely a
> result of all the cpu threads contending for the page queue mutex. This
> could probably be improved by adaptively spinning if the current holder of
> the mutex is running. Many of the page queue mutex acquisitions are merely to
> protect setting flags in an individual page. In the case of a 32 cpu system
> having a lock per vm_page would probably be the way to go - however, this
> would penalize systems with 4 and fewer cpus. Perhaps alc should look into
> varying the granularity of locking as a function of the number of cpus.
>
>         max        total wait_total   count    avg wait_avg cnt_hold cnt_lock name
>           5      7266196  206805560 7522452      0       27 48266619        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/kern/subr_turnstile.c:487 (turnstile chain)
>         457       528521  180592127  550284      0      328  1469872        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/vm/vm_fault.c:844 (vm page queue mutex)
>    15057461   1679582934  117520488   87978  19090     1335        0        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/ufs/ffs/ffs_vnops.c:366 (ufs)
>         214      1076256  112489341  559032      1      201  1520471        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/vm/vm_fault.c:902 (vm page queue mutex)
>         424      8250360  105249196  559031     14      188  1767340        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/sun4v/sun4v/pmap.c:956 (vm page queue mutex)
> 72563452121 218316084315   94216669  452713 482239      208        0        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/kern/vfs_default.c:263 (nfs)
>          23      1349030   14049785  280685      4       50   923679        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/kern/kern_idle.c:121 (sched lock)
>          73       214117   11078161   63944      3      173     2505        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/nfsclient/nfs_socket.c:1235 (Giant)
>          42        92768   10431233   40012      2      260   122966        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/vm/vm_fault.c:342 (vm page queue mutex)
>        6429      2155581    6645086   18297    117      363   105550        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/vm/vm_object.c:651 (vm page queue mutex)
> _______________________________________________
> freebsd-current@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
>
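
For anyone reading the tables above: the avg and wait_avg columns appear to be total and wait_total divided by count, truncated to an integer (the posted rows are consistent with that, e.g. 1311264691 / 1358360 = 965 for the kern_idle.c entry). A minimal userland sketch of that arithmetic, plus the sort on wait_total used above, might look like the following. The struct and function names are hypothetical and are not taken from the LOCK_PROFILING code itself.

```c
#include <stdint.h>
#include <stdlib.h>

/* One row of the profiling output; fields mirror the column headers.
 * This layout is an assumption for illustration, not the kernel's. */
struct lock_prof_row {
	uint64_t max;
	uint64_t total;
	uint64_t wait_total;
	uint64_t count;
	uint64_t cnt_hold;
	uint64_t cnt_lock;
	const char *name;
};

/* Integer per-acquisition average; returns 0 for a row with no acquisitions. */
static uint64_t
per_acquisition(uint64_t sum, uint64_t count)
{
	return (count == 0 ? 0 : sum / count);
}

/* "avg" column: hold time per acquisition. */
uint64_t
row_avg(const struct lock_prof_row *r)
{
	return (per_acquisition(r->total, r->count));
}

/* "wait_avg" column: wait-to-acquire time per acquisition. */
uint64_t
row_wait_avg(const struct lock_prof_row *r)
{
	return (per_acquisition(r->wait_total, r->count));
}

/* qsort comparator: largest wait_total first. */
static int
cmp_wait_total_desc(const void *a, const void *b)
{
	const struct lock_prof_row *ra = a;
	const struct lock_prof_row *rb = b;

	if (ra->wait_total < rb->wait_total)
		return (1);
	if (ra->wait_total > rb->wait_total)
		return (-1);
	return (0);
}

/* Order rows the way the tables above are ordered. */
void
sort_by_wait_total(struct lock_prof_row *rows, size_t n)
{
	qsort(rows, n, sizeof(rows[0]), cmp_wait_total_desc);
}
```

Fill an array of rows from the profiling output, call sort_by_wait_total() on it, and the most contended acquisition points (by total wait) come first.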