Date:      Fri, 10 Nov 2006 19:47:54 -0800
From:      "Kip Macy" <kip.macy@gmail.com>
To:        freebsd-current@freebsd.org
Subject:   MUTEX_PROFILING option has been removed ...
Message-ID:  <b1fa29170611101947o24f802d8k7654153c8286ed23@mail.gmail.com>

and replaced by LOCK_PROFILING.

 - When LOCK_PROFILING is compiled in and enabled, the kernel will now
profile hold times for all locks (spin mutex, blocking mutex, rwlock, sx
lock, and lockmgr).
 - We now track the wait-to-acquire time, which I believe to be a more
useful metric of contention than hold time or number of times contested.
 - The overhead of having LOCK_PROFILING compiled in but not enabled has
been reduced by moving large chunks of code out of line - on the T1 the
measured overhead is < 1%.
 - There is no longer a single mutex for serializing updates to the
profiling hash - reducing the locking contention of measuring lock
contention.
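
To make the distinction between hold time and wait-to-acquire time concrete,
here is a minimal userland sketch of how one acquisition could be accounted
from three timestamps. This is not the kernel implementation: the struct, the
function name lock_prof_account, and the timestamp parameters are all
hypothetical, and the field names simply mirror the column names in the
reports further down.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-acquisition-point record; names mirror report columns. */
struct lock_prof_rec {
	uint64_t max;		/* longest single hold time seen */
	uint64_t total;		/* sum of hold times */
	uint64_t wait_total;	/* sum of wait-to-acquire times */
	uint64_t count;		/* number of acquisitions */
};

/*
 * Account for one acquisition. All three timestamps are in the same
 * (arbitrary) unit: when the thread asked for the lock, when it got
 * the lock, and when it released the lock.
 */
static void
lock_prof_account(struct lock_prof_rec *r, uint64_t t_request,
    uint64_t t_acquire, uint64_t t_release)
{
	assert(t_request <= t_acquire && t_acquire <= t_release);

	uint64_t wait = t_acquire - t_request;	/* time spent contending */
	uint64_t hold = t_release - t_acquire;	/* time spent owning */

	r->wait_total += wait;
	r->total += hold;
	if (hold > r->max)
		r->max = hold;
	r->count++;
}
```

In this model a lock that is held for a long time but never contended
accumulates a large total and no wait_total, which is why wait time can be
the more direct indicator of contention.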

Thanks to DES for the MUTEX_PROFILING implementation and Kris Kennaway for
many of the optimizations that made their way into this patch.

Please report to me any issues caused by this change. I give some examples
of its immediate utility below:


I'm running a buildworld that isn't using all the system threads. I sorted
on the third column (wait_total). The first entry is due to the idle
threads constantly trying to get work. The third and fourth are from make
using select. Looking at kern_select, one sees that it is clearly fairly
single-threaded. Oddly enough, make's Job.c already has support for kqueue,
but it isn't the default. I defined USE_KQUEUE and select went away as a
point of contention during builds. We also see here that the page queue
mutex is a major point of contention.

    max     total wait_total   count  avg wait_avg cnt_hold cnt_lock name
     24   3566322 1311264691 1358360    2      965  6800925        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/kern/kern_idle.c:121 (sched lock)
     29   1218447  414601055  172116    7     2408   533196        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/kern/subr_sleepqueue.c:529 (sched lock)
      2      3013  413907132    8359    0    49516    14603        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/kern/sys_generic.c:812 (sched lock)
   1027    242236  413829518   14365   16    28808     4462        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/kern/sys_generic.c:776 (sellck)
1894753 787273038   55823553  726605 1083       76        0        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/kern/vfs_default.c:263 (nfs)
    253    104799    8583672  204689    0       41    39672        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/vm/vm_fault.c:844 (vm page queue mutex)
    153    264890    3935024  227674    1       17    46885        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/vm/vm_fault.c:902 (vm page queue mutex)
    316   3238931    2650089  227674   14       11   113827        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/sun4v/sun4v/pmap.c:956 (vm page queue mutex)
     35    101146    1916077   82252    1       23    16275        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/vm/vm_fault.c:342 (vm page queue mutex)
      4    106600    1665429  285490    0        5   475928        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/kern/subr_sleepqueue.c:318 (sched lock)
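
For anyone post-processing this output: the avg and wait_avg columns in the
rows above are consistent with integer division of total and wait_total by
count (for the first row, 1311264691 / 1358360 truncates to 965). The sketch
below recomputes those averages and orders rows by wait_total, as was done by
hand for the listing. The struct and function names are hypothetical and only
a subset of the columns is modelled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical parsed row holding only the columns used here. */
struct prof_row {
	uint64_t total;		/* total hold time */
	uint64_t wait_total;	/* total wait-to-acquire time */
	uint64_t count;		/* number of acquisitions */
	const char *name;	/* file:line (lock name) */
};

/* Truncating average, matching the avg / wait_avg values in the report. */
static uint64_t
prof_avg(uint64_t sum, uint64_t count)
{
	return (count == 0 ? 0 : sum / count);
}

/* qsort comparator: largest wait_total first. */
static int
cmp_wait_total_desc(const void *a, const void *b)
{
	const struct prof_row *x = a, *y = b;

	if (x->wait_total < y->wait_total)
		return (1);
	if (x->wait_total > y->wait_total)
		return (-1);
	return (0);
}

static void
sort_by_wait_total(struct prof_row *rows, size_t n)
{
	assert(rows != NULL || n == 0);
	qsort(rows, n, sizeof(rows[0]), cmp_wait_total_desc);
}
```

Sorting on wait_total rather than total pushes long-held but lightly
contended locks down the list and leaves the acquisition points where
threads actually queue up at the top.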



Here we do a make -j32 of the kernel, so all cpu threads are in use (thus no
issues with the idle threads). The turnstile lock contention is likely a
result of all the cpu threads contending for the page queue mutex. This
could probably be improved by adaptively spinning if the current holder of
the mutex is running. Many of the page queue mutex acquisitions are merely
to protect setting flags in an individual page. In the case of a 32 cpu
system, having a lock per vm_page would probably be the way to go - however,
this would penalize systems with 4 and fewer cpus. Perhaps alc should look
into varying the granularity of locking as a function of the number of cpus.

        max        total wait_total   count    avg wait_avg cnt_hold cnt_lock name
          5      7266196  206805560 7522452      0       27 48266619        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/kern/subr_turnstile.c:487 (turnstile chain)
        457       528521  180592127  550284      0      328  1469872        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/vm/vm_fault.c:844 (vm page queue mutex)
   15057461   1679582934  117520488   87978  19090     1335        0        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/ufs/ffs/ffs_vnops.c:366 (ufs)
        214      1076256  112489341  559032      1      201  1520471        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/vm/vm_fault.c:902 (vm page queue mutex)
        424      8250360  105249196  559031     14      188  1767340        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/sun4v/sun4v/pmap.c:956 (vm page queue mutex)
72563452121 218316084315   94216669  452713 482239      208        0        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/kern/vfs_default.c:263 (nfs)
         23      1349030   14049785  280685      4       50   923679        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/kern/kern_idle.c:121 (sched lock)
         73       214117   11078161   63944      3      173     2505        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/nfsclient/nfs_socket.c:1235 (Giant)
         42        92768   10431233   40012      2      260   122966        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/vm/vm_fault.c:342 (vm page queue mutex)
       6429      2155581    6645086   18297    117      363   105550        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/vm/vm_object.c:651 (vm page queue mutex)
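
As one way to picture "varying the granularity of locking as a function of
the number of cpus", here is a rough sketch of a striped lock pool whose size
depends on the cpu count. Note that this is a middle ground, not the
lock-per-vm_page idea itself, and it is not how the VM system works today.
The function names, the LOCKS_PER_CPU factor, and the page_shift parameter
are assumptions made for illustration; only the "4 and fewer cpus" threshold
comes from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SMALL_SYSTEM_CPUS	4	/* "4 and fewer cpus" keep one lock */
#define LOCKS_PER_CPU		4	/* arbitrary illustrative factor */

/*
 * Pick how many page locks to allocate: a single global lock on small
 * systems, otherwise a power-of-two pool that grows with the cpu count.
 */
static size_t
page_lock_count(unsigned ncpus)
{
	assert(ncpus > 0);

	if (ncpus <= SMALL_SYSTEM_CPUS)
		return (1);

	size_t want = (size_t)ncpus * LOCKS_PER_CPU;
	size_t n = 1;
	while (n < want)	/* round up to a power of two */
		n <<= 1;
	return (n);
}

/*
 * Map a page's physical address to a lock in the pool. nlocks must be
 * a power of two so the mask below selects a valid index.
 */
static size_t
page_lock_index(uintptr_t page_addr, size_t nlocks, unsigned page_shift)
{
	assert(nlocks != 0 && (nlocks & (nlocks - 1)) == 0);
	return ((size_t)(page_addr >> page_shift) & (nlocks - 1));
}
```

With one lock the index is always 0, so a small system pays nothing beyond
the shift and mask, while a 32 cpu system spreads flag updates for different
pages across the pool.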


