Date: Thu, 12 Mar 2015 18:13:00 -0500
From: Alan Cox <alan.l.cox@gmail.com>
To: Mateusz Guzik <mjguzik@gmail.com>, Ryan Stone <rysto32@gmail.com>, FreeBSD Current <freebsd-current@freebsd.org>
Subject: Re: [PATCH] Convert the VFS cache lock to an rmlock
Message-ID: <CAJUyCcMoRu7JMCWfYb3acBF=fNopKAV4Ge8-mhApjuJ7ujOqFg@mail.gmail.com>
In-Reply-To: <20150312173635.GB9153@dft-labs.eu>
References: <CAFMmRNysnUezX9ozGrCpivPCTMYRJtoxm9ijR0yQO03LpXnwBQ@mail.gmail.com> <20150312173635.GB9153@dft-labs.eu>
Below are partial results from a profile of a parallel (-j7) "buildworld" on a 6-core machine that I did after the introduction of pmap_advise, so this is not a new profile.  The results are sorted by total waiting time, and only the top 20 entries are listed.

   max wait_max       total wait_total     count avg wait_avg cnt_hold cnt_lock name
  1027   208500    16292932 1658585700   5297163   3      313        0  3313855 kern/vfs_cache.c:629 (rw:Name Cache)
208564   186514 19080891106 1129189627 355575930  53        3        0  1323051 kern/vfs_subr.c:2099 (lockmgr:ufs)
169241   148057   193721142  419075449  13819553  14       30        0   110089 kern/vfs_subr.c:2210 (lockmgr:ufs)
187092   191775  1923061952  257319238 328416784   5        0        0  5106537 kern/vfs_cache.c:488 (rw:Name Cache)
    23      114   134681925  220476269  40747213   3        5        0 25679721 kern/kern_clocksource.c:233 (spin mutex:et_hw_mtx)
 39069   101543  1931226072  208764524 482193429   4        0        0 22375691 kern/vfs_subr.c:2177 (sleep mutex:vnode interlock)
187131   187056  2509403648  140794697 298324050   8        0        0 14386756 kern/vfs_cache.c:669 (sleep mutex:vnode interlock)
  1421   257059   260943147  139520512 104936165   2        1        0 12997640 vm/vm_page.c:1225 (sleep mutex:vm page free queue)
 39612   145747   371125327  121005252 136149528   2        0        0  8280782 kern/vfs_subr.c:2134 (sleep mutex:vnode interlock)
  1720   249735   226621512   91906907  93436933   2        0        0  7092634 vm/vm_page.c:1770 (sleep mutex:vm active pagequeue)
394155   394200   330090749   86368442  48766123   6        1        0  1169061 kern/vfs_hash.c:78 (sleep mutex:vfs hash)
   892    93103     3446633   75923096   1482518   2       51        0   236865 kern/vfs_cache.c:799 (rw:Name Cache)
  4030   394151   395521192   63355061  47860319   8        1        0  6439221 kern/vfs_hash.c:86 (sleep mutex:vnode interlock)
  4554   147798   247338596   56263926 104192514   2        0        0  9455460 vm/vm_page.c:1948 (sleep mutex:vm page free queue)
  2587   230069   219652081   48271335  94011085   2        0        0  9011261 vm/vm_page.c:1729 (sleep mutex:vm active pagequeue)
 16420    50195   920083075   38568487 347596869   2        0        0  3035672 kern/vfs_subr.c:2107 (sleep mutex:vnode interlock)
 57348    93913    65957615   31956482   2487620  26       12        0    39048 vm/vm_fault.c:672 (rw:vm object)
  1798    93694   127847964   28490515  46510308   2        0        0  1897724 kern/vfs_subr.c:419 (sleep mutex:struct mount mtx)
249739   207227   775356648   25501046  95007901   8        0        0   211559 vm/vm_fault.c:918 (sleep mutex:vm page)
452130   157222    70439287   18564724   5429942  12        3        0    10813 vm/vm_map.c:2738 (rw:vm object)
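For context on how a report like the one above is produced: it is LOCK_PROFILING output.  The fragment below is only a rough userland sketch, not something from this thread; it assumes a kernel built with "options LOCK_PROFILING" and assumes the debug.lock.prof.enable, debug.lock.prof.reset and debug.lock.prof.stats sysctls that option is expected to expose, so verify those names on your branch before relying on them.

/*
 * Hypothetical helper: reset and enable lock profiling, wait while a
 * workload (e.g. buildworld) runs, then dump the textual report that
 * the debug.lock.prof.stats sysctl is assumed to provide.
 */
#include <sys/types.h>
#include <sys/sysctl.h>
#include <err.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static void
set_int(const char *name, int val)
{
	if (sysctlbyname(name, NULL, NULL, &val, sizeof(val)) != 0)
		err(1, "sysctlbyname(%s)", name);
}

int
main(int argc, char **argv)
{
	size_t len;
	char *buf;

	set_int("debug.lock.prof.reset", 1);	/* discard old samples */
	set_int("debug.lock.prof.enable", 1);	/* start profiling */

	/* Run the workload now; here we simply wait a given number of seconds. */
	sleep(argc > 1 ? (unsigned)atoi(argv[1]) : 60);

	set_int("debug.lock.prof.enable", 0);	/* stop profiling */

	/* Fetch the report; a production tool would retry if it grew meanwhile. */
	if (sysctlbyname("debug.lock.prof.stats", NULL, &len, NULL, 0) != 0)
		err(1, "sysctlbyname(stats, size)");
	if ((buf = malloc(len)) == NULL)
		err(1, "malloc");
	if (sysctlbyname("debug.lock.prof.stats", buf, &len, NULL, 0) != 0)
		err(1, "sysctlbyname(stats)");
	fwrite(buf, 1, len, stdout);
	free(buf);
	return (0);
}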
On Thu, Mar 12, 2015 at 12:36 PM, Mateusz Guzik <mjguzik@gmail.com> wrote:
> On Thu, Mar 12, 2015 at 11:14:42AM -0400, Ryan Stone wrote:
> > I've just submitted a patch to Differential[1] for review that converts
> > the VFS cache to use an rmlock in place of the current rwlock.  My main
> > motivation for the change is to fix a priority inversion problem that I
> > saw recently.  A real-time priority thread attempted to acquire a write
> > lock on the VFS cache lock, but there was already a reader holding it.
> > The reader was preempted by a normal priority thread, and my real-time
> > thread was starved.
> >
> > [1] https://reviews.freebsd.org/D2051
> >
> > I was worried about the performance implications of the change, as I
> > wasn't sure how common write operations on the VFS cache would be.  I
> > did a -j12 buildworld/buildkernel test on a 12-core Haswell Xeon system,
> > as I figured that would be a reasonable stress test that simultaneously
> > creates lots of small files and reads a lot of files as well.  This
> > actually wound up being about a 10% performance *increase* (the units
> > below are seconds of elapsed time as measured by /usr/bin/time, so
> > smaller is better):
> >
> > $ ministat -C 1 orig.log rmlock.log
> > x orig.log
> > + rmlock.log
> > [ministat distribution graph]
> >     N           Min           Max        Median           Avg        Stddev
> > x   6       2710.31       2821.35       2816.75     2798.0617     43.324817
> > +   5       2488.25       2500.25       2498.04      2495.756     5.0494782
> > Difference at 95.0% confidence
> >         -302.306 +/- 44.4709
> >         -10.8041% +/- 1.58935%
> >         (Student's t, pooled s = 32.4674)
> >
> > The one outlier in the rwlock case does confuse me a bit.  What I did
> > was boot a freshly-built image with the rmlock patch applied, do a git
> > checkout of head, and then do 5 builds in a row.  The git checkout
> > should have had the effect of priming the disk cache with the source
> > files.  Then I installed the stock head kernel, rebooted, and ran 5 more
> > builds (and then 1 more when I noticed the outlier).  The fast outlier
> > was the *first* run, which should have been running with a cold disk
> > cache, so I really don't know why it would be 90 seconds faster.  I do
> > see that this run also had about 500-600 fewer seconds spent in system
> > time:
> >
> > x orig.log
> > [ministat distribution graph]
> >     N           Min           Max        Median           Avg        Stddev
> > x   6       3515.23       4121.84       4105.57       4001.71     239.61362
> >
> > I'm not sure how much I care, given that the rmlock is universally
> > faster (but maybe I should try the "cold boot" case anyway).
> >
> > If anybody has any comments or further testing that they would like to
> > see, please let me know.
>
> Workloads like buildworld (i.e. a lot of forks + execs) run into very
> severe contention in vm, which is orders of magnitude bigger than
> anything else.
>
> As such, your result seems quite suspicious.
>
> Can you describe in more detail how you were testing?
>
> Did you have a separate fs for the obj tree which was mounted+unmounted
> before each run?
>
> I suggest you grab a machine from zoo[1] and run some tests on "bigger"
> hardware.
>
> A perf improvement, even slight, is definitely welcome.
>
> [1] https://wiki.freebsd.org/TestClusterOneReservations
>
> --
> Mateusz Guzik <mjguzik gmail.com>
> _______________________________________________
> freebsd-current@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
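As background for the rwlock-to-rmlock conversion discussed above: the sketch below is not the D2051 patch, only a minimal illustration of the rmlock(9) KPI next to its rwlock(9) counterparts, using invented names.  The points of interest are that readers pass a caller-supplied struct rm_priotracker, which lets the kernel keep track of active readers (part of how an rmlock can avoid the reader-starvation scenario Ryan describes), and that write acquisitions are comparatively expensive, so the conversion is attractive only for read-mostly locks like the name cache lock.

/*
 * Minimal rwlock(9) -> rmlock(9) conversion sketch; names are invented.
 * In a real kernel file the lock would typically be initialized from a
 * SYSINIT (or via RM_SYSINIT) rather than an explicit init call.
 */
#include <sys/param.h>
#include <sys/lock.h>
#include <sys/rmlock.h>

static struct rmlock cache_example_lock;	/* was: static struct rwlock ... */

static void
cache_example_lock_init(void)
{
	rm_init(&cache_example_lock, "example name cache");	/* was: rw_init() */
}

/* Read side, e.g. a lookup: pass a stack-allocated priority tracker. */
static int
cache_example_lookup(void)
{
	struct rm_priotracker tracker;
	int found = 0;

	rm_rlock(&cache_example_lock, &tracker);	/* was: rw_rlock() */
	/* ... search the shared structure ... */
	rm_runlock(&cache_example_lock, &tracker);	/* was: rw_runlock() */
	return (found);
}

/* Write side, e.g. an insert or purge: expected to be comparatively rare. */
static void
cache_example_update(void)
{
	rm_wlock(&cache_example_lock);		/* was: rw_wlock() */
	/* ... modify the shared structure ... */
	rm_wunlock(&cache_example_lock);	/* was: rw_wunlock() */
}

Whether the extra write-side cost is acceptable for the name cache is exactly what the buildworld numbers in this thread are trying to establish.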