From owner-freebsd-current@FreeBSD.ORG Thu Mar 19 18:02:44 2015
Message-ID: <550B0F3D.8070500@rice.edu>
Date: Thu, 19 Mar 2015 13:02:37 -0500
From: Alan Cox
To: Mateusz Guzik, John Baldwin, freebsd-current@freebsd.org, alc@freebsd.org, Ryan Stone
Subject: Re: [PATCH] Convert the VFS cache lock to an rmlock
References: <20150313053203.GC9153@dft-labs.eu> <6376695.VOvhinOncy@ralph.baldwin.cx> <20150318175812.GA3512@dft-labs.eu>
In-Reply-To: <20150318175812.GA3512@dft-labs.eu>

On 03/18/2015 12:58, Mateusz Guzik wrote:
> On Wed, Mar 18, 2015 at 10:17:22AM -0400, John Baldwin wrote:
>> On Friday, March 13, 2015 06:32:03 AM Mateusz Guzik wrote:
>>> On Thu, Mar 12, 2015 at 06:13:00PM -0500, Alan Cox wrote:
>>>> Below are partial results from a profile of a parallel (-j7) "buildworld" on
>>>> a 6-core machine that I did after the introduction of pmap_advise, so this
>>>> is not a new profile.  The results are sorted by total waiting time and
>>>> only the top 20 entries are listed.
>>>>
>>> Well, I ran stuff on lynx2 in the zoo on fresh -head with debugging
>>> disabled (MALLOC_PRODUCTION included) and got quite different results.
>>>
>>> The machine is Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
>>> 2 package(s) x 10 core(s) x 2 SMT threads
>>>
>>> 32GB of ram
>>>
>>> Stuff was built in a chroot with world hosted on zfs.
>>>
>>>> max  wait_max  total  wait_total  count  avg  wait_avg  cnt_hold  cnt_lock  name
>>>>
>>>> 1027  208500  16292932  1658585700  5297163  3  313  0  3313855  kern/vfs_cache.c:629 (rw:Name Cache)
>>>> 208564  186514  19080891106  1129189627  355575930  53  3  0  1323051  kern/vfs_subr.c:2099 (lockmgr:ufs)
>>>> 169241  148057  193721142  419075449  13819553  14  30  0  110089  kern/vfs_subr.c:2210 (lockmgr:ufs)
>>>> 187092  191775  1923061952  257319238  328416784  5  0  0  5106537  kern/vfs_cache.c:488 (rw:Name Cache)
>>>>
>>> make -j 12 buildworld on a freshly booted system (i.e. the most namecache insertions):
>>>
>>> 32  292  3042019  33400306  8419725  0  3  0  2578026  kern/sys_pipe.c:1438 (sleep mutex:pipe mutex)
>>> 170608  152572  642385744  27054977  202605015  3  0  0  1306662  kern/vfs_subr.c:2176 (lockmgr:zfs)
>>
>> You are using ZFS, Alan was using UFS.  It would not surprise me that those
>> would perform quite differently, and it would not surprise me that UFS is
>> more efficient in terms of its interactions with the VM.
>>
> This is lots of forks + execs, and the page queue has only one lock.
>
> Fwiw, here is a ufs world backed by a dedicated partition, other conditions
> unchanged.
>
> First build:
> 52  411  3327572  36528102  8508545  0  4  0  2587337  kern/sys_pipe.c:1438 (sleep mutex:pipe mutex)
> 308102  308099  430348158  36163333  203246241  2  0  0  459200  kern/vfs_subr.c:2176 (lockmgr:ufs)
> 78  818  44308162  29968440  165009282  0  0  0  22053854  vm/vm_page.c:1502 (sleep mutex:vm page free queue)
> 48  238  18872405  28456578  164327020  0  0  0  22512151  vm/vm_page.c:2294 (sleep mutex:vm page free queue)
> 208  1486  69780863  17484511  144970085  0  0  0  134429  amd64/amd64/pmap.c:4204 (rw:pmap pv global)
> 52  1263  19390169  8186840  142392234  0  0  0  11151398  vm/vm_page.c:2053 (sleep mutex:vm active pagequeue)
> 27  1801  19754927  8092312  142403798  0  0  0  9993164  vm/vm_page.c:2097 (sleep mutex:vm active pagequeue)
> 1145  1300  19872450  7158951  7690094  2  0  0  269930  vm/vm_fault.c:785 (rw:vm object)
> 25579  1636  10955717  5962336  12620413  0  0  0  96691  vm/vm_map.c:2883 (rw:vm object)
> 39  54994  1428266  5141221  368787  3  13  0  13391  kern/kern_exec.c:590 (lockmgr:ufs)
> 311470  311513  67241105  3971357  30997102  2  0  0  5094  kern/vfs_lookup.c:509 (lockmgr:ufs)
> 55  568  64300400  3821394  214921016  0  0  0  2699822  kern/vfs_cache.c:487 (rw:Name Cache)
> 14  2224  31486  3784823  266762  0  14  0  43516  amd64/amd64/pmap.c:3767 (rw:pmap pv global)

This is pmap_remove_all, which is the only amd64 pmap function that still
acquires the global lock in exclusive mode.  This entry shows that we're
calling pmap_remove_all about 80% more often than we used to, which is
causing pmap_enter (line 4204) above to wait.  I'm curious to see what the
profile looks like if you change this acquisition to a read acquire.  Such a
change hasn't been committed because of a race involving the completion of
TLB shootdown.  However, it's extremely improbable that the race will come
out badly for a parallel "buildworld".
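
Concretely, the experiment would look roughly like the following in
amd64/amd64/pmap.c.  This is an untested sketch, not a committed change, and
it assumes that the lock reported above as "rw:pmap pv global" is the
pvh_global_lock rwlock:

/*
 * Untested sketch: let pmap_remove_all() take the pv global lock shared
 * instead of exclusive, so that it no longer blocks the read acquisitions
 * done by pmap_enter() (line 4204 above).  This deliberately ignores the
 * TLB-shootdown completion race noted above, on the theory that it is
 * very unlikely to be lost during a parallel "buildworld".
 */
void
pmap_remove_all(vm_page_t m)
{
	/* ... existing local declarations and assertions ... */

	rw_rlock(&pvh_global_lock);	/* was: rw_wlock(&pvh_global_lock); */
	/* ... existing pv list walk and PTE removal, unchanged ... */
	rw_runlock(&pvh_global_lock);	/* was: rw_wunlock(&pvh_global_lock); */
}

If the pmap.c:4204 wait time drops accordingly, that would support the idea
that pmap_remove_all's exclusive acquire is the main source of the pv global
contention in this workload.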
> 15  1179  184521  2651974  2126398  0  1  0  31955  vm/vm_object.c:646 (rw:vm object)
> 43267  54635  56190811  2228815  368787  152  6  0  10399  kern/imgact_elf.c:829 (lockmgr:ufs)
> 1802  3276  55104622  2042552  142404165  0  0  0  543434  vm/vm_fault.c:997 (sleep mutex:vm page)
> 29  54  50897307  1792095  199350363  0  0  0  2305326  kern/vfs_cache.c:668 (sleep mutex:vnode interlock)
> 1051  1252  3640803  1792074  1030592  3  1  0  18897  vm/vm_object.c:1703 (rw:vm object)
> 17  1389  269455  1778764  2828515  0  0  0  106843  vm/vm_object.c:1222 (rw:vm object)
> 472  1444  14742782  1731247  6851167  2  0  0  20011  amd64/amd64/pmap.c:3620 (rw:pmap pv global)
>
> reset + rm -r /usr/obj/*
>
> 230791  100799  405140208  51092446  203312369  1  0  0  472299  kern/vfs_subr.c:2176 (lockmgr:ufs)
> 188  8046  69993327  41591950  144943809  0  0  0  175226  amd64/amd64/pmap.c:4204 (rw:pmap pv global)
> 76  647  45324624  35204587  164788099  0  0  0  22749884  vm/vm_page.c:1502 (sleep mutex:vm page free queue)
> 40  237  19428445  33209934  164266257  0  0  0  22973810  vm/vm_page.c:2294 (sleep mutex:vm page free queue)
> 52  298  3038837  32813929  8418613  0  3  0  2587957  kern/sys_pipe.c:1438 (sleep mutex:pipe mutex)
> 1939  3216  19524902  14623470  7688013  2  1  0  274392  vm/vm_fault.c:785 (rw:vm object)
> 21696  2861  11773227  13246477  12617163  0  1  0  106342  vm/vm_map.c:2883 (rw:vm object)
> 54  7664  1445280  10079322  368785  3  27  0  15598  kern/kern_exec.c:590 (lockmgr:ufs)
> 22  2736  19935005  9087069  142381462  0  0  0  10214061  vm/vm_page.c:2097 (sleep mutex:vm active pagequeue)
> 36  885  19540659  8825819  142378309  0  0  0  11305041  vm/vm_page.c:2053 (sleep mutex:vm active pagequeue)
> 15  3045  33404  7922947  267066  0  29  0  48667  amd64/amd64/pmap.c:3767 (rw:pmap pv global)
> 13  2605  194977  6294921  2126083  0  2  0  36096  vm/vm_object.c:646 (rw:vm object)
> 82  2269  269861  4506215  2827953  0  1  0  107885  vm/vm_object.c:1222 (rw:vm object)
> 2179  6501  82846723  4123049  368785  224  11  0  13145  kern/imgact_elf.c:829 (lockmgr:ufs)
> 81  722  64614191  4099711  214861181  0  0  0  2763484  kern/vfs_cache.c:487 (rw:Name Cache)
> 3925  2851  3900231  3909417  1031608  3  3  0  22052  vm/vm_object.c:1703 (rw:vm object)
> 483  3335  14875079  3554944  6850151  2  0  0  24182  amd64/amd64/pmap.c:3620 (rw:pmap pv global)
> 187  129  103680324  2146404  250760755  0  0  0  1630451  amd64/amd64/pmap.c:5362 (rw:pmap pv list)
> 30468  2402  126229130  2042062  158088931  0  0  0  71708  vm/vm_object.c:517 (rw:vm object)
> 28  76  51334143  1892757  199327513  0  0  0  2384837  kern/vfs_cache.c:668 (sleep mutex:vnode interlock)
>
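
For context, the "rw:Name Cache" entries above (kern/vfs_cache.c:487/488/629)
are the rwlock that the patch in the subject line converts to an rmlock.  A
minimal sketch of what such a conversion looks like in general is below.  This
is not the actual patch from this thread, and the identifiers (nc_lock,
example_lookup, example_insert) are made up; it only illustrates the rmlock(9)
API shape:

#include <sys/param.h>
#include <sys/lock.h>
#include <sys/rmlock.h>

/* was: static struct rwlock nc_lock; */
static struct rmlock nc_lock;

static void
example_init(void)
{
	/* was: rw_init(&nc_lock, "Name Cache"); */
	rm_init(&nc_lock, "Name Cache");
}

static void
example_lookup(void)
{
	/* rmlock readers pass a per-acquisition tracker, usually on the stack. */
	struct rm_priotracker tracker;

	rm_rlock(&nc_lock, &tracker);	/* was: rw_rlock(&nc_lock); */
	/* ... read-mostly lookup ... */
	rm_runlock(&nc_lock, &tracker);	/* was: rw_runlock(&nc_lock); */
}

static void
example_insert(void)
{
	rm_wlock(&nc_lock);		/* write side is unchanged in shape */
	/* ... modify the cache ... */
	rm_wunlock(&nc_lock);
}

The trade-off is that rmlock read acquisitions avoid dirtying a shared cache
line, while write acquisitions become considerably more expensive, so the
conversion only pays off for read-mostly locks like the namecache.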