From owner-svn-src-all@freebsd.org Tue Sep 10 20:19:29 2019 Return-Path: Delivered-To: svn-src-all@mailman.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.nyi.freebsd.org (Postfix) with ESMTP id DA6E9E3483; Tue, 10 Sep 2019 20:19:29 +0000 (UTC) (envelope-from mjg@FreeBSD.org) Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) server-signature RSA-PSS (4096 bits) client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "mxrelay.nyi.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 46Sbwd5R6xz3ybt; Tue, 10 Sep 2019 20:19:29 +0000 (UTC) (envelope-from mjg@FreeBSD.org) Received: from repo.freebsd.org (repo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:0]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id 9DC892B69C; Tue, 10 Sep 2019 20:19:29 +0000 (UTC) (envelope-from mjg@FreeBSD.org) Received: from repo.freebsd.org ([127.0.1.37]) by repo.freebsd.org (8.15.2/8.15.2) with ESMTP id x8AKJT89052194; Tue, 10 Sep 2019 20:19:29 GMT (envelope-from mjg@FreeBSD.org) Received: (from mjg@localhost) by repo.freebsd.org (8.15.2/8.15.2/Submit) id x8AKJTuO052193; Tue, 10 Sep 2019 20:19:29 GMT (envelope-from mjg@FreeBSD.org) Message-Id: <201909102019.x8AKJTuO052193@repo.freebsd.org> X-Authentication-Warning: repo.freebsd.org: mjg set sender to mjg@FreeBSD.org using -f From: Mateusz Guzik Date: Tue, 10 Sep 2019 20:19:29 +0000 (UTC) To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org Subject: svn commit: r352183 - head/sys/kern X-SVN-Group: head X-SVN-Commit-Author: mjg X-SVN-Commit-Paths: head/sys/kern X-SVN-Commit-Revision: 352183 X-SVN-Commit-Repository: base MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 10 Sep 2019 20:19:29 -0000 Author: mjg Date: Tue Sep 10 20:19:29 2019 New Revision: 352183 URL: https://svnweb.freebsd.org/changeset/base/352183 Log: cache: avoid excessive relocking on entry removal during lookup Due to lock ordering issues (bucket lock held, vnode locks wanted) the code starts with trylocking which in face of contention often fails. Prior to the change it would loop back with a possible yield. Instead note we know what locks are needed and can take them in the right order, avoiding retries. Then we can safely re-lookup and see if the entry we are looking for is still there. On a 104-way box poudriere would result in constant retries during an 11h run as seen in the vfs.cache.zap_and_exit_bucket_fail counter. before: 408866592 after : 0 However, a new stat reports: vfs.cache.zap_and_exit_bucket_relock_success: 32638 Note this is only a bandaid over current design issues. Tested by: pho Sponsored by: The FreeBSD Foundation Modified: head/sys/kern/vfs_cache.c Modified: head/sys/kern/vfs_cache.c ============================================================================== --- head/sys/kern/vfs_cache.c Tue Sep 10 20:19:02 2019 (r352182) +++ head/sys/kern/vfs_cache.c Tue Sep 10 20:19:29 2019 (r352183) @@ -378,6 +378,8 @@ STATNODE_COUNTER(numfullpathfail2, "Number of fullpath search errors (VOP_VPTOCNP failures)"); STATNODE_COUNTER(numfullpathfail4, "Number of fullpath search errors (ENOMEM)"); STATNODE_COUNTER(numfullpathfound, "Number of successful fullpath calls"); +STATNODE_COUNTER(zap_and_exit_bucket_relock_success, + "Number of successful removals after relocking"); static long zap_and_exit_bucket_fail; STATNODE_ULONG(zap_and_exit_bucket_fail, "Number of times zap_and_exit failed to lock"); static long zap_and_exit_bucket_fail2; STATNODE_ULONG(zap_and_exit_bucket_fail2, @@ -526,6 +528,19 @@ cache_trylock_vnodes(struct mtx *vlp1, struct mtx *vlp } static void +cache_lock_vnodes(struct mtx *vlp1, struct mtx *vlp2) +{ + + MPASS(vlp1 != NULL || vlp2 != NULL); + MPASS(vlp1 <= vlp2); + + if (vlp1 != NULL) + mtx_lock(vlp1); + if (vlp2 != NULL) + mtx_lock(vlp2); +} + +static void cache_unlock_vnodes(struct mtx *vlp1, struct mtx *vlp2) { @@ -992,10 +1007,47 @@ out: return (error); } +/* + * If trylocking failed we can get here. We know enough to take all needed locks + * in the right order and re-lookup the entry. + */ static int -cache_zap_wlocked_bucket(struct namecache *ncp, struct rwlock *blp) +cache_zap_unlocked_bucket(struct namecache *ncp, struct componentname *cnp, + struct vnode *dvp, struct mtx *dvlp, struct mtx *vlp, uint32_t hash, + struct rwlock *blp) { + struct namecache *rncp; + + cache_assert_bucket_locked(ncp, RA_UNLOCKED); + + cache_sort_vnodes(&dvlp, &vlp); + cache_lock_vnodes(dvlp, vlp); + rw_wlock(blp); + LIST_FOREACH(rncp, (NCHHASH(hash)), nc_hash) { + if (rncp == ncp && rncp->nc_dvp == dvp && + rncp->nc_nlen == cnp->cn_namelen && + !bcmp(rncp->nc_name, cnp->cn_nameptr, rncp->nc_nlen)) + break; + } + if (rncp != NULL) { + cache_zap_locked(rncp, false); + rw_wunlock(blp); + cache_unlock_vnodes(dvlp, vlp); + counter_u64_add(zap_and_exit_bucket_relock_success, 1); + return (0); + } + + rw_wunlock(blp); + cache_unlock_vnodes(dvlp, vlp); + return (EAGAIN); +} + +static int __noinline +cache_zap_wlocked_bucket(struct namecache *ncp, struct componentname *cnp, + uint32_t hash, struct rwlock *blp) +{ struct mtx *dvlp, *vlp; + struct vnode *dvp; cache_assert_bucket_locked(ncp, RA_WLOCKED); @@ -1010,14 +1062,17 @@ cache_zap_wlocked_bucket(struct namecache *ncp, struct return (0); } + dvp = ncp->nc_dvp; rw_wunlock(blp); - return (EAGAIN); + return (cache_zap_unlocked_bucket(ncp, cnp, dvp, dvlp, vlp, hash, blp)); } -static int -cache_zap_rlocked_bucket(struct namecache *ncp, struct rwlock *blp) +static int __noinline +cache_zap_rlocked_bucket(struct namecache *ncp, struct componentname *cnp, + uint32_t hash, struct rwlock *blp) { struct mtx *dvlp, *vlp; + struct vnode *dvp; cache_assert_bucket_locked(ncp, RA_RLOCKED); @@ -1034,8 +1089,9 @@ cache_zap_rlocked_bucket(struct namecache *ncp, struct return (0); } + dvp = ncp->nc_dvp; rw_runlock(blp); - return (EAGAIN); + return (cache_zap_unlocked_bucket(ncp, cnp, dvp, dvlp, vlp, hash, blp)); } static int @@ -1197,7 +1253,7 @@ retry: goto out_no_entry; } - error = cache_zap_wlocked_bucket(ncp, blp); + error = cache_zap_wlocked_bucket(ncp, cnp, hash, blp); if (__predict_false(error != 0)) { zap_and_exit_bucket_fail++; cache_maybe_yield(); @@ -1396,7 +1452,7 @@ success: zap_and_exit: if (blp != NULL) - error = cache_zap_rlocked_bucket(ncp, blp); + error = cache_zap_rlocked_bucket(ncp, cnp, hash, blp); else error = cache_zap_locked_vnode(ncp, dvp); if (__predict_false(error != 0)) { @@ -1922,6 +1978,7 @@ nchinit(void *dummy __unused) numfullpathfail2 = counter_u64_alloc(M_WAITOK); numfullpathfail4 = counter_u64_alloc(M_WAITOK); numfullpathfound = counter_u64_alloc(M_WAITOK); + zap_and_exit_bucket_relock_success = counter_u64_alloc(M_WAITOK); } SYSINIT(vfs, SI_SUB_VFS, SI_ORDER_SECOND, nchinit, NULL);