From: Bruce Evans
Date: Wed, 18 Oct 2006 16:40:43 +1000 (EST)
To: Chuck Lever
Cc: mjacob@FreeBSD.org, fs@FreeBSD.org, Kris Kennaway
Subject: negative cache hits for nfs (was: cvs commit: ...)
Message-ID: <20061018153336.E72684@delplex.bde.org>
In-Reply-To: <20061017113943.C67620@delplex.bde.org>

[I changed the Cc from cvs* to fs]

On Tue, 17 Oct 2006, Bruce Evans wrote:

> On Mon, 16 Oct 2006, Chuck Lever wrote:
>
>> On 10/16/06, Bruce Evans wrote:
>>> On Sun, 15 Oct 2006, Chuck Lever wrote:
>>>> [An independent timeout for the access cache isn't useful.]
>>>
>>> I'll try removing the special support for the access cache timeout in
>>> rc.conf first.
>>
>> OK. I can review patches if you think that would help, but I can't
>> contribute code at the moment because of IP issues at my current
>> employer. Hopefully that will change soon.
>
> Thanks. Removing it in rc.conf won't require review :-).
>> ...
>> Another thing to consider is that a LOOKUP is usually more expensive
>> for servers than a GETATTR. If your client has already cached lookup
>> results for the file to be opened, you can get away with a GETATTR on
>> the parent directory to verify that it has not changed, and that will
>> almost always be faster than doing a full LOOKUP.
>
> FreeBSD's client is doing not very good things for Lookup too. It is
> missing caching of negative lookups. make(1) likes to do a lot of
> negative lookups...

NetBSD fixed this in 1997, sigh. Here is a merge of some bits from NetBSD
for review. It is mostly the 1997 version, with updates to use timespecs
instead of time_t's. It leaves out NetBSD changes that don't seem to be
related to correctness, and any that are less than 18 months old (if any).

% Index: nfs_vnops.c
% ===================================================================
% RCS file: /home/ncvs/src/sys/nfsclient/nfs_vnops.c,v
% retrieving revision 1.270
% diff -u -2 -r1.270 nfs_vnops.c
% --- nfs_vnops.c	14 Oct 2006 07:25:11 -0000	1.270
% +++ nfs_vnops.c	18 Oct 2006 01:41:14 -0000
% @@ -852,7 +869,17 @@
%  		return (error);
%  	}
% -	if ((error = cache_lookup(dvp, vpp, cnp)) && error != ENOENT) {
% +	if ((error = cache_lookup(dvp, vpp, cnp)) != 0) {
%  		struct vattr vattr;
%  
% +		if (error == ENOENT) {
% +			/* Negative cache hit. Use it unless stale. */
% +			if (VOP_GETATTR(dvp, &vattr, cnp->cn_cred, td) == 0 &&
% +			    timespeccmp(&vattr.va_mtime, &np->n_nctime, ==))
% +				return (ENOENT);
% +
% +			cache_purge(dvp);
% +			timespecclear(&np->n_nctime);
% +			goto dorpc;
% +		}
%  		newvp = *vpp;
%  		if (!VOP_GETATTR(newvp, &vattr, cnp->cn_cred, td)
% @@ -871,4 +898,5 @@
%  		*vpp = NULLVP;
%  	}
% +dorpc:
%  	error = 0;
%  	newvp = NULLVP;
% @@ -951,4 +979,11 @@
%  nfsmout:
%  	if (error) {
% +		if (error == ENOENT && (cnp->cn_flags & MAKEENTRY) &&
% +		    cnp->cn_nameiop != CREATE) {
% +			/* Negative cache entry. */
% +			if (!timespecisset(&np->n_nctime))
% +				np->n_nctime = np->n_vattr.va_mtime;
% +			cache_enter(dvp, NULL, cnp);
% +		}
%  		if (newvp != NULLVP) {
%  			vput(newvp);
% @@ -1931,6 +1966,9 @@
%  		if (newvp)
%  			vput(newvp);
% -	} else
% +	} else {
% +		if (cnp->cn_flags & MAKEENTRY)
% +			cache_enter(dvp, newvp, cnp);
%  		*ap->a_vpp = newvp;
% +	}
%  	return (error);
%  }
% Index: nfsnode.h
% ===================================================================
% RCS file: /home/ncvs/src/sys/nfsclient/nfsnode.h,v
% retrieving revision 1.59
% diff -u -2 -r1.59 nfsnode.h
% --- nfsnode.h	13 Sep 2006 18:39:09 -0000	1.59
% +++ nfsnode.h	18 Oct 2006 00:48:44 -0000
% @@ -100,4 +100,5 @@
%  	struct timespec	n_mtime;	/* Prev modify time. */
%  	time_t		n_ctime;	/* Prev create time. */
% +	struct timespec	n_nctime;	/* Last neg cache entry (dir) */
%  	time_t		n_expiry;	/* Lease expiry time */
%  	nfsfh_t		*n_fhp;		/* NFS File Handle */

For building kernels, this gives a larger speedup than everything that I
tried short of completely dropping cto consistency, provided dotdot caching
in vfs_cache.c isn't lost. The following benchmarks are also with zapping
of the attribute cache turned off in nfs_close() to avoid doubled Getattr's
in open() without breaking cto consistency. Times and nfsstats are for the
second run of "make depend; sync" and "make; sync" after "make clean
cleandepend; sync; sleep 1" in each run, with RELENG_4 kernel sources,
~RELENG_5 userland and -current+ kernel, sources and obj and /usr on nfs,
unloaded network latency 100uS, ...

Before:
        12.75 real         5.19 user         1.58 sys
   Lookup     Read    Write   Access  Getattr    Other    Total
    14203      548      599    21561      454       97    37462
        78.80 real        62.01 user         4.45 sys
   Lookup     Read    Write   Create   Access   Fsstat    Other    Total
    19543     2410     5353      442    24241     1742       14    53745

After:
        10.68 real         5.20 user         1.42 sys
   Lookup     Read    Write   Access  Getattr    Other    Total
     1268      548      599    21575      454      112    24556
        76.38 real        62.00 user         4.28 sys
   Lookup     Read    Write   Create   Access   Fsstat    Other    Total
     4122     2410     5353      442    24222     1750       14    38313

The number of Lookups has been reduced by a factor of 11+ for make -n and
5- for make.

With lost dotdot caching, After:
        11.02 real         5.25 user         1.40 sys
   Lookup     Read    Write   Access  Getattr    Other    Total
     3031      548      599    21574      453      112    26317
        84.19 real        62.20 user         4.71 sys
   Lookup     Read    Write   Create   Access   Fsstat    Other    Total
    45063     2410     5353      442    24290     1750       14    79322

This does another 40000+ Lookups, much the same as Before, but ending up
with 45000+ instead of 59000+.

Of course things are slower with cold caches, but they aren't much slower
(less than what losing dotdot caching costs).

According to vfs.cache.numcache, there are fewer than 2000 files to look
up, so even 4122 Lookups is a lot.

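The Lookup reduction factors quoted above can be rechecked from the nfsstat counts. The following is only a small arithmetic sketch; the function names `rpc_reduction()` and `print_lookup_reductions()` are made up for illustration, and the counts are copied from the Before and After tables (first and second timed command in each).

```c
#include <stdio.h>

/* Ratio of an RPC count before a change to the count after it. */
double
rpc_reduction(unsigned long before, unsigned long after)
{
	return ((double)before / (double)after);
}

/* Print the Lookup reduction for the two timed commands. */
void
print_lookup_reductions(void)
{
	/* Lookup columns of the Before and After tables. */
	printf("first command:  %.1fx\n", rpc_reduction(14203, 1268));
	printf("second command: %.1fx\n", rpc_reduction(19543, 4122));
}
```

Calling `print_lookup_reductions()` gives about 11.2 and 4.7, which matches the "11+" and "5-" figures.
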
(vfs.cache.numcache was 1482 and vfs.cache.numneg was 103 after the run
that produced the above statistics. This includes a few other files looked
up since rebooting a few minutes earlier. I wasn't careful about starting
from scratch without dotdot caching).

With cto consistency completely turned off, without lost dotdot caching,
After:
         7.54 real         5.17 user         1.13 sys
   Lookup     Read    Write   Access  Getattr    Other    Total
     1284      548      599     2161      454      112     5158
        72.67 real        61.95 user         3.82 sys
   Lookup     Read    Write   Create   Access   Fsstat    Other    Total
     4799     2410     5353      442     1529     1750       14    16297

This reduces the Access count by a factor of almost 18. Now another
pessimization is more obvious -- why should building kernels be doing all
those Fsstat's? I haven't located them for sure, but opendir always calls
statfs(2) for the silly purpose of determining whether it needs to do extra
work to support unionfs.

For comparison:

From now on, kernel source and obj are on a local file system. nfs is
still on /usr, so there are a few RPCs for execing things.

Before (without lost dotdot caching, with cto consistency):
         6.39 real         5.04 user         1.08 sys
   Lookup   Access    Other    Total
      865      651        3     1519
        66.86 real        61.42 user         4.45 sys
   Lookup   Access    Other    Total
     3115     1350        1     4466

Before (without lost dotdot caching, and _without_ cto consistency):
         6.17 real         5.25 user         0.88 sys
    Other    Total
       86       86
        66.61 real        61.54 user         4.74 sys
    Other    Total
       94       94

The differences that the nfs changes make with just /usr on nfs are
measurable but tiny, at least in my configuration. All my executables are
statically linked, else the extra opens for cto consistency of the shared
libraries would give many more than 1350 Access RPCs.

Bruce
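
For readers following the nfs_lookup() hunks of the patch, the staleness rule for negative entries can be modelled in user space roughly as below. This is a minimal sketch, not the kernel code: `struct dir_state`, `neg_entry_added()` and `neg_hit_valid()` are hypothetical names, the helpers stand in for the kernel's timespec macros, and the directory mtime is assumed to have been fetched successfully already (the patch also falls through to the RPC when the Getattr fails).

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for the per-directory field added to struct nfsnode. */
struct dir_state {
	struct timespec nctime;	/* dir mtime at first negative entry, or 0 */
};

/* True if both timestamps are identical. */
bool
ts_equal(const struct timespec *a, const struct timespec *b)
{
	return (a->tv_sec == b->tv_sec && a->tv_nsec == b->tv_nsec);
}

/* True if the timestamp is nonzero. */
bool
ts_isset(const struct timespec *ts)
{
	return (ts->tv_sec != 0 || ts->tv_nsec != 0);
}

/*
 * A Lookup RPC returned ENOENT and the miss is being cached.
 * Remember the directory mtime only for the first negative entry.
 */
void
neg_entry_added(struct dir_state *d, const struct timespec *dir_mtime)
{
	if (!ts_isset(&d->nctime))
		d->nctime = *dir_mtime;
}

/*
 * The name cache reported a negative hit.  Return true if ENOENT can be
 * returned without an RPC.  Otherwise forget the saved time and return
 * false; the caller would then purge the directory's cached names and
 * do the Lookup RPC.
 */
bool
neg_hit_valid(struct dir_state *d, const struct timespec *dir_mtime)
{
	if (ts_equal(dir_mtime, &d->nctime))
		return (true);
	d->nctime.tv_sec = 0;
	d->nctime.tv_nsec = 0;
	return (false);
}
```

The point of keeping a single timestamp per directory is that any change to the directory's mtime invalidates all of its negative entries at once, at the cost of one Getattr on the directory per negative hit.
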