Date: Sat, 14 Oct 2006 16:06:53 +1000 (EST)
From: Bruce Evans <bde@zeta.org.au>
To: fs@freebsd.org
Cc: mohans@freebsd.org
Subject: Re: lost dotdot caching pessimizes nfs especially
Message-ID: <20061014143825.F1264@epsplex.bde.org>
In-Reply-To: <20061006050913.Y5250@epsplex.bde.org>
References: <20061006050913.Y5250@epsplex.bde.org>

On Fri, 6 Oct 2006, Bruce Evans wrote:

> This change:
>
> % Index: vfs_cache.c
> % ===================================================================
> % RCS file: /home/ncvs/src/sys/kern/vfs_cache.c,v
> % retrieving revision 1.102
> % retrieving revision 1.103
> % diff -u -2 -r1.102 -r1.103
> % --- vfs_cache.c 13 Jun 2005 05:59:59 -0000 1.102
> % +++ vfs_cache.c 17 Jun 2005 01:05:13 -0000 1.103
> % ...
>
> is responsible for about half of the performance loss since RELENG_4
> for building kernels over nfs (/usr and sys trees on nfs). The kernel
> build uses "../../" a lot, and the above change apparently results in
> lots of network activity for things that should be cached locally.
>
> Some times for building a RELENG_4 kernel under conditions invariant
> except for the host kernel (after "make clean; sleep 2; make depend;
> make; make clean; sleep 2; make depend" to warm up caches):
>
> kernel:
> RELENG_4              77.51 real 60.62 user 4.36 sys
> current.2004.07.01    ~78.5 (lost details)
> current.2005.01.01    ~79 (lost details)
> current.2005.06.17    82.42 real 62.50 user 4.71 sys
> current.2005.06.19    89.53 real 62.18 user 5.44 sys
> current.2005.06.17+   ~89.5 (lost details)
>     .17+ = .17 plus above change
> current.2005.06.17+*  86.08 real 62.43 user 5.13 sys
>     .17+* = .17+ with ../.. in Makefile avoided using a symlink
>             @ -> <path to sys not using ..>
> RELENG_6              91.14 real 62.04 user 5.71 sys
> current               similar to RELENG_6 (lost details)
>
> The total performance loss is about 18%.
>
> The total performance loss for a local sys tree (/usr still on nfs) is much
> smaller (about 4%):
>
> RELENG_4              65.19 real 60.50 user 3.95 sys
> current.2005.06.17    67.49 real 62.13 user 4.27 sys
> RELENG_6              67.83 real 61.84 user 4.71 sys
> current               similar to RELENG_6 (lost details)
>
> The nfs performance for building of things that should be entirely
> cached locally is very dependent on network latency.
> Not caching things very well causes lots of unnecessary network traffic
> for Getattr and Lookup. The packets are small, so throughput is
> unimportant and latency dominates. For building over nfs without -j,
> the dead time (real - user - sys) is almost directly proportional to
> the latency.
> My usual local network has fairly low latency (~100uS unloaded) and
> the ~14 seconds dead time in the above is for it. Switching to a 1
> Gbps network with lower quality NICs gives an unloaded latency of ~160uS
> and a dead time of ~21 seconds. Building with -j helps even for UP,
> at the cost of extra CPU, by letting some processes advance using cached
> stuff while others are waiting for the network. Building with -j helps
> even more on FreeBSD cluster machines, more because they have a much
> higher network latency than because they are SMP.

I finished finding almost all the lost performance. As indicated above,
it was almost all in nfs. This change:

% Index: nfs_vnops.c
% ===================================================================
% RCS file: /home/ncvs/src/sys/nfsclient/nfs_vnops.c,v
% retrieving revision 1.235
% retrieving revision 1.236
% diff -u -2 -r1.235 -r1.236
% --- nfs_vnops.c 6 Dec 2004 18:52:28 -0000 1.235
% +++ nfs_vnops.c 6 Dec 2004 19:18:00 -0000 1.236
% @@ -418,10 +418,11 @@
%               if (error)
%                       return (error);
% -             np->n_mtime = vattr.va_mtime.tv_sec;
% +             np->n_mtime = vattr.va_mtime;
%       } else {
% +             np->n_attrstamp = 0;
                ^^^^^^^^^^^^^^^^^^^^
%               error = VOP_GETATTR(vp, &vattr, ap->a_cred, ap->a_td);
%               if (error)
%                       return (error);
% -             if (np->n_mtime != vattr.va_mtime.tv_sec) {
% +             if (NFS_TIMESPEC_COMPARE(&np->n_mtime, &vattr.va_mtime)) {
%                       if (vp->v_type == VDIR)
%                               np->n_direofoffset = 0;

and associated changes give silly behaviour that almost doubles the
number of Access RPCs. One of the associated changes clears n_attrstamp
on close().
Then on open(), since lookup() is called before the above is
reached, nfs_access_otw() has always just been called, and the above
forces another call.

Counting RPCs gives a good metric for the pessimizations. Removing the
above clearing in RELENG_6 gives the following improvement:

Before:
        89.90 real 62.16 user 5.50 sys
        Lookup   Read  Write Create Access Fsstat Setattr  Other   Total
         60010   2410   5353    442  43785   1742    5194      6  118942
After:
        86.46 real 62.22 user 5.21 sys
        Lookup   Read  Write Create Access Fsstat Setattr  Other   Total
         59986   2410   5353    442  20935   1742    5194      6   96068

Note the RPC delta-counts barely changed except for the Access one.
About 20000 Access calls were avoided. Just removing the clearing is
not correct but is close.

The pessimization in vfs_cache.c 1.103 is now easy to quantify. It
triples the number of Lookup RPCs. Removing it in addition to the
above gives a much larger improvement:

        79.24 real 61.87 user 5.04 sys
        Lookup   Read  Write Create Access Fsstat Setattr  Other   Total
         19548   2410   5353    442  20922   1742    5194      6   55617

Note the RPC delta-counts barely changed except for the Lookup one.
About 40000 Lookup calls were avoided. Just removing the change in
vfs_cache.c 1.103 is not close to being correct.

The last major pessimization is another silly one. The changes to
mark atimes on exec() and mmap() cause a silly null Setattr RPC for
every exec() (more for interpreters?) and every mmap(). This is easy
to fix (almost) correctly. VOP_SETATTR() is assumed to do nothing for
requests that it doesn't understand, but nfs_setattr() does null RPCs
instead. The following fix:

% diff -c2 ./nfsclient/nfs_vnops.c~ ./nfsclient/nfs_vnops.c
% *** ./nfsclient/nfs_vnops.c~ Sun Oct 8 23:08:57 2006
% --- ./nfsclient/nfs_vnops.c Fri Oct 13 09:58:12 2006
% ***************
% *** 669,675 ****
%
%       /*
% !      * Setting of flags is not supported.
%        */
% !     if (vap->va_flags != VNOVAL)
%               return (EOPNOTSUPP);
%
% --- 677,684 ----
%
%       /*
% !      * Setting of flags and marking of atimes are not supported.
%        */
% !     if (vap->va_flags != VNOVAL ||
% !         ((bdefix & 4) && (vap->va_vaflags & VA_MARK_ATIME)))
%               return (EOPNOTSUPP);
%

in addition to the removals gives the following improvement with bdefix
set to 7:

        78.14 real 62.03 user 4.79 sys
        Lookup   Read  Write Create Access Fsstat  Other   Total
         19556   2410   5353    442  19581   1738     14   49094

Bruce