Date: Thu, 30 Jul 2015 16:06:39 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Warner Losh <imp@bsdimp.com>
Cc: Hans Petter Selasky <hps@selasky.org>,
    Roger Pau Monné <royger@freebsd.org>,
    Adrian Chadd <adrian.chadd@gmail.com>,
    Warner Losh <imp@freebsd.org>,
    "src-committers@freebsd.org" <src-committers@freebsd.org>,
    "svn-src-all@freebsd.org" <svn-src-all@freebsd.org>,
    "svn-src-head@freebsd.org" <svn-src-head@freebsd.org>,
    Shani Michaeli <shanim@mellanox.com>
Subject: Re: svn commit: r285068 - in head/sys: conf modules/agp
    modules/geom/geom_part/geom_part_apm modules/geom/geom_part/geom_part_bsd
    modules/geom/geom_part/geom_part_bsd64 modules/geom/geom_part/geom_part...
Message-ID: <20150730135259.C2050@besplex.bde.org>
In-Reply-To: <2779441E-E53E-4DB4-84CF-36A0CFCB4C08@bsdimp.com>
References: <201507030150.t631oRd0039490@repo.freebsd.org>
    <5596C188.10404@FreeBSD.org>
    <CAJ-VmomcHt_RiDRDC3s8_sjQXyOfh5yrNQjOxOEp-re1ceb2yQ@mail.gmail.com>
    <5596C7E7.5090700@FreeBSD.org>
    <68C8F69B-56DF-45C3-8DBB-40514CA48D85@bsdimp.com>
    <55B8A8CA.90701@selasky.org>
    <3414D44A-A22F-4693-9F84-A8E880C0B185@bsdimp.com>
    <55B8F547.2010008@selasky.org>
    <2779441E-E53E-4DB4-84CF-36A0CFCB4C08@bsdimp.com>
On Wed, 29 Jul 2015, Warner Losh wrote:

>> On Jul 29, 2015, at 9:46 AM, Hans Petter Selasky <hps@selasky.org> wrote:
>>
>> In this particular case one "find of /sys" takes 11-16 seconds over
>> NFS, so building a single KMOD takes 16 seconds too. It's not possible
>> to eliminate the find entirely during repeated builds?
>
> 16 seconds? That's a really slow NFS server and at least 11 seconds
> longer than it should take :(.

11 seconds? Building a full (FreeBSD-4) kernel takes 4.5 seconds here
with a tuned nfs, starting with a warm cache after "make depend".
makeworld takes 130 seconds over nfs, with nfs costing about 15 seconds
of that, starting with a cold cache on the client and a warm cache on
the server, and doing sufficient "make depend" steps (most are optimized
out in my version to save 10% with make -j1 and much more with make -j8;
"make depend" and "make install" are not parallelized, so they are the
slowest parts of makeworld). It is mysterious that makeworld starting
with a warm cache on the client is slightly slower.

This took some fighting with network latency and nfs's lack of caching
of attributes (it caches everything, but checks attributes on every
first-open). For find, I think this results in an RPC or two for opening
every directory in the traversal, but the caching works for regular files.

"find ." on /usr/src takes:

     0.27 real  0.03 user  0.24 sys   on the server
     3.27 real  0.05 user  0.40 sys   on the client (3x faster than the
                                      du -a below)

    Lookup  Access  Fsstat  Getattr  Other   Total   (client nfsstat delta:)
      1759     408    6987    14374      1   23529

while cached du -a takes:

     0.58 real  0.09 user  0.48 sys   on the server
    10.30 real  0.09 user  0.78 sys   on the client

    Lookup  Access  Fsstat  Getattr  Other   Total   (client nfsstat delta:)
     49655    6987    6986    13979      6   77613

/usr/src has 6986 directories and 49643 files. There seem to be 1 Fsstat
and 2 Getattrs per directory, 1 Access per directory for du -a, and
1 Lookup per file for du -a. The caching helped more for Lookup and
Access for "find .".

After mounting with -nocto:

find .:

     1.20 real  0.04 user  0.32 sys

    Fsstat  Other   Total
      6987     16    7003

du -a:

     6.83 real  0.04 user  0.69 sys

    Lookup  Access  Fsstat  Other   Total
     42666     237    6986      3   49892

There are still too many Fsstats. This reminds me of an old pessimization
in opendir(). It calls fstatfs() for almost every directory to support
unionfs, although unionfs never worked correctly and is almost never
used. At least, nfs3 never cached Fsstat in any FreeBSD implementation,
so "find ." has to do lots of Fsstat RPCs. The caching works perfectly
to avoid almost all other RPCs.

There are still too many Lookups. The Lookup count for du -a now seems
to be the number of files less the number of directories. Apparently,
lookups are only cached right for directories. After waiting a bit for
cache timeouts to expire, du -a does 49000+ Lookups, to look up the
directories too. The files get cached if you read them. This gives the
silly behaviour that tar cvvf runs much faster than du -a with a warm
cache, since caching actually works for it (except for Fsstat):

tar cvvf /dev/zero (nocto):

     2.41 real  0.17 user  1.40 sys

    Fsstat  Other   Total
      6986     10    6996

I had to fudge this test a little to avoid cache timeouts. I ran it a
few times and picked the fastest ones. The default cache timeouts are
3-30 seconds for files and 30-60 seconds for directories (depending on
the age of the file). tar with a cold cache takes longer than that, so
it takes a few runs to get everything cached. Caching everything is
only possible since the caching works for reads.
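(The deltas above are just differences of successive nfsstat outputs.
A rough sketch of the method, assuming a FreeBSD client; the paths are
placeholders:

    # snapshot the client-side RPC counters, run the test, snapshot
    # again; the changed lines in the diff give the per-run delta
    nfsstat -c > /tmp/nfsstat.before
    time find . > /dev/null
    nfsstat -c > /tmp/nfsstat.after
    diff /tmp/nfsstat.before /tmp/nfsstat.after

Some versions of nfsstat(1) can also zero the counters with -z, which
avoids the manual subtraction.)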
So the first run of tar cvvf caches all the data, so that subsequent
runs have a chance of completing before the attribute cache times out.
Without nocto, du -a has a high variance, since its runtime is a
significant fraction of the cache timeout.

I don't use nocto in production, since it reduces robustness, and I
want to make nfs caching work better without it. Caching can also be
improved, and robustness reduced, by increasing the cache timeouts.
The defaults and current settings of these timeouts are hard to
determine, since their documentation is incorrect and mount(8) still
doesn't support retrieving any mount options that are not in the old
flags.

The default for acdirmin is still documented as 30. This was too large,
so it was changed to 3 in the code; the man page hasn't caught up with
the change. 3 seems too low. The default for acdirmax is still 60. The
defaults for nametimeo and negnametimeo are documented only as manifest
constants, so they are always correct in the man page because they are
invisible there. NFS_DEFAULT_NEGNAMETIMEO is too long; it has the
value 60.

For the ac* timeouts, there is a hint that the actual timeout (between
the min and the max) depends on the age of the file. For negnametimeo,
there is no hint that the actual timeout is often determined by changes
in the directory. negnametimeo seems to be just a defense against bugs
in the attribute caching.

I don't know of any similar timeout for data. Such a timeout would
flush the data cache quite often when nothing has changed, just in case
we don't detect an actual change due to buggy timestamps or buggy
caching of timestamps.
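(The timeouts can at least be pinned to known values by setting them
explicitly at mount time instead of trusting the documented defaults.
A sketch only; the server name and the values are for illustration, not
a recommended configuration:

    # pin the attribute-, name- and negative-name-cache timeouts (all
    # in seconds) so the effective values are known, instead of relying
    # on the possibly-stale documented defaults
    mount -t nfs -o nfsv3,nocto,acregmin=3,acregmax=60,acdirmin=3,acdirmax=60,nametimeo=60,negnametimeo=60 \
        server:/usr/src /mnt/src

All of these options are accepted by mount_nfs(8); negnametimeo=0 turns
negative name caching off entirely.)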
Bruce