Date: Thu, 30 Jul 2015 16:06:39 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Warner Losh <imp@bsdimp.com>
Cc: Hans Petter Selasky <hps@selasky.org>,
    Roger Pau Monné <royger@freebsd.org>,
    Adrian Chadd <adrian.chadd@gmail.com>,
    Warner Losh <imp@freebsd.org>,
    "src-committers@freebsd.org" <src-committers@freebsd.org>,
    "svn-src-all@freebsd.org" <svn-src-all@freebsd.org>,
    "svn-src-head@freebsd.org" <svn-src-head@freebsd.org>,
    Shani Michaeli <shanim@mellanox.com>
Subject: Re: svn commit: r285068 - in head/sys: conf modules/agp
    modules/geom/geom_part/geom_part_apm modules/geom/geom_part/geom_part_bsd
    modules/geom/geom_part/geom_part_bsd64 modules/geom/geom_part/geom_part...
Message-ID: <20150730135259.C2050@besplex.bde.org>
In-Reply-To: <2779441E-E53E-4DB4-84CF-36A0CFCB4C08@bsdimp.com>
References: <201507030150.t631oRd0039490@repo.freebsd.org>
    <5596C188.10404@FreeBSD.org>
    <CAJ-VmomcHt_RiDRDC3s8_sjQXyOfh5yrNQjOxOEp-re1ceb2yQ@mail.gmail.com>
    <5596C7E7.5090700@FreeBSD.org>
    <68C8F69B-56DF-45C3-8DBB-40514CA48D85@bsdimp.com>
    <55B8A8CA.90701@selasky.org>
    <3414D44A-A22F-4693-9F84-A8E880C0B185@bsdimp.com>
    <55B8F547.2010008@selasky.org>
    <2779441E-E53E-4DB4-84CF-36A0CFCB4C08@bsdimp.com>
On Wed, 29 Jul 2015, Warner Losh wrote:

>> On Jul 29, 2015, at 9:46 AM, Hans Petter Selasky <hps@selasky.org> wrote:
>>
>> In this particular case one "find of /sys" takes 11-16 seconds over
>> NFS, so building a single KMOD takes 16 seconds too. It's not possible
>> to eliminate the find entirely during repeated builds?
>
> 16 seconds? That's a really slow NFS server and at least 11 seconds
> longer than it should take :(.

11 seconds? Building a full (FreeBSD-4) kernel takes 4.5 seconds here
with a tuned nfs, starting with a warm cache after "make depend".
makeworld takes 130 seconds over nfs, with nfs costing about 15 seconds
of that, starting with a cold cache on the client and a warm cache on
the server, and doing sufficient "make depend" steps (most are optimized
out in my version to save 10% with make -j1 and much more with make -j8;
"make depend" and "make install" are not parallelized, so they are the
slowest parts of makeworld). It is mysterious that makeworld starting
with a warm cache on the client is slightly slower.

This took some fighting with network latency and nfs's lack of caching
of attributes (it caches everything, but checks attributes on every
first-open). For find, I think this results in an RPC or two for opening
every directory in the traversal, but the caching works for regular files.

"find ." on /usr/src takes:

     0.27 real  0.03 user  0.24 sys   on the server
     3.27 real  0.05 user  0.40 sys   on the client (3x faster than the
                                      du -a below)

    Lookup  Access  Fsstat  Getattr  Other   Total   (client nfsstat delta:)
      1759     408    6987    14374      1   23529

while cached du -a takes:

     0.58 real  0.09 user  0.48 sys   on the server
    10.30 real  0.09 user  0.78 sys   on the client

    Lookup  Access  Fsstat  Getattr  Other   Total   (client nfsstat delta:)
     49655    6987    6986    13979      6   77613

/usr/src has 6986 directories and 49643 files. There seem to be 1 Fsstat
and 2 Getattrs per directory, 1 Access per directory for du -a, and
1 Lookup per file for du -a. The caching helped more for Lookup and
Access for "find .".

After mounting with -nocto:

find .:

     1.20 real  0.04 user  0.32 sys

    Fsstat  Other   Total
      6987     16    7003

du -a:

     6.83 real  0.04 user  0.69 sys

    Lookup  Access  Fsstat  Other   Total
     42666     237    6986      3   49892

There are still too many Fsstats. This reminds me of an old pessimization
in opendir(). It calls fstatfs() for almost every directory to support
unionfs, although unionfs never worked correctly and is almost never
used. At least, nfs3 never cached Fsstat in any FreeBSD implementation,
so "find ." has to do lots of Fsstat RPCs. The caching works perfectly
to avoid almost all other RPCs.

There are still too many Lookups. The Lookup count for du -a now seems
to be the number of files less the number of directories. Apparently,
lookups are only cached right for directories. After waiting a bit for
cache timeouts to expire, du -a does 49000+ Lookups, to look up the
directories too. The files get cached if you read them. This gives the
silly behaviour that tar cvvf runs much faster than du -a with a warm
cache, since caching actually works for it (except for Fsstat):

tar cvvf /dev/zero (nocto):

     2.41 real  0.17 user  1.40 sys

    Fsstat  Other   Total
      6986     10    6996

I had to fudge this test a little to avoid cache timeouts. I ran it a
few times and picked the fastest ones. The default cache timeouts are
3-30 seconds for files and 30-60 seconds for directories (depending on
the age of the file). tar with a cold cache takes longer than that, so
it takes a few runs to get everything cached. Caching everything is
only possible since the caching works for reads.
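(The deltas above are just differences of successive nfsstat outputs.
A rough sketch of the method, assuming a FreeBSD client; the paths are
placeholders:

    # snapshot the client-side RPC counters, run the test, snapshot
    # again; the changed lines in the diff give the per-run delta
    nfsstat -c > /tmp/nfsstat.before
    time find . > /dev/null
    nfsstat -c > /tmp/nfsstat.after
    diff /tmp/nfsstat.before /tmp/nfsstat.after

Some versions of nfsstat(1) can also zero the counters with -z, which
avoids the manual subtraction.)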
So the first run of tar cvvf caches all the data, so that subsequent
runs have a chance of completing before the attribute cache times out.
Without nocto, du -a has a high variance, since its runtime is a
significant fraction of the cache timeout.

I don't use nocto in production, since it reduces robustness, and I
want to make nfs caching work better without it. Caching can also be
improved, and robustness reduced, by increasing the cache timeouts.
The defaults and current settings of these timeouts are hard to
determine, since their documentation is incorrect and mount(8) still
doesn't support retrieving any mount options that are not in the old
flags.

The default for acdirmin is still documented as 30. This was too large,
so it was changed to 3 in the code; the man page hasn't caught up with
the change. 3 seems too low. The default for acdirmax is still 60. The
defaults for nametimeo and negnametimeo are documented only as manifest
constants, so they are always correct in the man page because they are
invisible there. NFS_DEFAULT_NEGNAMETIMEO is too long; it has the
value 60.

For the ac* timeouts, there is a hint that the actual timeout (between
the min and the max) depends on the age of the file. For negnametimeo,
there is no hint that the actual timeout is often determined by changes
in the directory. negnametimeo seems to be just a defense against bugs
in the attribute caching.

I don't know of any similar timeout for data. Such a timeout would
flush the data cache quite often when nothing has changed, just in case
we don't detect an actual change due to buggy timestamps or buggy
caching of timestamps.
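(The timeouts can at least be pinned to known values by setting them
explicitly at mount time instead of trusting the documented defaults.
A sketch only; the server name and the values are for illustration, not
a recommended configuration:

    # pin the attribute-, name- and negative-name-cache timeouts (all
    # in seconds) so the effective values are known, instead of relying
    # on the possibly-stale documented defaults
    mount -t nfs -o nfsv3,nocto,acregmin=3,acregmax=60,acdirmin=3,acdirmax=60,nametimeo=60,negnametimeo=60 \
        server:/usr/src /mnt/src

All of these options are accepted by mount_nfs(8); negnametimeo=0 turns
negative name caching off entirely.)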
Bruce