From owner-freebsd-hackers Thu Jul 4 19:32:25 2002
Date: Thu, 04 Jul 2002 19:30:52 -0700
From: Terry Lambert <tlambert2@mindspring.com>
To: Richard Sharpe
Cc: freebsd-hackers@freebsd.org
Subject: Re: Adding readdir entries to the name cache ...

Richard Sharpe wrote:
> I am interested in hearing opinions on whether or not it is useful to
> preload entries into the name cache that are obtained with
> readdir/getdirentries/getdents.

It depends.

In a past life at another company, we were able to get a 30%+
performance improvement simply by doing this type of caching.

The system where the caching was being done, though, faulted the
inodes in asynchronously.  Basically, this meant that, for something
like Samba, which equates every directory operation, including open,
with a corresponding "stat", the object to be stat'ed didn't have the
added latency that would have resulted from the inode fetch, had that
fetch been done serially.

We got another 15% out of it by returning the stat information with
the directory entries to user space, all as one operation, with a
getdirentries interface that returned full stat information (a sketch
of such an interface appears further below).

Then we got another 15% by doing the globbing in the kernel, instead
of transferring all the data to user space and globbing there.  This
was for a wire protocol that asked first for directories, then for
files, which frequently used globbing that excluded at least 20% of
the directory contents, and which tended to do the directory/file
searches serially *every time*.

> We recently had a problem of the kernel hanging because we were
> loading entries into the name cache that were being picked up using
> readdir/getdirentries/getdents. These entries were inserted using
> cache_enter, which seems not to check that the entry is already in
> the cache; it simply inserts it. (Normally, cache_enter is called
> because a VOP routine was told to MAKEENTRY.)

It seems to me that you could check for equality on insertion if there
was a hash collision.  This is what the SVR4 DNLC code does.

Note that you can get another 8-12% by making negative cache entries,
since DOS/Windows clients tend to search the full path each time, even
though they have already received a successful response in the past
(i.e. there is no client caching for this information, because there
is no distributed coherency protocol that would permit server
invalidation of the cache).  Unmodified, the SVR4 DNLC cannot support
negative cache entries (two line changes are needed).

Note that negative cache entries are twice as valuable as positive
entries: on average, a linear search for an object which is present
needs to iterate only 50% of the search space, whereas a search for an
object which is not present must iterate 100% of it.
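To make the idea concrete, here is a minimal userland sketch of a
DNLC-style cache that supports negative entries; the convention that a
NULL vnode pointer means "known absent" is the whole trick.  This is
illustrative code only, not the SVR4 or FreeBSD source, and every name
in it is made up for the example:

#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct vnode;                           /* opaque for the sketch */

struct ncache {
        struct ncache   *nc_next;       /* hash chain link */
        struct vnode    *nc_dvp;        /* directory vnode */
        struct vnode    *nc_vp;         /* NULL == negative entry */
        char             nc_name[256];
};

enum nc_result { NC_MISS, NC_HIT, NC_NEGHIT };

#define NC_NHASH 256
static struct ncache *nc_hashtbl[NC_NHASH];

static unsigned
nc_hash(struct vnode *dvp, const char *name)
{
        unsigned h = (unsigned)(uintptr_t)dvp;

        while (*name != '\0')
                h = h * 33 + (unsigned char)*name++;
        return (h % NC_NHASH);
}

enum nc_result
nc_lookup(struct vnode *dvp, const char *name, struct vnode **vpp)
{
        struct ncache *ncp;

        for (ncp = nc_hashtbl[nc_hash(dvp, name)]; ncp != NULL;
            ncp = ncp->nc_next) {
                if (ncp->nc_dvp == dvp &&
                    strcmp(ncp->nc_name, name) == 0) {
                        if (ncp->nc_vp == NULL)
                                return (NC_NEGHIT);  /* known absent */
                        *vpp = ncp->nc_vp;
                        return (NC_HIT);
                }
        }
        return (NC_MISS);       /* unknown; caller does the real lookup */
}

A NC_NEGHIT result lets the caller return ENOENT immediately, without
ever touching the directory.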
> It seemed to me that one way to fix that would be to check whether an
> entry was in the name cache before trying to load it; however, I
> wonder at the wisdom of trying to do so, and feel that the current
> situation might be more than satisfactory.

No.  It's advisable.  You must handle the hash insertion collision in
any case; the extra overhead is in the traversal and compare of the
collision chain.  For a correct selection of hash bucket size and hash
algorithm, the only time you will be comparing, statistically
speaking, will be when the entry already exists.  This basically means
that the check should cost you a single compare in addition to the
hash chain pointer, which you have to look at anyway to do the
insertion, in case the bucket is non-empty.  (A sketch of such an
insertion path follows below.)

> The most obvious reason for loading readdir entries is to handle
> ls -l. Since ls has to read the directory entries, which it does with
> getdirentries, and then has to stat the entries, which would cause it
> to do a lookup on the name and then retrieve the vnode/inode, it
> would seem to be a win if we preload the entries in the name cache.

Yes.  This is a win.  A bigger win is a dirent structure that contains
stat as well as name information: you end up with 50% fewer
user/kernel boundary crossings (also sketched below).

> However, ls seems to call lstat in the same order that the files are
> in the directory, and a normal clock approach to directories would
> yield exactly the same result. Further, in the cases where the user
> did not want a -l, we would avoid adding many potentially useless
> names to the name cache and reducing its performance.

This is because the sort occurs first.  An unsorted "ls" (which is
available -- see the man page) doesn't have this issue.  It would also
be useful for "find" and "mtree", FWIW.

The last time I looked into pre-faulting the vnodes on directory
searches, though, it was very difficult in FreeBSD, because FreeBSD
could not handle inodes as existing in faulted pages, and instead
required direct I/O: you paid the serial latency no matter when you
did it, because the process blocked.

> [1] Samba, because it has to support the Windows case-insensitive
> file system, must do some pretty ugly things :-) When a client asks
> that a file be opened, for example, Samba tries with exactly the case
> that was presented. If that fails, it must do a readdir scan of the
> directory so it can do a case-insensitive match. So, even negative
> caching does not buy us much in the case of Samba. What would help is
> a case-insensitive filesystem.

It is useful to be able to do "case sensitive on storage, case
insensitive on lookup" on a per-process basis.  The easiest way is to
wire this in as a flag on the proc itself.  The normal way this is
done is a flag to sfork, but... it should also be possible to have the
proc open itself in procfs, and then ioctl() down a flag setting for
this.
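As a rough illustration of "case sensitive on storage, case
insensitive on lookup": the compare used during directory lookup is
the only thing that changes, while the stored name keeps its original
case.  The P_CASEFOLD flag and proc_flags variable here are invented
stand-ins for a real per-process flag:

#include <string.h>
#include <strings.h>            /* strcasecmp */

#define P_CASEFOLD      0x1
static int proc_flags;          /* stand-in for a flag on struct proc */

/*
 * Compare a stored directory name against the name being looked up,
 * folding case only when the (hypothetical) per-process flag is set.
 */
static int
name_match(const char *dir_name, const char *lookup_name)
{
        if (proc_flags & P_CASEFOLD)
                return (strcasecmp(dir_name, lookup_name) == 0);
        return (strcmp(dir_name, lookup_name) == 0);
}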
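Here is the insertion path promised above, continuing the illustrative
cache from the earlier sketch: the equality check rides along on the
chain walk that insertion has to do anyway, so a duplicate costs one
string compare per chain entry, and with a well-chosen hash the chain
is nearly always empty or holds only the duplicate itself:

#include <stdlib.h>
#include <string.h>

void
nc_enter(struct vnode *dvp, struct vnode *vp, const char *name)
{
        unsigned bucket = nc_hash(dvp, name);
        struct ncache *ncp;

        /* Walk the collision chain; refresh instead of duplicating. */
        for (ncp = nc_hashtbl[bucket]; ncp != NULL; ncp = ncp->nc_next) {
                if (ncp->nc_dvp == dvp &&
                    strcmp(ncp->nc_name, name) == 0) {
                        ncp->nc_vp = vp;
                        return;
                }
        }

        ncp = malloc(sizeof(*ncp));
        if (ncp == NULL)
                return;                 /* the cache is advisory; skip */
        ncp->nc_dvp = dvp;
        ncp->nc_vp = vp;                /* NULL makes a negative entry */
        strlcpy(ncp->nc_name, name, sizeof(ncp->nc_name));
        ncp->nc_next = nc_hashtbl[bucket];
        nc_hashtbl[bucket] = ncp;
}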
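And here is one way the "dirent that carries stat" interface might
look, in the spirit of NFSv3's READDIRPLUS; a pattern argument covers
the in-kernel globbing point as well, so excluded names never cross
the user/kernel boundary.  Neither the structure nor the syscall
exists in FreeBSD; both names are invented for the sketch:

#include <sys/types.h>
#include <sys/stat.h>
#include <stdint.h>

struct dirent_stat {
        ino_t           ds_fileno;
        uint16_t        ds_reclen;      /* length of this whole record */
        uint8_t         ds_type;
        uint8_t         ds_namlen;
        struct stat     ds_stat;        /* filled in by the kernel */
        char            ds_name[];      /* NUL-terminated, variable */
};

/*
 * Imagined syscall: one boundary crossing returns names and stat
 * information together.  If pattern is non-NULL, the kernel applies
 * fnmatch(3)-style matching before copying entries out.
 */
ssize_t getdirentries_stat(int fd, const char *pattern, void *buf,
    size_t nbytes, off_t *basep);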
Be aware that if you *do* end up prefaulting everything, it is very
application-dependent whether you will be gaining or merely thrashing
the cache.  For DOS/Windows/Mac clients of FS's exported by a host OS,
where the exported FS has directories in the execution path of the
client, it's very valuable.

But you need to be able to turn it off (I would say "default it to
off") for uses of FS's that have poorer locality.  Probably, you would
want to make it a sysctl: web servers tend to have very good locality,
just like file servers.  News servers, etc., tend to have *terrible*
locality.

-- Terry