FreeBSD Mail Archives

Date:      Fri, 04 Apr 2003 14:52:23 -0800
From:      Terry Lambert <tlambert2@mindspring.com>
To:        "Andrew R. Reiter" <arr@watson.org>
Cc:        freebsd-smp@freebsd.org
Subject:   Re: namecache
Message-ID:  <3E8E0CA7.E21BA871@mindspring.com>
References:  <20030404153953.S62312@fledge.watson.org> <3E8E001C.B39BF3ED@mindspring.com> <20030404170317.K63340@fledge.watson.org>

"Andrew R. Reiter" wrote:
> On Fri, 4 Apr 2003, Terry Lambert wrote:
> :"Andrew R. Reiter" wrote:
> :> Awhile back I shot a patch to this list for review that locked down some
> :> stuff in the namecache arena.  I feel the patch was OK, but not great.
> :>
> :> What I want to do is something w/o any locks, but I dont feel I have a
> :> good enough grasp of that code base to do something really well since I
> :> know it would most likely require a rewrite of that code.

I think the rewrite will be necessary.  It should be very easy,
just knowing the Solaris data structures.

For both the Solaris and SVR4 DNLC, the one thing I would caution
you against is a *really* blind reimplementation.

The SVR4 code, with which I am very familiar, did not allow entries
with a NULL vp for the lookup target.

In point of face, you should allow this.  The reason is that a
negative cache is twice as valuable as a postive cache, for any
list of data that must be linearly iterated.

The reason is that, on a hit, you iterate, on average 50% of the
entries before you find the one you are looking for; on a miss,
you must iterate all 100% of the entries, to make sure they are
not the one.

As a common example, as you look for a file before you go to create
it; when this happens, you iterate all directory entries in the
directory in which the create is taking place, and you do not find
the item.

So you go to create it; the insertion algorithm *again* iterates it,
because it does not know if it is a create or replace operation.

If you had a cache entry, it would still need to iterate the entries,
but it could avoid the compare operation that would otherwise be
necessary, and only look for free space to put the entry.  A possible
optimization would cache the offset of a first-fit entry in the NULL
vp valued DNLC entry.  This gives you a search start point for the
insert which is "likely" to still be valid, and you can search from
that point onward, because the earlier entries are likely to remain
"too small".

Maybe it should be bounded against "cache poisoning" (e.g. you
should maybe limit the total number of negative entries to some
maximum less than the size of the whole cache).  Not sure this
is valid, since you could "poison" with positive entries, too.

Another possible optimization, which may be no good on FreeBSD is
to add entries on traversal.  This optimization is best suited to
Samba, Appletalk, and similar servers, which expect the directory
entry to be identical to the inode, so you have to have the inode
information available immediately following knowing the directory
entry is there.  You'd only do this in the VOP_READDIR iteration
case, not everywhere you iterate for just one file lookup, but it
will gain you at least 15% performance on diretory operations on a
file server for PC's (such servers iterate, then stat each file to
get stat data to return both stat data and names as a single block
of "file entry information").

> I read the stuff in the Solaris book (I've got that one), but I've not
> read the section in Magic Garden (despite owning it as well).
> 
> I'll read further.

Really, either one is enough to write the implementation.  The SVR4
book has API definitions, though, so it might be useful to you to
use both books, if you are having trouble, and don't want to use the
BSD namecache API for some reason.

> Thanks!

You're welcome.

-- Terry

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?3E8E0CA7.E21BA871>

Header And Logo

Peripheral Links

Site Navigation

Header And Logo

Peripheral Links

Search

Site Navigation