From owner-freebsd-hackers  Fri Apr 25 10:15:27 1997
Return-Path:
Received: (from root@localhost)
          by hub.freebsd.org (8.8.5/8.8.5) id KAA27851
          for hackers-outgoing; Fri, 25 Apr 1997 10:15:27 -0700 (PDT)
Received: from phaeton.artisoft.com (phaeton.Artisoft.COM [198.17.250.50])
          by hub.freebsd.org (8.8.5/8.8.5) with SMTP id KAA27846
          for ; Fri, 25 Apr 1997 10:15:25 -0700 (PDT)
Received: (from terry@localhost)
          by phaeton.artisoft.com (8.6.11/8.6.9) id KAA03608;
          Fri, 25 Apr 1997 10:11:21 -0700
From: Terry Lambert
Message-Id: <199704251711.KAA03608@phaeton.artisoft.com>
Subject: Re: the namei cache...
To: phk@dk.tfs.com (Poul-Henning Kamp)
Date: Fri, 25 Apr 1997 10:11:21 -0700 (MST)
Cc: bakul@torrentnet.com, phk@dk.tfs.com, hackers@freebsd.org
In-Reply-To: <2038.861948587@critter> from "Poul-Henning Kamp" at Apr 25, 97 08:09:47 am
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-hackers@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

> >This is analogous to one of the ways one implements a symbol table
> >in a lexically scoped language processing program.
>
> But these programs don't work with a finite bounded number of
> entries, so reuse policies doesn't matter to them.

I don't necessarily agree with this... he's not suggesting btrees
or something else that needs balancing.

> >Scaling.  Directories with 100+ entries are not uncommon.  Even
> >/usr/include and /usr/include/sys have over 100 entries each.
>
> You obviously don't know how the name cache operates.  Only names
> you lookup ends up in the cache, it's not the entire directory
> that gets put into the cache (unless you do a "ls -l" that is).

This is true.  Unless the inode is brought in core, a vnode is
associated with the inode, and the vnode and name are associated
in the cache, a cache entry is not made.

One exception: negative cache entries.  They are created on name
lookup misses that do not result in vnode hits.
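To make the negative entry case concrete, here is a minimal sketch in
C.  It is not the FreeBSD name cache: the direct-mapped table, the
integer ids standing in for directory and file vnodes, and the names
nc_enter() and nc_lookup() are all hypothetical, chosen only for
illustration.  The point is that an entry exists only for a name that
was actually looked up, and that a failed lookup can be remembered too.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NC_SLOTS   64   /* hypothetical fixed cache size */
#define NC_NAMELEN 32   /* hypothetical name length limit */

enum nc_result { NC_MISS, NC_HIT, NC_NEGATIVE };

struct nc_entry {
    int  used;
    int  dir_id;            /* stand-in for the parent directory vnode */
    char name[NC_NAMELEN];
    int  vnode_id;          /* stand-in for the vnode; < 0 marks a negative entry */
};

static struct nc_entry nc_table[NC_SLOTS];

/* Pick a slot from the (directory, name) pair. */
static unsigned nc_slot(int dir_id, const char *name)
{
    unsigned h = (unsigned)dir_id * 31u;

    while (*name)
        h = h * 33u + (unsigned char)*name++;
    return h % NC_SLOTS;
}

/*
 * Record the outcome of one lookup.  Pass vnode_id < 0 to remember
 * that the name does not exist (a negative entry).  The table is
 * direct-mapped, so a colliding entry is simply overwritten.
 */
static void nc_enter(int dir_id, const char *name, int vnode_id)
{
    struct nc_entry *e;

    assert(name != NULL && strlen(name) < NC_NAMELEN);
    e = &nc_table[nc_slot(dir_id, name)];
    e->used = 1;
    e->dir_id = dir_id;
    strcpy(e->name, name);
    e->vnode_id = vnode_id;
}

/* Consult the cache; *vnode_id is written only on NC_HIT. */
static enum nc_result nc_lookup(int dir_id, const char *name, int *vnode_id)
{
    const struct nc_entry *e;

    assert(name != NULL && vnode_id != NULL);
    e = &nc_table[nc_slot(dir_id, name)];
    if (!e->used || e->dir_id != dir_id || strcmp(e->name, name) != 0)
        return NC_MISS;
    if (e->vnode_id < 0)
        return NC_NEGATIVE;
    *vnode_id = e->vnode_id;
    return NC_HIT;
}
```

A caller would try nc_lookup() first, go to the directory on NC_MISS,
and then call nc_enter() with either the vnode it found or a negative
id.  NC_NEGATIVE lets it fail the lookup without touching the disk.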
You *could* end up with a lot of negative entries.

In some locality of reference models, it's actually valuable to
fault all the related inodes into core.  This is less expensive on
systems that use device/offset rather than vnode/offset page
identification, since you end up with 4k/128, or 32, of the things
in core anyway.

This is typically a win if your VFS consumer is going to return
inode data with the directory entry.  For high latency links, this
is valuable (NFS).  It's also valuable for lookup/open interfaces,
such as those exported by SMB, ATP, and NCP servers.

> >once encountered a directory with 2,000,000+ entries!  One does not
> >optimize for such border cases but it is nice when they are handled
> >effortlessly as a byproduct of a design decision.
>
> For that case there will be no difference at all, even wcarchive only
> has 44000 entries in the cache.

In fairness, I think Bakul's claim doesn't hold water here, if you
define "abusing the FS directory structure as a database" as bad
practice.  Of course, until you get rid of terminfo, that may be hard
to do.  8-).

> >A dynamically growing hashtable is a must.
>
> Hello Houston ?  We have lost gravity!  Of course we can't do that
> in the kernel.  Memory is way too expensive.

???

The name cache already uses a hash table.  The difference is really
computational, isn't it, in reordering the hash lists on growth.

The hash *should* be proportional to the number of buckets, and if
you grow the buckets, it makes sense to grow the hash.  For something
like the name cache, you can live with it not being terribly sparse;
you ought to implement two types of checksum, however, and use one to
hash and one to differentiate the compares, if you aren't going to
bump up the hash table size when you bump up the number of buckets.

>
> I wish all of these "instant-fs" specialists would read up on their
> subject matter before they jump in with their misunderstandings!
>

Bakul has always struck me as someone with a firm knowledge of
commercial kernels.
He may have pressed one of your hot buttons, but that doesn't mean
you should treat him as a clueless newbie.  Better to enlighten him
(assuming you're right and he's wrong; otherwise better to let him
enlighten you) than to shout him down.


					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
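PS: the two-checksum suggestion above can be sketched roughly as
follows.  This is an illustration under assumptions, not the name
cache code: the table size, the static pool, the names tc_insert() and
tc_find(), and the choice of a djb2-style hash for the chain and
FNV-1a for the stored check value are all mine.  One checksum picks
the hash chain; the other is stored in each entry and compared first,
so a chain that has grown long costs mostly integer compares rather
than string compares.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TC_NHASH  16    /* hypothetical; stays fixed as entries are added */
#define TC_MAXENT 128   /* hypothetical static pool, to avoid malloc */

struct tc_ent {
    struct tc_ent *next;
    uint32_t       check;   /* second checksum, compared before the name */
    const char    *name;    /* caller keeps the string alive */
    int            value;
};

static struct tc_ent *tc_heads[TC_NHASH];
static struct tc_ent  tc_pool[TC_MAXENT];
static size_t         tc_used;
static unsigned long  tc_full_compares;   /* strcmp() calls actually made */

/* First checksum (djb2-style): selects the hash chain. */
static uint32_t tc_hash(const char *s)
{
    uint32_t h = 5381u;

    while (*s)
        h = h * 33u + (unsigned char)*s++;
    return h;
}

/* Second checksum (32-bit FNV-1a): tells chain members apart cheaply. */
static uint32_t tc_check(const char *s)
{
    uint32_t h = 2166136261u;

    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

/* Add a name at the head of its chain; returns -1 if the pool is full. */
static int tc_insert(const char *name, int value)
{
    struct tc_ent *e, **head;

    assert(name != NULL);
    if (tc_used == TC_MAXENT)
        return -1;
    e = &tc_pool[tc_used++];
    head = &tc_heads[tc_hash(name) % TC_NHASH];
    e->check = tc_check(name);
    e->name = name;
    e->value = value;
    e->next = *head;
    *head = e;
    return 0;
}

/* Walk one chain; strcmp() runs only when the second checksum matches. */
static const struct tc_ent *tc_find(const char *name)
{
    const struct tc_ent *e;
    uint32_t check;

    assert(name != NULL);
    check = tc_check(name);
    for (e = tc_heads[tc_hash(name) % TC_NHASH]; e != NULL; e = e->next) {
        if (e->check != check)
            continue;               /* cheap reject, no string compare */
        tc_full_compares++;
        if (strcmp(e->name, name) == 0)
            return e;
    }
    return NULL;
}
```

The string compare is still needed after a check match, since two
different names can share both checksums; the second checksum only
makes that rare.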