Date: Mon, 28 Apr 1997 01:37:44 -0400
From: Bakul Shah <bakul@torrentnet.com>
To: Poul-Henning Kamp <phk@dk.tfs.com>
Cc: hackers@freebsd.org
Subject: Re: the namei cache...
Message-ID: <199704280537.BAA27601@chai.plexuscom.com>
In-Reply-To: Your message of "Fri, 25 Apr 1997 08:09:47 +0200." <2038.861948587@critter>
> >This is analogous to one of the ways one implements a symbol table
> >in a lexically scoped language processing program.
>
> But these programs don't work with a finite bounded number of
> entries, so reuse policies doesn't matter to them.

The bound is not so low as in the namei cache case, but reuse policies do
matter.  The point of the analogy was not to duplicate what they do but to
remind you to check out the range of techniques they use.  Maybe some of
them are useful?

> >Scaling.  Directories with 100+ entries are not uncommon.  Even
> >/usr/include and /usr/include/sys have over 100 entries each.
>
> You obviously don't know how the name cache operates.  Only names
> you lookup ends up in the cache, it's not the entire directory
> that gets put into the cache (unless you do a "ls -l" that is).

If anyone stats all the names in /usr/include, suddenly you are doing 50+
compares on average.  Hash tables are competitive with linear linked lists
as search structures even for very small table sizes, and far superior for
larger tables.  *Even* if you use a fixed-size hash table (which *will*
degenerate to O(N) comparisons when the number of entries is far greater
than the number of buckets) you come out ahead.  BTW, I believe AIX or
some other commercial Unix uses hash tables attached to directory entries.

As to the reuse policy, there are a number of possibilities.  [Again, I
will point you to non-OS literature: garbage collection.]  Here is one
example: you can keep an entry on two lists: the hash table chain for some
directory and a doubly linked list -- whenever an entry is _referenced_,
it is moved to the *head* of the second list.  Whenever a _free_ entry is
needed, you take it from the *tail* of the second list and move it to the
head.  You can also try something akin to `generational' garbage
collectors (in that a young entry is a candidate for removal; it is `aged'
only after it is referenced a number of times).  Each solution has its
costs and benefits.
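[The two-list scheme above -- a hash chain per bucket plus a doubly linked
LRU list whose head is the most recently referenced entry and whose tail is
the first candidate for reuse -- can be sketched in userland C roughly as
follows.  This is a minimal sketch under my own assumptions; the names
(`nc_lookup', `nc_enter', NHASH, etc.) are hypothetical and not the actual
FreeBSD namei cache code.]

```c
#include <stdlib.h>
#include <string.h>

#define NHASH 64                /* fixed bucket count; chains stay short
                                   until entries far outnumber buckets */

struct ncentry {
	struct ncentry *hnext;          /* hash chain link */
	struct ncentry *lprev, *lnext;  /* doubly linked LRU list */
	char name[32];
};

static struct ncentry *nchash[NHASH];
static struct ncentry lru;      /* sentinel: lru.lnext = MRU, lru.lprev = LRU */

static unsigned
nc_hash(const char *s)
{
	unsigned h = 0;

	while (*s)
		h = h * 33 + (unsigned char)*s++;
	return h % NHASH;
}

void
nc_init(void)
{
	lru.lnext = lru.lprev = &lru;
}

static void
lru_unlink(struct ncentry *e)
{
	e->lprev->lnext = e->lnext;
	e->lnext->lprev = e->lprev;
}

static void
lru_tohead(struct ncentry *e)
{
	e->lnext = lru.lnext;
	e->lprev = &lru;
	lru.lnext->lprev = e;
	lru.lnext = e;
}

/* Hashed lookup; a hit is moved to the head of the LRU list. */
struct ncentry *
nc_lookup(const char *name)
{
	struct ncentry *e;

	for (e = nchash[nc_hash(name)]; e != NULL; e = e->hnext)
		if (strcmp(e->name, name) == 0) {
			lru_unlink(e);  /* referenced: promote to head */
			lru_tohead(e);
			return e;
		}
	return NULL;
}

void
nc_enter(const char *name)
{
	unsigned h = nc_hash(name);
	struct ncentry *e = malloc(sizeof *e);

	if (e == NULL)
		return;
	strncpy(e->name, name, sizeof e->name - 1);
	e->name[sizeof e->name - 1] = '\0';
	e->hnext = nchash[h];
	nchash[h] = e;
	lru_tohead(e);
}
```

[Taking a free entry is then just unlinking lru.lprev from both lists; an
unreferenced entry sinks toward the tail on its own.]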
Without some profiling and *hard* staring at the code I wouldn't know
which one makes more sense.  My comment about scaling stands.

> >A dynamically growing hashtable is a must.
>
> Hello Houston ?  We have lost gravity!  Of course we can't do that
> in the kernel.  Memory is way too expensive.

Sorry, wrong number!  No Angelica, John or Gary here.

Obviously, to grow one data structure you have to take memory from
somewhere else, but you are assuming MALLOC/FREE.  If you use an `arena',
a separate memory area, you can recycle old entries and avoid malloc.

> I wish all of these "instant-fs" specialists would read up on their
> subject matter before they jump in with their misunderstandings!

I wish all of these "instant-optimization" specialists would do some
profiling before they start hacking with their misunderstandings!

A friend of mine and I once implemented a stateless as well as a stateful
remote file system atop 4.2BSD Unix for a client (this was before Sun
announced NFS).  That does not make me an FS specialist, and I haven't
spent hours in the FS code since then, but I am more familiar with it than
you might think.  Try not to assume that anything that does not make sense
to you is automatically meaningless.

Your comment above is especially irksome because _you_ asked for comments.
I responded only because you are implementing what (to me) is an obviously
suboptimal solution.  Even if you feel I am wrong *and* a clueless newbie,
it behooves you a) to politely point out where I am wrong and why, and
b) to thank me because I took the trouble to comment.

Ah well.

-- bakul
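[The `arena' point above can be sketched like so: carve all entries out of
one preallocated region and keep a free list, so allocation and release
never touch malloc/free after setup.  Again a hypothetical userland sketch
with made-up names, not any particular kernel's allocator.]

```c
#include <stddef.h>

#define NENTRIES 128            /* fixed pool size, chosen up front */

struct entry {
	struct entry *next;     /* free-list link while the entry is unused */
	char name[32];
};

static struct entry arena[NENTRIES];    /* one static region, no malloc */
static struct entry *freelist;

/* Thread every pool slot onto the free list. */
void
arena_init(void)
{
	int i;

	freelist = NULL;
	for (i = 0; i < NENTRIES; i++) {
		arena[i].next = freelist;
		freelist = &arena[i];
	}
}

/* Pop a recycled entry; NULL tells the caller to evict (e.g. the LRU
 * tail) and retry, rather than grow the pool. */
struct entry *
entry_alloc(void)
{
	struct entry *e = freelist;

	if (e != NULL)
		freelist = e->next;
	return e;
}

void
entry_free(struct entry *e)
{
	e->next = freelist;
	freelist = e;
}
```

[The pool never grows, so memory use stays bounded the way the kernel
wants, while entries are still recycled in O(1).]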