Date: Thu, 04 Jul 2002 19:30:52 -0700
From: Terry Lambert <tlambert2@mindspring.com>
To: Richard Sharpe <rsharpe@ns.aus.com>
Cc: freebsd-hackers@freebsd.org
Subject: Re: Adding readdir entries to the name cache ...
Message-ID: <3D2504DC.36D046D7@mindspring.com>
References: <Pine.LNX.4.33.0207051104570.2925-100000@ns.aus.com>

Richard Sharpe wrote:
> I am interested in hearing opinions on whether or not it is useful to
> preload entries into the name cache that are obtained with
> readdir/getdirentries/getdents.

It depends.

In a past life at another company, we were able to get a 30%+
performance improvement simply by doing this type of caching.

The system where the caching was being done, though, faulted the
inodes in asynchronously.  Basically, this meant that, for something
like Samba, which pairs every directory operation, including open,
with a corresponding "stat", the object to be stat'ed didn't have the
added latency that would have resulted from the inode fetch, had it
been done serially.

We got another 15% out of it by returning the stat information with
the directory entries to user space, all as one operation, with a
getdirentries interface that returned full stat information.

Then we got another 15% performance by doing the globbing in the
kernel, instead of transferring all the data to user space and doing
the globbing there.  This was for a wire protocol that asked first
for directories, then for files, and which frequently used globbing
that excluded at least 20% of the directory contents, and tended to
do the directory/file searches serially *every time*.

> We recently had a problem of the kernel hanging because we were loading
> entries into the name cache that were being picked up using
> readdir/getdirentries/getdents. These entries were inserted using
> cache_entry, which seems not to check that the entry is already in the
> cache, it simply inserts it. (Normally, cache_entry is called because a VOP
> routine was told to MAKEENTRY).

It seems to me that you could check for equality on insertion if
there was a hash collision.  This is what the SVR4 DNLC code does.
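
To make the equality check on insertion concrete, here is a minimal
user-space sketch of a chained hash table that compares against the
collision chain before adding an entry.  It is not the FreeBSD or SVR4
code: the names nc_entry, nc_insert, nc_count and NC_BUCKETS are made
up for illustration, the hash is an arbitrary FNV-1a-style function,
and plain integers stand in for vnode pointers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NC_BUCKETS 64           /* hypothetical table size, power of two */

struct nc_entry {
    struct nc_entry *next;      /* collision chain */
    unsigned long parent;       /* stand-in for the directory vnode */
    unsigned long target;       /* stand-in for the vnode the name maps to */
    char *name;
};

static struct nc_entry *nc_table[NC_BUCKETS];

/* FNV-1a-style hash over the name, seeded with the parent id. */
static uint32_t nc_hash(unsigned long parent, const char *name)
{
    uint32_t h = 2166136261u ^ (uint32_t)parent;
    for (; *name != '\0'; name++)
        h = (h ^ (unsigned char)*name) * 16777619u;
    return h & (NC_BUCKETS - 1);
}

/*
 * Insert (parent, name) -> target unless it is already cached.
 * Returns 1 if an entry was added, 0 if it was already present,
 * -1 on allocation failure.
 */
int nc_insert(unsigned long parent, const char *name, unsigned long target)
{
    struct nc_entry **head, *e, *fresh;

    assert(name != NULL);
    head = &nc_table[nc_hash(parent, name)];

    /* The chain head is read anyway to link in; compare while here. */
    for (e = *head; e != NULL; e = e->next)
        if (e->parent == parent && strcmp(e->name, name) == 0)
            return 0;

    fresh = malloc(sizeof(*fresh));
    if (fresh == NULL)
        return -1;
    fresh->name = malloc(strlen(name) + 1);
    if (fresh->name == NULL) {
        free(fresh);
        return -1;
    }
    strcpy(fresh->name, name);
    fresh->parent = parent;
    fresh->target = target;
    fresh->next = *head;
    *head = fresh;
    return 1;
}

/* Number of entries currently cached (for inspection only). */
size_t nc_count(void)
{
    size_t n = 0;
    int i;
    struct nc_entry *e;

    for (i = 0; i < NC_BUCKETS; i++)
        for (e = nc_table[i]; e != NULL; e = e->next)
            n++;
    return n;
}
```

Calling nc_insert(1, "foo", 100) twice adds one entry, not two; the
second call returns 0 after a single compare on the chain.
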
Note that you can get another 8-12% by making negative cache entries,
since DOS/Windows clients tend to search the full path each time,
even though they have already received a successful response in the
past (i.e. there is no client caching for this information, because
there is no distributed coherency protocol that would permit server
invalidation of the cache).

Unmodified, SVR4 DNLC cannot support negative cache entries (it
takes a two-line change).

Note that negative cache entries are twice as valuable as positive
entries, since on average, a linear search for an object which is
present will need to iterate only 50% of the search space, whereas
an object which is not present requires that you iterate 100% of the
search space.

> It seemed to me that one way to fix that would be to check whether an
> entry was in the name cache before trying to load it, however, I wonder at
> the wisdom of trying to do so, and feel that the current situation might
> be more than satisfactory.

No.  The check is advisable.

You must handle the hash insertion collision in any case; the extra
overhead is in the traversal and compare of the collision chain.
Given a well-chosen hash table size and hash algorithm, the only
time you will be comparing, statistically speaking, is when the
entry already exists.

This basically means that the check should cost you a single compare
in addition to the hash chain pointer, which you have to look at
anyway to do the insertion, in case the bucket is non-empty.

> The most obvious reason for loading readdir entries is to handle ls -l.
> Since ls has to read the directory entries, which it does with
> getdirentries, and then has to stat the entries, which would cause it to
> do a lookup on the name and then retrieve the vnode/inode, it would seem
> to be a win if we preload the entries in the name cache.

Yes.  This is a win.
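
For reference, the access pattern under discussion (read the
directory, then stat every name) looks roughly like this from user
space.  This is only a sketch using standard POSIX calls, not the
actual ls source; the function name stat_each_entry is made up for
illustration.  Each name returned by readdir costs one further
fstatat call, and that call is where the kernel does the name lookup
a preloaded cache entry would satisfy.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Walk `path` the way "ls -l" does: read each name, then lstat it.
 * Returns the number of entries successfully stat'ed (excluding "."
 * and ".."), or -1 if the directory cannot be opened.  The summed
 * st_size values are stored in *total_bytes.
 */
long stat_each_entry(const char *path, long long *total_bytes)
{
    DIR *dir;
    struct dirent *de;
    struct stat st;
    long n = 0;

    assert(path != NULL && total_bytes != NULL);
    *total_bytes = 0;

    dir = opendir(path);
    if (dir == NULL)
        return -1;

    while ((de = readdir(dir)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        /* One extra kernel crossing, and one name lookup, per entry. */
        if (fstatat(dirfd(dir), de->d_name, &st, AT_SYMLINK_NOFOLLOW) == 0) {
            n++;
            *total_bytes += (long long)st.st_size;
        }
    }
    closedir(dir);
    return n;
}
```
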
A bigger win is a dirent structure that contains stat as well as
name information: you end up with 50% fewer user/kernel boundary
crossings.

> However, ls seems to call lstat in the same order that the files are in
> the directory, and a normal clock approach to directories would yield
> exactly the same result. Further, in the cases that the user did not want
> a -l, we would avoid adding many potentially useless names to the name
> cache and reducing its performance.

This is because the sort occurs first.  An unsorted "ls" (which is
available -- see the man page) doesn't have this issue.

It would also be useful for "find" and "mtree", FWIW.

The last time I looked into pre-faulting the vnodes on directory
searches, though, it was very difficult in FreeBSD, because it could
not handle inodes as existing in faulted pages, and instead required
direct I/O: so you paid the serial latency no matter when you did it,
because the process blocked.

> [1] Samba, because it has to support the Windows case insensitive file
> system, must do some pretty ugly things :-) When a client asks that a file
> be opened, for example, Samba tries with exactly the case that was
> presented. If that fails, it must do a readdir scan of the directory so it
> can do a case-insensitive match. So, even negative caching does not buy us
> much in the case of Samba. What would help is a case insensitive
> filesystem.

It is useful to be able to do "case sensitive on storage, case
insensitive on lookup" on a per process basis.  The easiest way is to
wire this in as a flag on the proc itself.  The normal way this is
done is a flag to sfork, but... it should also be possible to have
the proc open itself in procfs, and then ioctl() down a flag setting
for this.

Be aware that if you *do* end up prefaulting everything, it is very
application dependent whether you will be gaining, or merely
thrashing the cache.
For DOS/Windows/Mac client machines of FS's exported by a host OS,
and for which the exported FS has directories in the execution path
of the client, it's very valuable.  But you need to be able to turn
it off (I would say "default it to off") for uses of FS's that have
poorer locality.  Probably, you would want to make it a sysctl: web
servers tend to have very good locality, just like file servers.
News servers, etc., tend to have *terrible* locality.

-- Terry

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message