From owner-freebsd-current Tue Jul 25 02:32:45 1995
Return-Path: current-owner
Received: (from majordom@localhost) by freefall.cdrom.com (8.6.11/8.6.6) id CAA18215 for current-outgoing; Tue, 25 Jul 1995 02:32:45 -0700
Received: from minnow.render.com (render.demon.co.uk [158.152.30.118]) by freefall.cdrom.com (8.6.11/8.6.6) with ESMTP id CAA18202 for ; Tue, 25 Jul 1995 02:32:35 -0700
Received: (from dfr@localhost) by minnow.render.com (8.6.9/8.6.9) id KAA01139; Tue, 25 Jul 1995 10:32:16 +0100
Date: Tue, 25 Jul 1995 10:32:15 +0100 (BST)
From: Doug Rabson
To: Terry Lambert
cc: peter@haywire.dialix.com, freebsd-current@freebsd.org
Subject: Re: what's going on here? (NFSv3 problem?)
In-Reply-To: <9507242136.AA09885@cs.weber.edu>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: current-owner@freebsd.org
Precedence: bulk

On Mon, 24 Jul 1995, Terry Lambert wrote:
> > The NFSv3 code in -current uses the modification time of the directory
> > as the verifier. This is perhaps a slightly pessimistic solution but it
> > should detect compactions. The client reacts to bad cookie errors by
> > flushing its cached information about the directory. This seems to be
> > a reasonable reaction to the directory being modified.
> >
> > Can the ufs code ever compact a directory block *without* the
> > modification time of the directory changing? Presumably it only ever
> > does this as a result of some other operation on the directory.
>
> This is a good question.
>
> I can answer it in terms of existing practice and in terms of POSIX time
> update semantics requirements.
>
> For UFS, the compaction can only take place when an entry has failed
> lookup during creation and is therefore being created (ie: with the
> directory locked).
>
> That is, a directory data modification is involved.
>
> Does this mean that directory times will be updated?
>
> Under POSIX, it does not. The modification time update semantics in
> POSIX are file-bound.
> That is, one is not required to update the times for directories the
> same as one is required to update the times for files. The single
> exception to this is the directory read operations, which must *mark
> for update* the access time. Note that this does not require that it
> have been updated by the time a subsequent access has taken place.
>
> We can easily envision compaction in a DOS style directory (after all,
> this is what Win95 does in order to support long names, effectively),
> where since the file names are attributes of the file rather than real
> directory contents, such compaction does *not* cause the directory to
> be even marked for update!
>
> That is, depending on this behaviour has existing failure modes for
> non-POSIX file systems in any case.
>
> I think it is a mistake to assume that the NFS exporting of file
> systems should only work when the NFS export is a client of POSIX
> file system services (and even then, it depends on "mark for update"
> referring to a change of the in core time stamp rather than a real
> marking by flagging the in core and on disk times to be updated at
> dirty page discard time -- assuming a directory is implemented as a
> file at all instead of being considered a logically separate entity).

The current code in the NFS server generates the verifier in the
supposedly FS independent server code. This is the part which is wrong.
The VOP_READDIR call should really allow the FS to return a verifier
along with the cookies. Of course, if I suggested another change to
VOP_READDIR, then the flames would really start...

> > At the time, I was more interested in fixing the completely stupid
> > assumption the NFS server was making about the FS implementation which
> > only ever worked for UFS. Adding a whole new layer of code between NFS
> > and the VFS would have added maintenance problems, consistency problems
> > (we would be caching directory information; when is the cache invalid?
> > when should stuff be removed from it?) and needless complication.
>
> I think the cache issue is separate. Specifically, directory caching
> should be generalized externally to the file system implementations
> themselves. Potentially, it should even be a separate layer, although
> the only thing dictating that would be the lack of a filesystem initiated
> cache callback mechanism for ensuring coherency. Even then, that's a
> problem with the file system under the cache and should be handled in
> the file system implementation rather than being hacked around by adding
> function call layering everywhere so that it can be omitted for file
> systems that might undergo promiscuous changes (ie: NFS, AFS).
>
> The assumptions that NFS made were, indeed *wrong*. But since the issue
> was FS implementation independent metadata presentation, the fact is
> that the complication would have been purely NFS's problem -- and at
> that, it's caused by the statelessness NFS insists on maintaining.
> A presentation layer would have added a single function call overhead to
> the NFS based ops -- and avoided the buffer size implications that got
> strewn about as a result. The layer itself would have disappeared,
> all but the single function call dealing with stat, when the call
> graph was collapsed in creating the file system instance.
>
> Admittedly, this would have meant dealing with some of the messier
> stackability issues then, rather than later.
>
> The other alternative would have been to put off the stackability
> issues until later and to eat two copies in the NFS layer (and some
> stack allocated buffer space). This actually wouldn't have been that
> big of a hit to take in any case, since the bottleneck is the network
> (relative to the extra copy time).
> Either way, it's really water under the bridge, although I'm going to
> be beating on some of the stackability issues in the near future; in
> particular, moving the directory cache up to the vnode rather than the
> inode layer and going to a per FS type vnode pool to overcome the
> inode space limitations imposed by common inode allocation both need
> to happen in the near future. Luckily USL has kindly documented SVR4
> DNLC (a vnode based directory name lookup cache) for us, though it
> is missing the ability to keep negative cache entries (I'll fix that,
> though; it's relatively easy even in the USL code).

Well in the interests of stability and to avoid making FreeBSD 2.0 even
later, I chose the easy solution :-). If you can improve the situation
by reworking the system name cache, then that can only be a good thing.

--
Doug Rabson, Microsoft RenderMorphics Ltd.
Mail:  dfr@render.com
Phone: +44 171 251 4411
FAX:   +44 171 251 0939