Date:      Tue, 25 Jul 1995 10:32:15 +0100 (BST)
From:      Doug Rabson <dfr@render.com>
To:        Terry Lambert <terry@cs.weber.edu>
Cc:        peter@haywire.dialix.com, freebsd-current@freebsd.org
Subject:   Re: what's going on here? (NFSv3 problem?)
Message-ID:  <Pine.BSF.3.91.950725102458.230B-100000@minnow.render.com>
In-Reply-To: <9507242136.AA09885@cs.weber.edu>

On Mon, 24 Jul 1995, Terry Lambert wrote:

> > The NFSv3 code in -current uses the modification time of the directory as 
> > the verifier.  This is perhaps a slightly pessimistic solution but it 
> > should detect compactions.  The client reacts to bad cookie errors by 
> > flushing its cached information about the directory.  This seems to be a 
> > reasonable reaction to the directory being modified.
> > 
> > Can the ufs code ever compact a directory block *without* the 
> > modification time of the directory changing?  Presumably it only ever 
> > does this as a result of some other operation on the directory.
> 
> This is a good question.
> 
> I can answer it in terms of existing practice and in terms of POSIX time
> update semantics requirements.
> 
> For UFS, the compaction can only take place when an entry has failed
> lookup during creation and is therefore being created (ie: with the
> directory locked).
> 
> That is, a directory data modification is involved.
> 
> Does this mean that directory times will be updated?
> 
> Under POSIX, it does not.  The modification time update semantics in
> POSIX are file-bound.  That is, one is not required to update the
> times for directories the same as one is required to update the times
> for files.  The single exception to this is the directory read
> operations which must *mark for update* the access time.  Note that
> this does not require that it have been updated by the time a subsequent
> access has taken place.
> 
> We can easily envision compaction in a DOS style directory (after all,
> this is what Win95 does in order to support long names, effectively),
> where since the file names are attributes of the file rather than real
> directory contents, such compaction does *not* cause the directory to
> be even marked for update!
> 
> That is, depending on this behaviour has existing failure modes for
> non-POSIX file systems in any case.
> 
> I think it is a mistake to assume that the NFS exporting of file
> systems should only work when the NFS export is a client of POSIX
> file system services (and even then, it depends on "mark for update"
> referring to a change of the in core time stamp rather than a real
> marking by flagging the in core and on disk times to be updated at
> dirty page discard time -- assuming a directory is implemented as a
> file at all instead of being considered a logically separate entity).

The current code in the NFS server generates the verifier in the 
supposedly FS-independent server code.  This is the part which is wrong.  
The VOP_READDIR call should really allow the FS to return a verifier 
along with the cookies.  Of course, if I suggested another change to 
VOP_READDIR, then the flames would really start...

> > At the time, I was more interested in fixing the completely stupid 
> > assumption the NFS server was making about the FS implementation which 
> > only ever worked for UFS.  Adding a whole new layer of code between NFS 
> > and the VFS would have added maintenance problems, consistency problems 
> > (we would be caching directory information; when is the cache invalid?  
> > when should stuff be removed from it?) and needless complication.
> 
> I think the cache issue is separate.  Specifically, directory caching
> should be generalized externally to the file system implementations
> themselves.  Potentially, it should even be a separate layer, although
> the only thing dictating that would be the lack of a filesystem initiated
> cache callback mechanism for ensuring coherency.  Even then, that's a
> problem with the file system under the cache and should be handled in
> the file system implementation rather than being hacked around by adding
> function call layering everywhere so that it can be omitted for file
> systems that might undergo promiscuous changes (ie: NFS, AFS).
> 
> The assumptions that NFS made were, indeed *wrong*.  But since the issue
> was FS implementation independent metadata presentation, the fact is
> that the complication would have been purely NFS's problem -- and at
> that, it's caused by the statelessness NFS insists on maintaining.
> A presentation layer would have added a single function call overhead to
> the NFS based ops -- and avoided the buffer size implications that got
> strewn about as a result.  The layer itself would have disappeared,
> all but the single function call dealing with stat, when the call
> graph was collapsed in creating the file system instance.
> 
> Admittedly, this would have meant dealing with some of the messier
> stackability issues then, rather than later.
> 
> The other alternative would have been to put off the stackability
> issues until later and to eat two copies in the NFS layer (and some
> stack allocated buffer space).  This actually wouldn't have been that
> big of a hit to take in any case, since the bottleneck is the network
> (relative to the extra copy time).
> 
> Either way, it's really water under the bridge, although I'm going to
> be beating on some of the stackability issues in the near future; in
> particular, moving the directory cache up to the vnode rather than the
> inode layer and going to a per FS type vnode pool to overcome the
> inode space limitations imposed by common inode allocation both need
> to happen in the near future.  Luckily USL has kindly documented SVR4
> DNLC (a vnode based directory name lookup cache) for us, though it
> is missing the ability to keep negative cache entries (I'll fix that,
> though; it's relatively easy even in the USL code).

Well, in the interests of stability and to avoid making FreeBSD 2.0 even 
later, I chose the easy solution :-).  If you can improve the situation 
by reworking the system name cache, then that can only be a good thing.
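
For readers unfamiliar with the negative cache entries mentioned in the 
quoted text, the following is a minimal sketch of the idea: a name cache 
keyed by (parent directory, name) that can also remember "this name does 
not exist".  It is neither the SVR4 DNLC nor the FreeBSD name cache; the 
nc_* names, the integer ids and the direct-mapped table are all 
assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NC_SLOTS	64
#define NC_NAMELEN	32

enum nc_result { NC_MISS, NC_HIT, NC_NEGATIVE };

struct nc_entry {
	int		used;
	uint32_t	parent;			/* id of the parent directory */
	char		name[NC_NAMELEN];
	uint32_t	target;			/* valid only if !negative */
	int		negative;		/* name known not to exist */
};

static struct nc_entry nc_table[NC_SLOTS];

static unsigned
nc_hash(uint32_t parent, const char *name)
{
	unsigned h = parent * 31u;

	while (*name != '\0')
		h = h * 33u + (unsigned char)*name++;
	return (h % NC_SLOTS);
}

/* Record a lookup result; the table is direct-mapped, so this overwrites. */
static void
nc_enter(uint32_t parent, const char *name, uint32_t target, int negative)
{
	struct nc_entry *e;

	assert(name != NULL);
	if (strlen(name) >= NC_NAMELEN)
		return;				/* too long to cache */
	e = &nc_table[nc_hash(parent, name)];
	e->used = 1;
	e->parent = parent;
	strcpy(e->name, name);
	e->target = target;
	e->negative = negative;
}

/* Returns NC_HIT (and sets *target), NC_NEGATIVE, or NC_MISS. */
static enum nc_result
nc_lookup(uint32_t parent, const char *name, uint32_t *target)
{
	const struct nc_entry *e;

	assert(name != NULL && target != NULL);
	e = &nc_table[nc_hash(parent, name)];
	if (!e->used || e->parent != parent || strcmp(e->name, name) != 0)
		return (NC_MISS);
	if (e->negative)
		return (NC_NEGATIVE);
	*target = e->target;
	return (NC_HIT);
}

/* Drop every entry under a directory, e.g. after it has been modified. */
static void
nc_purge_dir(uint32_t parent)
{
	int i;

	for (i = 0; i < NC_SLOTS; i++)
		if (nc_table[i].used && nc_table[i].parent == parent)
			nc_table[i].used = 0;
}
```

A negative entry lets a repeated lookup of a missing name return without 
scanning the directory again.  The price is that entries must be purged 
whenever the directory changes, which is the coherency problem raised in 
the quoted text.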

--
Doug Rabson, Microsoft RenderMorphics Ltd.	Mail:  dfr@render.com
						Phone: +44 171 251 4411
						FAX:   +44 171 251 0939



