Date: Mon, 24 Jul 95 15:36:48 MDT
From: terry@cs.weber.edu (Terry Lambert)
To: dfr@render.com (Doug Rabson)
Cc: peter@haywire.dialix.com, freebsd-current@freebsd.org
Subject: Re: what's going on here? (NFSv3 problem?)
Message-ID: <9507242136.AA09885@cs.weber.edu>
In-Reply-To: <Pine.BSF.3.91.950724112353.12542B-100000@minnow.render.com> from "Doug Rabson" at Jul 24, 95 11:50:14 am
> > Most file systems do not provide a generation count on directory blocks
> > with which to validate the "cookie".
> >
> > With that in mind, the "cookie" is typically interpreted either as an
> > entry offset or as a byte offset of entry, either in the block or in
> > the directory.
>
> The NFSv3 code in -current uses the modification time of the directory as
> the verifier.  This is perhaps a slightly pessimistic solution but it
> should detect compactions.  The client reacts to bad cookie errors by
> flushing its cached information about the directory.  This seems to be a
> reasonable reaction to the directory being modified.
>
> Can the ufs code ever compact a directory block *without* the
> modification time of the directory changing?  Presumably it only ever
> does this as a result of some other operation on the directory.

This is a good question.  I can answer it in terms of existing practice
and in terms of POSIX time update semantics requirements.

For UFS, the compaction can only take place when an entry has failed
lookup during creation and is therefore being created (ie: with the
directory locked).  That is, a directory data modification is involved.

Does this mean that directory times will be updated?  Under POSIX, it
does not.  The modification time update semantics in POSIX are
file-bound.  That is, one is not required to update the times for
directories the same way one is required to update the times for files.
The single exception to this is the directory read operations, which
must *mark for update* the access time.  Note that this does not
require that the time actually have been updated by the time a
subsequent access has taken place.

We can easily envision compaction in a DOS style directory (after all,
this is what Win95 does in order to support long names, effectively)
where, since the file names are attributes of the file rather than real
directory contents, such compaction does *not* cause the directory to
be even marked for update!
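A minimal sketch of the verifier strategy Doug describes (all names and
types here are invented for illustration; this is not the actual
-current NFS server code, which I haven't reproduced):

```c
/*
 * Sketch: derive the READDIR cookie verifier from the directory's
 * modification time, and reject a resumed READDIR whose verifier no
 * longer matches -- the directory may have been compacted underneath
 * the client's cached cookies.
 */
#include <stdint.h>

#define NFS3_OK            0
#define NFS3ERR_BAD_COOKIE 10003    /* error value from RFC 1813 */

typedef uint64_t cookieverf_t;

struct dir_attr {
	long	mtime_sec;	/* seconds of last modification */
	long	mtime_nsec;	/* nanoseconds of last modification */
};

/* Pack the directory mtime into an opaque 8-byte verifier. */
static cookieverf_t
dir_verifier(const struct dir_attr *dap)
{
	return ((cookieverf_t)dap->mtime_sec << 32) |
	    (uint32_t)dap->mtime_nsec;
}

/*
 * On a resumed READDIR (cookie != 0), compare the client's verifier
 * with the directory's current one; a mismatch makes the client flush
 * its cached directory information and start over.
 */
static int
check_cookie(const struct dir_attr *dap, cookieverf_t client_verf,
    uint64_t cookie)
{
	if (cookie != 0 && client_verf != dir_verifier(dap))
		return (NFS3ERR_BAD_COOKIE);
	return (NFS3_OK);
}
```

The pessimism Doug mentions is visible here: *any* mtime change
invalidates outstanding cookies, even a change that didn't reorganize
the directory.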
That is, depending on this behaviour has existing failure modes for
non-POSIX file systems in any case.  I think it is a mistake to assume
that the NFS exporting of file systems should only work when the NFS
export is a client of POSIX file system services (and even then, it
depends on "mark for update" referring to a change of the in core time
stamp rather than a real marking by flagging the in core and on disk
times to be updated at dirty page discard time -- assuming a directory
is implemented as a file at all instead of being considered a logically
separate entity).

All that said, yes, in UFS, it happens to work.  Currently.  8-/.

> > The stat structure passed around internally is larger than the stat
> > structure expected by NFS.
> >
> > Rather than fix the view of things at the time it was exported to
> > NFS, the internal buffer representation for all file systems capable
> > of being exported was changed.
> >
> > I can't say I'm not glad that this is coming back to haunt us.
>
> At the time, I was more interested in fixing the completely stupid
> assumption the NFS server was making about the FS implementation which
> only ever worked for UFS.  Adding a whole new layer of code between NFS
> and the VFS would have added maintenance problems, consistency problems
> (we would be caching directory information; when is the cache invalid?
> when should stuff be removed from it?) and needless complication.

I think the cache issue is separate.  Specifically, directory caching
should be generalized externally to the file system implementations
themselves.  Potentially, it should even be a separate layer, although
the only thing dictating that would be the lack of a filesystem
initiated cache callback mechanism for ensuring coherency.
Even then, that's a problem with the file system under the cache and
should be handled in the file system implementation rather than being
hacked around by adding function call layering everywhere so that it
can be omitted for file systems that might undergo promiscuous changes
(ie: NFS, AFS).

The assumptions that NFS made were, indeed, *wrong*.  But since the
issue was FS implementation independent metadata presentation, the
fact is that the complication would have been purely NFS's problem --
and at that, it's caused by the statelessness NFS insists on
maintaining.  A presentation layer would have added a single function
call overhead to the NFS based ops -- and avoided the buffer size
implications that got strewn about as a result.  The layer itself
would have disappeared, all but the single function call dealing with
stat, when the call graph was collapsed in creating the file system
instance.  Admittedly, this would have meant dealing with some of the
messier stackability issues then, rather than later.

The other alternative would have been to put off the stackability
issues until later and to eat two copies in the NFS layer (and some
stack allocated buffer space).  This actually wouldn't have been that
big of a hit to take in any case, since the bottleneck is the network
(relative to the extra copy time).

Either way, it's really water under the bridge, although I'm going to
be beating on some of the stackability issues in the near future; in
particular, moving the directory cache up to the vnode rather than the
inode layer, and going to a per FS type vnode pool to overcome the
inode space limitations imposed by common inode allocation, both need
to happen in the near future.  Luckily USL has kindly documented SVR4
DNLC (a vnode based directory name lookup cache) for us, though it is
missing the ability to keep negative cache entries (I'll fix that,
though; it's relatively easy even in the USL code).
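To make the negative cache idea concrete, here is a toy sketch of a
vnode-level name lookup cache that also remembers *failed* lookups, in
the spirit of the DNLC extension mentioned above.  Everything here --
names, sizes, the hash -- is invented for illustration, not taken from
the USL code:

```c
/*
 * Toy name cache: a negative entry (vp == NULL) records that a name
 * is known NOT to exist, so a repeated failing lookup can be answered
 * without scanning the directory at all.
 */
#include <stddef.h>
#include <string.h>

#define NC_SIZE	64

struct ncentry {
	char	name[32];
	void	*vp;		/* resolved "vnode", NULL = negative entry */
	int	valid;
};

static struct ncentry nc_table[NC_SIZE];

static unsigned
nc_hash(const char *name)
{
	unsigned h = 0;

	while (*name)
		h = h * 33 + (unsigned char)*name++;
	return h % NC_SIZE;
}

/* Record a lookup result; vp == NULL caches the *absence* of a name. */
static void
nc_enter(const char *name, void *vp)
{
	struct ncentry *ncp = &nc_table[nc_hash(name)];

	strncpy(ncp->name, name, sizeof(ncp->name) - 1);
	ncp->name[sizeof(ncp->name) - 1] = '\0';
	ncp->vp = vp;
	ncp->valid = 1;
}

/*
 * Returns 1 on a hit (positive or negative; *vpp set accordingly),
 * 0 on a miss, in which case the caller must do a real lookup.
 */
static int
nc_lookup(const char *name, void **vpp)
{
	struct ncentry *ncp = &nc_table[nc_hash(name)];

	if (!ncp->valid || strcmp(ncp->name, name) != 0)
		return 0;
	*vpp = ncp->vp;
	return 1;
}
```

The win is exactly the UFS compaction path discussed earlier: creation
does a failing lookup first, and a negative hit short-circuits it.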
The stackability issues must be resolved to support both user space
file system development and source level debugging, and to allow for
general support of a per block file compression layer that operates
only on files, not directories.

> I added code as part of this fix which would deal with unaligned UFS
> directory reads, more or less on the lines of the approach you
> suggested.

I noticed (and appreciate!) the code there.  It helps the restart
stuff immensely.

The code pretty much has to be there for a VM86() based INT 21
redirector to map UFS volumes as DOS drives under VM86() based DOS
emulation in any case.  The lack of an opendir/closedir type paradigm
in the DOS FindFirst/FindNext directory scanning routines makes this
especially necessary, unless we wanted to keep around LRU lists of
some finite number of contexts for outstanding DOS searches (what
Novell does in their DOS redirector).

It also allows a "DOS porting interface" for DOS code that does INT 21
access, if the interface is exported at the FS system call layer by
using a VFS layer specific ioctl() for FindFirst/FindNext.  Wine wants
this kind of portability API.

> The FS reads from the aligned address.  NFS then finds from the
> information returned by the FS the first entry whose cookie is greater
> than or equal to the cookie sent by the client.  The only restriction
> this places on VFS for directory cookies is that they increase
> monotonically in a directory.

This is The Right Way.  8-).

> In the case of a compacted directory block, the client may receive
> filenames it has already seen or it may miss a few entries.  It will
> never receive corrupt information.

Right.  I believe it is the responsibility of the client to deal with
this fact.  Otherwise we are screwed at the outset regarding kernel
preemption and SMP kernel reentrancy, both short-term issues in terms
of the need to provide file system multithreading.  So I definitely
don't have problems with that code.
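The restart scan Doug describes can be sketched in a few lines
(illustrative names only; the real server works over the raw dirent
buffer returned by VOP_READDIR rather than an array like this):

```c
/*
 * Sketch: after reading from the aligned address, skip forward to the
 * first entry whose cookie is >= the one the client resumed with.
 * Because cookies increase monotonically within a directory, a
 * compaction can only make the client see duplicates or miss entries
 * -- never hand it corrupt data.
 */
#include <stddef.h>
#include <stdint.h>

struct dirent_ck {
	uint64_t	cookie;		/* opaque directory position */
	const char	*name;
};

static size_t
readdir_restart(const struct dirent_ck *ents, size_t n, uint64_t cookie)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (ents[i].cookie >= cookie)
			break;
	return i;	/* index of first entry to return; n if none left */
}
```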
> The current v2 server has an adequate strategy for dealing with
> directory compaction for all read sizes, IMHO.  The directory verifier
> is *not* optional in NFSv3.  The only optional part AFAIK is the use
> of READDIRPLUS by the client to read file attributes with the names.
> Both READDIR and READDIRPLUS *must* implement a verifier strategy.

I'm much less concerned with the client side of things, but the
verifier *does* prevent the server from being a chintzy minimal
implementation, and that's the important thing.

It remains to be seen if using the date as the verifier is really a
valid thing to do for non-POSIX compliant file system implementations
-- or POSIX compliant implementations where the directory is not a
file (NT, VMS, etc.).  I think the answer must be "no".

> A server *can* choose to return zero for a verifier but only if the
> cookies it generates are *always* valid, e.g. for read-only media.
> From rfc1813, section 3.3.16:
>
>    One implementation of the cookie-verifier mechanism might
>    be for the server to use the modification time of the
>    directory.  This might be overly restrictive, however.  A
>    better approach would be to record the time of the last
>    directory modification that changed the directory
>    organization in a way that would make it impossible to
>    reliably interpret a cookie.  Servers in which directory
>    cookies are always valid are free to use zero as the
>    verifier always.

Yes.  This speaks to the organization (or rather the lack of it) in
the VFS framework regarding directory vs. file operations.
Specifically, it should be possible to get even callbacks ala Andrew
at the presentation layer, such that file system events that affect
NFS exported volumes are in fact propagated to the NFS layer so it can
act appropriately.  Obviously, this code isn't there yet.  8-(.


					Regards,
					Terry Lambert
					terry@cs.weber.edu
---
Any opinions in this posting are my own and not those of my present
or previous employers.