Date: Mon, 30 Aug 1999 02:20:33 +0400
From: Dmitrij Tejblum <tejblum@arc.hq.cti.ru>
To: Matthew Dillon <dillon@apollo.backplane.com>
Cc: Dmitrij Tejblum <tejblum@arc.hq.cti.ru>, Doug Rabson <dfr@nlsystems.com>, current@FreeBSD.ORG
Subject: Re: NFSv3 on freebsd<-->solaris
Message-ID: <199908292220.CAA00778@tejblum.pp.ru>
In-Reply-To: Your message of "Sun, 29 Aug 1999 13:12:31 PDT." <199908292012.NAA06936@apollo.backplane.com>

> It isn't possible to do this and still remain synchronized. If the
> directory changes on the server, the client has no way of knowing
> whether a cookie corresponds to the same file if you always return
> a valid response. This breaks the protocol.
>
> A local filesystem getdirientries() call is monotonic, stateful, and
> cache coherent. An NFS readdir rpc is stateless, not monotonic, and can
> only approximate cache coherency.

Perhaps I am mistaken, but I disagree. The getdirentries() call is not
monotonic, and it is stateless. Let's see:

To read a directory with the getdirentries() call, the application has to
open it just like any other file and get a file descriptor. Like any other
file descriptor, the open directory has an associated offset, or pointer.
The getdirentries() syscall supplies the directory pointer to VOP_READDIR
as uio_offset. (The cookie sent by an NFS client is supplied to VOP_READDIR
as uio_offset too.) After VOP_READDIR returns, the uio_offset is stored back
in the file descriptor offset. The file offset is the only state saved.
Note also that the offset has nothing to do with the size of the data
transferred by getdirentries(), especially if the filesystem is not UFS.
That is, the offset is actually just a handy place to store the cookie.
(OTOH, for any local filesystem I am aware of, it is indeed the offset in
the physical directory.)

Note that the application can do lseek on the directory, that is, change
the next cookie used. This is used by seekdir(). (And, of course, the
application may lseek to anywhere it likes, and the filesystem will have
to deal with the bogus cookie.)

> * an NFS readdir rpc is stateless and not monotonic. The server cannot
> tell the difference between a new rpc, a retry, or several different
> processes on the client scanning the same directory (running at different
> points in the directory).

With local applications, VOP_READDIR cannot tell the difference either.
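
To make the "the offset is just a cookie" point concrete from userland, here is a minimal sketch (not code from this thread). It assumes an unmodified directory and a libc whose telldir()/seekdir() behave as POSIX describes; cookie_resumes_scan() is a hypothetical helper name invented for the illustration.

```c
#define _XOPEN_SOURCE 700   /* ask for telldir()/seekdir() on strict compilers */
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: skip `skip` entries, remember the directory position
 * with telldir(), read one entry, drain the rest of the directory, then
 * seekdir() back to the remembered position and read again.
 *
 * Returns 1 if the entry read after seekdir() has the same name as the one
 * read right after telldir(), 0 if it differs, and -1 on error (directory
 * cannot be opened, or it has fewer than skip + 1 entries).
 */
int cookie_resumes_scan(const char *path, int skip)
{
    assert(skip >= 0);

    DIR *d = opendir(path);
    if (d == NULL)
        return -1;

    for (int i = 0; i < skip; i++) {
        if (readdir(d) == NULL) {
            closedir(d);
            return -1;
        }
    }

    /* The only thing we keep is this opaque position value (the "cookie"). */
    long cookie = telldir(d);

    struct dirent *e = readdir(d);
    if (e == NULL) {
        closedir(d);
        return -1;
    }
    char first[256];
    snprintf(first, sizeof first, "%s", e->d_name);

    /* Move the stream somewhere else entirely: read to the end. */
    while (readdir(d) != NULL)
        ;

    /* Hand the cookie back; nothing else about the earlier scan is supplied. */
    seekdir(d, cookie);
    e = readdir(d);
    int same = (e != NULL && strncmp(first, e->d_name, sizeof first - 1) == 0);

    closedir(d);
    return same;
}
```

Calling cookie_resumes_scan("/", 1) on a quiet system should return 1: the scan resumes from the saved position even though the stream was moved in between. The filesystem only ever sees the position it is handed.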
There may be several programs scanning one directory, and a program may
call seekdir(); the only thing known is the uio_offset, that is, the cookie.

>
> * An NFS readdir rpc can only approximate cache coherency, but that
> doesn't mean you can throw cache coherency out the window.

What cache coherency? No one ever mmap()s a directory, I hope. After a
getdirentries() syscall has finished, someone may change the directory in
any way (just as after a read() call on a regular file). After the NFS
readdir reply has been sent to the client, someone may change the directory
in any way. Again, I don't see any difference.

> It
> approximates cache coherency through the use of the verifier key. If
> the verifier key supplied by the client is wrong, the server has to
> tell it so. Otherwise the client's directory cache will get out of
> sync.

Nope, the verifier is there so that the server can validate the cookie.
Cache validation needs to be done by checking the mtime, as with regular
files. What if the client cached the whole directory, and then the
directory changed? So cache coherency for directories is no worse than
for regular files. Note that, just as the READ call returns file attributes
that can be used for cache validation, the READDIR call returns the
directory attributes, which can be used for this purpose.

> Furthermore, the NFS readdir rpc has no notion of 'dead' directory entries
> as far as I can tell. This means that from the point of view of an NFS
> client, directories are always 'compacted'. Since clients may implement
> a block cache for directories, the server cannot afford to return a valid
> response if the verifier mismatches because it will screw up the client's
> block cache for the directory. This is very different from the way most
> local directories are scanned - filesystems such as UFS maintain dead
> directory entries and thus allow a directory data block to be scanned
> without any locking. We cannot use this trick with NFS.
>
> Add on top of that the fact that the NFS directory 'block size' may
> different then a local filesystem's. NFS must translate padding
> characteristics between the local filesystem and the NFS client's notion
> of the directory. Even if we did support the notion of dead directory
> entries in NFS, trying to translate the padding characteristics at the
> same time would be fairly difficult to accomplish.

Umm, I don't understand what the translation has to do with the issue.
BTW, not all local filesystems are UFS.

>
> :> Our NFS client used to have the same problem (a long time ago) and I put
> :> code into it to re-read the directory if its cookies are stale.
> :
> :(According to a mail recently sent to -hackers, that doesn't work.
> :In -current, the recovery code has a debugging printf(), so I guess
> :the code only triggered in very rare cases (see above).)
>
> This works on FreeBSD clients as far as I know. That is what I thought
> that email sent to hackers said... that it works w/ FreeBSD clients but
> not with certain Sun clients.

The email titled "readdir() broken?" says that he can work around this bug
with the workaround designed for SunOS 4.1.4 (and local filesystems). His
NFS client and server are -STABLE.

>
> :Anyway, I don't actually care what is correct NFS client behavior. I am
> :saying that sending "bad cookie" error is not useful for FreeBSD sever.
> :
> :Dima
>
> My understanding is that it is part of the protocol spec. We are not
> going to become incompatible with the spec.

I think this is a misinterpretation of the spec (though that passage
apparently cannot be interpreted correctly). Again, since Sun, who invented
NFS and wrote the NFS spec, has had the "bug" all along (in Solaris 2.5,
2.7 ...), it must not be a bug.

Dima


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message
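
On the cache validation point made earlier in this message (check the directory's mtime, as with regular files), a minimal userland sketch of the idea might look like the following. It is only an illustration under stated assumptions: struct dir_snapshot and both function names are hypothetical, stat() stands in for the attributes an NFS reply would carry, and only whole seconds are compared, so a change within the same second as the snapshot goes unnoticed.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical record: the directory mtime seen when the cache was filled. */
struct dir_snapshot {
    time_t mtime;
};

/* Remember the directory's current mtime. Returns 0 on success, -1 on error. */
int dir_snapshot_take(const char *path, struct dir_snapshot *snap)
{
    struct stat st;

    assert(snap != NULL);
    if (stat(path, &st) != 0)
        return -1;
    snap->mtime = st.st_mtime;
    return 0;
}

/*
 * Returns 1 if the directory's mtime still matches the snapshot (cached
 * entries may be reused), 0 if it differs (re-read the directory), and
 * -1 if the directory can no longer be stat()ed.
 */
int dir_snapshot_valid(const char *path, const struct dir_snapshot *snap)
{
    struct stat st;

    assert(snap != NULL);
    if (stat(path, &st) != 0)
        return -1;
    return st.st_mtime == snap->mtime;
}
```

A caller would take a snapshot when it fills its directory cache and call dir_snapshot_valid() before trusting cached entries; no cookie or verifier is involved in that decision.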