Date: Sun, 22 Jun 2003 15:53:04 +0400 (MSD) From: Andrey Alekseyev <uitm@blackflag.ru> To: truckman@FreeBSD.org (Don Lewis) Cc: freebsd-hackers@freebsd.org Subject: Re: open() and ESTALE error Message-ID: <200306221153.PAA00759@slt.oz> In-Reply-To: <200306220635.h5M6ZXM7066060@gw.catspoiler.org> from Don Lewis at "Jun 21, 3 11:35:33 pm"
next in thread | previous in thread | raw e-mail | index | archive | help
Don, > When a file is renamed on the server, its file handle remains valid. Actually I was wrong in my assumption on how names are purged from the namecache. And I didn't mean an operation with a file opened on the client. And it actually happens that this sequence of commands will get you ENOENT (not ESTALE) on the client because of a new lookup in #4: 1. server: echo "1111" > 1 2. client: cat 1 3. server: mv 1 2 4. client: cat 1 <--- ENOENT here Name cache can be purged by nfs_lookup(), if the latter finds that the capability numbers doesn't match. In this case, nfs_lookup() will send a new "lookup" RPC request to the server. Name cache can also be purged from getnewvnode() and vclean(). Which code does that for the above scenario it's quite obscure to me. Yes, my knowledge is limited :) > Here's the output of the script: > > #!/bin/sh -v > rm -f file1 file2 > ssh -n mousie rm -f file1 file2 > echo foo > file1 > echo bar > file2 > ssh -n mousie cat file1 > foo > ssh -n mousie cat file2 > bar > tail -f file1 & > sleep 1 > foo > cat file1 > foo > cat file2 > bar > ssh -n mousie 'mv file1 tmpfile; mv file2 file1; mv tmpfile file2' > cat file1 > bar > cat file2 > foo > echo baz >> file2 > sleep 1 > baz > kill $! > Terminated > ssh -n mousie cat file1 > bar > ssh -n mousie cat file2 > foo > baz > > Notice that immediately after the files are swapped on the server, the > cat commands on the client are able to immediately detect that the files > have been interchanged and they open the correct files. The tail > command shows that the original handle for file1 remains valid after the > rename operations and when more data is written to file2 after the > interchange, the data is appended to the file that was formerly file1. By the way, what were the values of acregmin/acregmax/acdirmin/acdirmax and also the value of vfs.nfs.access_cache_timeout in your tests? I believe, the results of your test patterns heavily depend on the NFS attributes cache tunables (which happen to affect all cycles of NFS operation) and on the command execution timing as well. Moreover, I'm suspect that all this is badly linked with the type and sequence of operations on both the server and the client. Recall, I was about to fix just *one* common scenario :) With different values of acmin/acmax and access_cache_timeout, and manual operations I was able to achieve the result you consider as "proper" above and also, the "wrong" effect that you described below. > And its output: > > #!/bin/sh -v > rm -f file1 file2 > ssh -n mousie rm -f file1 file2 > echo foo > file1 > echo bar > file2 > ssh -n mousie cat file1 > foo > ssh -n mousie cat file2 > bar > sleep 1 > cat file1 > foo > cat file2 > bar > ssh -n mousie 'mv file1 file2' > cat file2 > foo > cat file1 > cat: file1: No such file or directory > > Even though file2 was unlinked and replaced by file1 on the server, the > client immediately notices the change and is able to open the proper > file. My tests always eventually produce ESTALE for file2 here. However, I suspect their must be configurations where I won't get ESTALE. > Conclusion: relying on seeing an ESTALE error to retry is insufficient. > Depending on how files are manipulated, open() may successfully return a > descriptor for the wrong file and even enable the contents of that file > to be overwritten. The namei()/lookup() code is broken and that's what > needs to be fixed. I don't think it's namei()/lookup() which is broken. I'm afraid, the name and attribute caching logic is somewhat far from ideal. Namecache routines seem to work fine, they just do actual parsing/lookup of a pathname. Other functions manipulate with the cached names basing on their understanding of the cache validity (both namecache and cached dir/file attributes). I've also done a number of tcpdump's for different test patterns and I believe, what happens with the cached vnode may depend on the results of the "access" RPC request to the server. As I said before, I was not going to fix all the NFS inefficiencies related to heavily shared file environments. However, I still believe that open-retry-on-ESTALE *may* help people to avoid certain erratic conditions. At least, I think that having this functionality switchable with an additional sysctl variable, *may* help lots of people in the black art of tuning NFS <attribute> caching. As there are no exact descriptions on how all of this behaves, people usually have to experiment with their own certain environments. Also, I agree it's not the "fix" of everything. And I didn't even mention I want this to be integrated in the source :) Actually, I know that it works for what I've been fixing locally and just asked for technical comments about possible "vnode leakage" and nameidata initialization which nobody provided yet ;-P I appreciate *very much* all of the answers, though. Definitely, a food for thought, but I'm a little bit tired of this issue already :) Thanks again for your efforts.
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200306221153.PAA00759>