Date: Sat, 21 Jun 2003 23:35:33 -0700 (PDT)
From: Don Lewis <truckman@FreeBSD.org>
To: uitm@blackflag.ru
Cc: freebsd-hackers@FreeBSD.org
Subject: Re: open() and ESTALE error
Message-ID: <200306220635.h5M6ZXM7066060@gw.catspoiler.org>
In-Reply-To: <200306202216.CAA01809@slt.oz>
On 21 Jun, Andrey Alekseyev wrote:
> Don,
>
>> old vnode and its associated file handle. If the file on the server was
>> renamed and not deleted, the server won't return ESTALE for the handle
>
> I'm all confused and messed up :) Actually, a rename on the server is not
> the same as sillyrename on the client. If you rename a file on the
> server for which there is a cached file handle on the client, next time
> the client will use its cached file handle, it'll get ESTALE from the server.
> I don't know how this happens, though. Until I dig more around all the
> rename paraphernalia, I won't know. If someone can clear this out, please
> do. It'll be much appreciated. At this time I can't link this with the
> inode generation number changes (as there is no new inode allocated when
> the file is renamed).

When a file is renamed on the server, its file handle remains valid.

I had some time to write some scripts to exercise this stuff and
discovered some interesting things. The NFS server is a 4.8-stable box
named mousie, and the NFS client is running 5.1-current. The tests were
run in my NFS-mounted home directory.

Here's the first script:

#!/bin/sh -v
rm -f file1 file2
ssh -n mousie rm -f file1 file2
echo foo > file1
echo bar > file2
ssh -n mousie cat file1
ssh -n mousie cat file2
tail -f file1 &
sleep 1
cat file1
cat file2
ssh -n mousie 'mv file1 tmpfile; mv file2 file1; mv tmpfile file2'
cat file1
cat file2
echo baz >> file2
sleep 1
kill $!
ssh -n mousie cat file1
ssh -n mousie cat file2

Here's the output of the script:

#!/bin/sh -v
rm -f file1 file2
ssh -n mousie rm -f file1 file2
echo foo > file1
echo bar > file2
ssh -n mousie cat file1
foo
ssh -n mousie cat file2
bar
tail -f file1 &
sleep 1
foo
cat file1
foo
cat file2
bar
ssh -n mousie 'mv file1 tmpfile; mv file2 file1; mv tmpfile file2'
cat file1
bar
cat file2
foo
echo baz >> file2
sleep 1
baz
kill $!
Terminated
ssh -n mousie cat file1
bar
ssh -n mousie cat file2
foo
baz

Notice that immediately after the files are swapped on the server, the
cat commands on the client detect that the files have been interchanged
and open the correct files. The tail command shows that the original
handle for file1 remains valid after the rename operations, and when more
data is written to file2 after the interchange, the data is appended to
the file that was formerly file1.

My second script is an attempt to reproduce the open() -> ESTALE error.

#!/bin/sh -v
rm -f file1 file2
ssh -n mousie rm -f file1 file2
echo foo > file1
echo bar > file2
ssh -n mousie cat file1
ssh -n mousie cat file2
sleep 1
cat file1
cat file2
ssh -n mousie 'mv file1 file2'
cat file2
cat file1

And its output:

#!/bin/sh -v
rm -f file1 file2
ssh -n mousie rm -f file1 file2
echo foo > file1
echo bar > file2
ssh -n mousie cat file1
foo
ssh -n mousie cat file2
bar
sleep 1
cat file1
foo
cat file2
bar
ssh -n mousie 'mv file1 file2'
cat file2
foo
cat file1
cat: file1: No such file or directory

Even though file2 was unlinked and replaced by file1 on the server, the
client immediately notices the change and is able to open the proper file.

Since my scripts weren't provoking the reported problem, I wondered if
this was a 4.x vs. 5.x problem, or if the problem didn't occur in the
current working directory, or if the problem only occurred if a directory
was specified in the file path. I modified my scripts to work with a
subdirectory and got rather different results:

#!/bin/sh -v
rm -f dir/file1 dir/file2
ssh -n mousie rm -f dir/file1 dir/file2
echo foo > dir/file1
echo bar > dir/file2
ssh -n mousie cat dir/file1
foo
ssh -n mousie cat dir/file2
bar
tail -f dir/file1 &
sleep 1
foo
cat dir/file1
foo
cat dir/file2
bar
ssh -n mousie 'mv dir/file1 dir/tmpfile; mv dir/file2 dir/file1; mv dir/tmpfile dir/file2'
sleep 120
cat dir/file1
bar
cat dir/file2
bar
echo baz >> dir/file2
sleep 1
kill $!
Terminated
ssh -n mousie cat dir/file1
bar
baz
ssh -n mousie cat dir/file2
foo

Even after waiting long enough for the cached attributes to time out,
one of the cat commands on the client opened the incorrect file, and when
the shell executed the echo command to append to one of the files, the
wrong file was opened and appended to. Conclusion: the client is
confused, and retrying open() on an ESTALE error is insufficient to fix
the problem.

By specifying a directory in the path, I was also able to reproduce the
ESTALE error one time, but now I always get:

#!/bin/sh -v
rm -f dir/file1 dir/file2
ssh -n mousie rm -f dir/file1 dir/file2
echo foo > dir/file1
echo bar > dir/file2
ssh -n mousie cat dir/file1
foo
ssh -n mousie cat dir/file2
bar
sleep 1
cat dir/file1
foo
cat dir/file2
bar
ssh -n mousie 'mv dir/file1 dir/file2'
sleep 120
cat dir/file2
foo
cat dir/file1
foo

unless I decrease the sleep time:

#!/bin/sh -v
rm -f dir/file1 dir/file2
ssh -n mousie rm -f dir/file1 dir/file2
echo foo > dir/file1
echo bar > dir/file2
ssh -n mousie cat dir/file1
foo
ssh -n mousie cat dir/file2
bar
sleep 1
cat dir/file1
foo
cat dir/file2
bar
ssh -n mousie 'mv dir/file1 dir/file2'
# sleep 120
sleep 1
cat dir/file2
cat: dir/file2: Stale NFS file handle
cat dir/file1
foo

In one of my tests, I got an xauth warning from ssh, which made me think
that maybe the manipulation of my .Xauthority file might affect the
results. When I reran the original tests without X11 forwarding, I got
results similar to those that I got when I specified a directory in the
path:

#!/bin/sh -v
rm -f file1 file2
ssh -x -n mousie rm -f file1 file2
echo foo > file1
echo bar > file2
ssh -x -n mousie cat file1
foo
ssh -x -n mousie cat file2
bar
sleep 1
cat file1
foo
cat file2
bar
ssh -x -n mousie 'mv file1 file2'
cat file2
cat: file2: Stale NFS file handle
cat file1
foo

Conclusion: relying on seeing an ESTALE error to retry is insufficient.
Depending on how files are manipulated, open() may successfully return a
descriptor for the wrong file and even enable the contents of that file
to be overwritten. The namei()/lookup() code is broken, and that's what
needs to be fixed.
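For comparison, the local-filesystem behavior that the NFS client is supposed to mirror can be checked with a short script. This is a minimal sketch, not one of the tests above: it assumes a POSIX sh with mktemp -d (present on FreeBSD and GNU systems), works in a throwaway local directory rather than over NFS, uses inode numbers from ls -i as a stand-in for the server's file handles, and the helper name inode is hypothetical. It repeats the name swap from the first script and the mv file1 file2 from the second, and checks that renaming keeps each file's inode, that a descriptor opened before the swap still refers to the original file, and that a fresh open by name finds whichever file now bears that name.

```shell
#!/bin/sh
# Local analog only (not NFS): inode numbers stand in for the
# server's file handles. Runs in a scratch directory.
set -e

dir=$(mktemp -d)
cd "$dir"

# Hypothetical helper: print the inode number of a path
# (POSIX "ls -i" prints "inode name").
inode() {
    ls -i "$1" | awk '{ print $1 }'
}

echo foo > file1
echo bar > file2
ino1=$(inode file1)
ino2=$(inode file2)

# Keep file1 open for appending, much as "tail -f file1 &" keeps
# the original handle in use in the first script.
exec 3>> file1

# Swap the names, as the first script does on the server.
mv file1 tmpfile; mv file2 file1; mv tmpfile file2

# Renaming allocates no new inode: each file keeps its identity.
swap_ok=no
if [ "$(inode file2)" = "$ino1" ] && [ "$(inode file1)" = "$ino2" ]; then
    swap_ok=yes
fi

# Opening by name now finds the other file ("bar") ...
after_file1=$(cat file1)
# ... while the descriptor opened earlier still refers to the old
# inode, so "baz" lands in the file now called file2.
echo baz >&3
exec 3>&-
after_file2=$(cat file2)

# Second script: "mv file1 file2" unlinks the old file2 and gives
# its name to the inode that used to be called file1.
ino_before=$(inode file1)
mv file1 file2
replace_ok=no
if [ "$(inode file2)" = "$ino_before" ] && [ ! -e file1 ]; then
    replace_ok=yes
fi

cd / && rm -rf "$dir"
echo "swap_ok=$swap_ok replace_ok=$replace_ok"
```

On a local filesystem this should print swap_ok=yes replace_ok=yes, with file2 ending up as "foo" followed by "baz", which matches what the client saw in the first test. The subdirectory test is where the client diverged: a fresh open of dir/file2 returned the file that had been renamed to dir/file1.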