Date: Wed, 24 Feb 1999 00:55:51 +0000 (GMT)
From: Terry Lambert <tlambert@primenet.com>
To: toasty@home.dragondata.com (Kevin Day)
Cc: hackers@FreeBSD.ORG
Subject: Re: ESTALE the best approach?
Message-ID: <199902240055.RAA22403@usr09.primenet.com>
In-Reply-To: <199902210737.BAA21850@home.dragondata.com> from "Kevin Day" at Feb 21, 99 01:37:42 am

> Forgetting standards and past practices, is ESTALE a good approach to
> dealing with an NFS outage/reboot/whatever?
>
> Very few programs know how to deal with ESTALE, and I really have yet to see
> one that knows how to recover from it.

Programs have to be able to deal with errors. They either have
specific per-error strategies, or they have an "all other errors"
strategy.

Is it right to put the onus of reestablishing the state onto the
stateful application, instead of on the stateless NFS?

The answer has traditionally been "yes".

There is a fundamental breakdown in the idea of coding for
transactions, and this is what has driven this decision.

At some future point, when transaction guarantees are made by the
kernel to user space applications, it may well be a good idea to
revisit the idea of state recovery.

Alternatively, at that time, it will still be the onus of the
application to deal with transaction failures, either via implicit
roll-back, or retry.

> 28591 eggdrop 0.000037 CALL read(0x6,0xd6800,0x400)
> 28591 eggdrop 0.000379 RET read -1 errno 70 Stale NFS file handle

Whatever is calling this is not checking the return value for read.

This is acceptable for tty's, since the tty is supposed to guarantee
a signal to the process group leader, which is then sent to each
process in the group as the group leader fails away, leaving each
process a group leader (and thus subject to signal requirements).

Some people will argue against this, using POSIX as evidence, but
POSIX is a documentation of existing System V derived UNIX practice,
so those people are arguably wrong because That's What SVR4 Does.

> 28591 eggdrop 0.000381 RET read -1 errno 70 Stale NFS file handle
>
> Because they not realize that ESTALE is a fatal condition, or lots of
> programs tend to just go bezerk at having a FD closed on them...

It's not so much that they don't know how to handle ESTALE; it's
more that they are (evilly) ignoring the -1 return from read.
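
As a minimal sketch of what "checking the return value for read" could
look like, with specific per-error strategies plus an "all other errors"
fallback: the names read_action, classify_read_error and read_checked
below are hypothetical, invented for illustration, and the choice to
treat ESTALE as "close and reopen by path" is an assumption about one
reasonable application policy, not something NFS or eggdrop prescribes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical classification of a failed read(); not a system API. */
enum read_action {
	READ_RETRY,	/* interrupted: call read() again */
	READ_REOPEN,	/* handle is stale: close the fd, reopen by path */
	READ_FAIL	/* "all other errors": report and give up */
};

/* Map an errno value from read() to a recovery strategy. */
static enum read_action
classify_read_error(int err)
{
	switch (err) {
	case EINTR:
		return READ_RETRY;
	case ESTALE:
		/* Assumption: the caller still knows the path to reopen. */
		return READ_REOPEN;
	default:
		return READ_FAIL;
	}
}

/*
 * read() wrapper that never ignores a -1 return.
 * Returns the byte count (0 at EOF), or -1 with *action telling the
 * caller which strategy applies.  EINTR is retried internally.
 */
static ssize_t
read_checked(int fd, void *buf, size_t len, enum read_action *action)
{
	ssize_t n;

	assert(buf != NULL && action != NULL);
	for (;;) {
		n = read(fd, buf, len);
		if (n >= 0)
			return n;
		*action = classify_read_error(errno);
		if (*action != READ_RETRY)
			return -1;
	}
}
```

A caller that gets -1 with READ_REOPEN would close the descriptor and
open the file again by name; one that gets READ_FAIL would log the error
and stop using the descriptor rather than spin on it.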

> I've been experimenting here with making any ESTALE return something other
> than ESTALE, to see what happens.
>
> EBADF was nearly as bad, as most programs that couldn't deal with ESTALE
> probably didn't expect a fd that they had already opened to be suddenly
> closed.

Plus it was never closed. The vnode is still hanging out. The
EBADF'ed descriptor is unrecoverable, since only a stupid program
would close an already closed fd. So the struct file * pointing
to the vnode is also still hanging out.

> EINVAL seemed to make most programs die on their own, but not all. Some also
> left some very cryptic/wrong diagnostics behind.

Well, that's because you gave it the wrong error code. 8-).

> My next step is going to be to make nfsrv_fhtovp(?) actually kill the
> process instead of returning anything, in a final attempt to fix this,
> locally. Is there some justification for treating ESTALE like a transient
> error anyway? Did some implementation somewhere eventually restore things?

Yes.

The main problem is that the ESTALE does not trigger a remount
attempt, like it should, and a subsequent cleanup of the mount
specific portions of the outstanding NFSnodes.

Generally, the remount attempt is signalled by statd.

The ESTALE is actually a result of the mount going south, and not
getting reset like it's supposed to be. In theory, it should never
make user space, unless the remount attempts are time or
attempt-count constrained by mount options.

The only way you get a stale node is a server reboot, and that's as
a result of the "security" cruft that was not implemented at the
link layer like it should have been (packet sequence randomization,
etc.).

Basically, an operation was attempted against the server with an
NFS mount that was invalid because the server reset the sequence
base out from under you.

NFS security was never intended to use weak techniques against
session replay of plaintext; instead, it was intended to use DES or
Kerberos, and not protect against replay at all (who cares if you
replay garbage?).


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message