Date:      Wed, 24 Feb 1999 00:55:51 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        toasty@home.dragondata.com (Kevin Day)
Cc:        hackers@FreeBSD.ORG
Subject:   Re: ESTALE the best approach?
Message-ID:  <199902240055.RAA22403@usr09.primenet.com>
In-Reply-To: <199902210737.BAA21850@home.dragondata.com> from "Kevin Day" at Feb 21, 99 01:37:42 am

> Forgetting standards and past practices, is ESTALE a good approach to
> dealing with an NFS outage/reboot/whatever?
> 
> Very few programs know how to deal with ESTALE, and I really have yet to see
> one that knows how to recover from it.

Programs have to be able to deal with errors.  They either have
specific per-error strategies, or they have an "all other errors"
strategy.
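
As a rough illustration of the two kinds of strategy, here is a minimal
sketch in C.  The enum and the function name classify_io_error are
hypothetical, invented for this example; only the errno constants come
from the system headers.  Which action fits which error is an
assumption that a real program would decide for itself.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical recovery actions; the names are illustrative only. */
enum io_action {
    IO_RETRY,   /* transient: repeat the same call */
    IO_REOPEN,  /* handle is dead: reopen by path and rebuild state */
    IO_FAIL     /* the "all other errors" strategy: report and give up */
};

/* Map an errno value from a failed I/O call to a recovery action. */
enum io_action classify_io_error(int err)
{
    assert(err != 0);           /* only call this after a failure */

    switch (err) {
    case EINTR:                 /* interrupted by a signal */
        return IO_RETRY;
    case ESTALE:                /* stale NFS file handle */
        return IO_REOPEN;
    default:                    /* everything without a specific strategy */
        return IO_FAIL;
    }
}
```

The point of the default case is that an error the program has never
heard of still lands somewhere deliberate.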

Is it right to put the onus of reestablishing the state onto the
stateful application, instead of on the stateless NFS?  The answer
has traditionally been "yes".

There is a fundamental breakdown in the idea of coding for
transactions, and this is what has driven this decision.

At some future point, when transaction guarantees are made by the
kernel to user space applications, it may well be a good idea to
revisit the idea of state recovery.

Alternatively, at that time, it will still be the onus of the
application to deal with transaction failures, either via implicit
roll-back, or retry.


>  28591 eggdrop  0.000037 CALL  read(0x6,0xd6800,0x400)
>  28591 eggdrop  0.000379 RET   read -1 errno 70 Stale NFS file handle

Whatever is calling this is not checking the return value for read.

This is acceptable for tty's, since the tty is supposed to guarantee
a signal to the process group leader, which is then sent to each
process in the group as the group leader falls away, leaving each
process a group leader (and thus subject to signal requirements).

Some people will argue against this, using POSIX as evidence, but
POSIX is a documentation of existing System V derived UNIX practice,
so those people are arguably wrong because That's What SVR4 Does.


>  28591 eggdrop  0.000381 RET   read -1 errno 70 Stale NFS file handle
> 
> Because they do not realize that ESTALE is a fatal condition, or lots of
> programs tend to just go berserk at having a FD closed on them...

It's not so much that they don't know how to handle ESTALE, it's
more that they are (evilly) ignoring the -1 return from read.
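
For what it's worth, checking the return is not much code.  The sketch
below is one possible wrapper, not anything taken from eggdrop; the
name read_checked and its err_out parameter are made up for the
example.  It retries on EINTR and hands every other failure, ESTALE
included, back to the caller instead of dropping it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * read_checked: hypothetical wrapper around read(2).
 * Returns the byte count (0 at end of file) on success.
 * On failure returns -1 and stores the errno value in *err_out,
 * so the caller cannot mistake an error for data.
 */
ssize_t read_checked(int fd, void *buf, size_t len, int *err_out)
{
    ssize_t n;

    assert(buf != NULL && err_out != NULL);
    *err_out = 0;

    do {
        n = read(fd, buf, len);
    } while (n == -1 && errno == EINTR);    /* interrupted: try again */

    if (n == -1)
        *err_out = errno;                   /* e.g. ESTALE, EBADF, EIO */
    return n;
}
```

A caller that gets -1 back with ESTALE in err_out can then decide
whether to close the descriptor and reopen by path, or simply to exit
with a sensible diagnostic.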

> I've been experimenting here with making any ESTALE return something other
> than ESTALE, to see what happens.
> 
> EBADF was nearly as bad, as most programs that couldn't deal with ESTALE
> probably didn't expect a fd that they had already opened to be suddenly
> closed.

Plus it was never closed.  The vnode is still hanging out.  The
EBADF'ed descriptor is unrecoverable, since only a stupid program
would close an already closed fd.  So the struct file * pointing to
the vnode is also still hanging out.


> EINVAL seemed to make most programs die on their own, but not all. Some also
> left some very cryptic/wrong diagnostics behind.

Well, that's because you gave it the wrong error code.  8-).


> 
> My next step is going to be to make nfsrv_fhtovp(?) actually kill the
> process instead of returning anything, in a final attempt to fix this,
> locally. Is there some justification for treating ESTALE like a transient
> error anyway? Did some implementation somewhere eventually restore things?

Yes.  The main problem is that the ESTALE does not trigger a remount
attempt, like it should, and a subsequent cleanup of the mount specific
portions of the outstanding NFSnodes.

Generally, the remount attempt is signalled by statd.

The ESTALE is actually a result of the mount going south, and not
getting reset like it's supposed to be.  In theory, it should never
make user space, unless the remount attempts are time or attempt-count
constrained by mount options.

The only way you get a stale node is a server reboot, and that's as a
result of the "security" cruft that was not implemented link-layer
like it should have been (packet sequence randomization, etc.).

Basically, an operation was attempted against the server with an
NFS mount that was invalid because the server reset the sequence
base out from under you.

NFS security was never intended to use weak techniques against session
replay of plaintext; instead, it was intended to use DES or Kerberos,
and not protect against replay at all (who cares if you replay garbage?).


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

