Date: Wed, 24 Feb 1999 02:13:39 +0000 (GMT)
From: Terry Lambert <tlambert@primenet.com>
To: toasty@home.dragondata.com (Kevin Day)
Cc: tlambert@primenet.com, hackers@FreeBSD.ORG
Subject: Re: ESTALE the best approach?
Message-ID: <199902240213.TAA26428@usr09.primenet.com>
In-Reply-To: <199902240132.TAA25796@home.dragondata.com> from "Kevin Day" at Feb 23, 99 07:32:47 pm

> > Programs have to be able to deal with errors.  They either have
> > specific per-error strategies, or they have an "all other errors"
> > strategy.
>
> The problem I've seen is that some poorly designed software lists what it
> thinks are fatal, and everything else is to be retried.

Well, that's poorly designed code.  It needs to be fixed.

> Here for example, the executable checks for a few errors, if they don't
> match what it thinks are fatal, it retries. I'd think that good programming
> practice should be the reverse. Add what you know to be temporary, and
> everything else (and errors you don't even know about) should be fatal.

Right.

> > > Because they do not realize that ESTALE is a fatal condition, or lots of
> > > programs tend to just go berserk at having a FD closed on them...
> >
> > It's not so much that they don't know how to handle ESTALE, it's
> > more that they are (evilly) ignoring the -1 return from read.
>
> Looking at the read(2) man page... The only error that could happen that
> isn't a result of a programming error that could happen on a standard file
> is 'EIO', which would probably be a hardware error. (At that point, when
> hardware fails, I consider any application's behavior as 'undefined', on
> the PC architecture)

See my response to Matt.  I believe that ESTALE should not be sent
to user space under any circumstances.  However, that said, POSIX
requires that programs treat non-understood error codes as fatal.

> I can see why people aren't checking for errors on a read(2), really... Lots
> of FreeBSD's internal utilities don't do it, even. :) (however, most don't
> retry ad nauseam in response to an error)

Well, they're broken, and need to be fixed, eventually.

> > > EBADF was nearly as bad, as most programs that couldn't deal with ESTALE
> > > probably didn't expect a fd that they had already opened to be suddenly
> > > closed.
> >
> > Plus it was never closed.  The vnode is still hanging out.  The
> > EBADF'ed descriptor is unrecoverable, since only a stupid program
> > would close an already closed fd.  So the struct file * pointing to
> > the vnode is also still hanging out.
>
> Eeep, I hadn't thought of that. I guess I realize that has to be true,
> otherwise you'd be getting EBADF.

You should be.  The bug is that FreeBSD is structurally incapable
of doing a revocation, short of deadfs.

> > The only way you get a stale node is a server reboot, and that's as a
> > result of the "security" cruft that was not implemented link-layer
> > like it should have been (packet sequence randomization, etc.).

Matt very correctly pointed out that you could get it on a server
file deletion in the absence of locking (a lock would count as an
open file reference, keeping the file around, even if deleted).
Ignoring locking, the ESTALE should result in revocation + EBADF
(see other posting).

> I'm actually seeing this on a network that has a 100MB ethernet segment
> exclusively for NFS.
>
> I'll see this:
>
> nfs server home.internal:/home: not responding
> nfs server home.internal:/home: is alive again
>
> in my syslog, with the timestamps being identical, and chances are, after
> that, one or two processes go berserk afterwards.

Right.  This is the remount problem, not updating the existing
NFSnodes.

					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
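
The "add what you know to be temporary, everything else is fatal" rule
discussed above could be sketched in C roughly as follows.  This is only
an illustration under stated assumptions: the names is_temporary_error,
read_retry, and MAX_RETRIES are hypothetical, and treating EINTR and
EAGAIN as the temporary set is an assumption, not something taken from
the thread.  Any other errno value, including ones the program has never
heard of (ESTALE, for instance), is reported to the caller as fatal
instead of being retried.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical bound so that even "temporary" errors cannot loop forever. */
#define MAX_RETRIES 16

/*
 * Whitelist of errors assumed to be temporary.  Anything not listed
 * here, including error codes unknown to this program, is fatal.
 */
static bool is_temporary_error(int err)
{
    return err == EINTR || err == EAGAIN;
}

/*
 * Hypothetical helper: read(2) with a bounded retry on whitelisted
 * errors only.  Returns the byte count (0 at end of file), or -1 with
 * errno left as set by read(2) when the error is fatal or the retry
 * budget is used up.  A real program would likely wait (e.g. with
 * poll(2)) before retrying EAGAIN rather than retrying immediately.
 */
static ssize_t read_retry(int fd, void *buf, size_t len)
{
    int attempts = 0;

    for (;;) {
        ssize_t n = read(fd, buf, len);

        if (n >= 0)
            return n;               /* success or end of file */
        if (!is_temporary_error(errno) || ++attempts >= MAX_RETRIES)
            return -1;              /* fatal: never ignore the -1 */
    }
}
```

A caller would check for -1 and give up (report the error, exit, or
reopen the file) rather than calling read_retry again on the same
descriptor.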