From owner-freebsd-hackers@FreeBSD.ORG  Fri Jun 20 12:20:17 2003
From: Andrey Alekseyev <uitm@blackflag.ru>
Message-Id: <200306201922.XAA00938@slt.oz>
In-Reply-To: <3EF2CDF0.6014ACB6@mindspring.com> from Terry Lambert at "Jun 20,
	3 02:03:44 am"
To: tlambert2@mindspring.com (Terry Lambert)
Date: Fri, 20 Jun 2003 23:22:04 +0400 (MSD)
X-Mailer: ELM [version 2.4ME+ PL31 (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
cc: freebsd-hackers@freebsd.org
Subject: Re: open() and ESTALE error

Terry,

> The place to correct this is probably the underlying FS.  I'd
> argue that getting ESTALE is a poke with a sharp stick that
> makes this more likely to happen.  ;^).

Initially I was going to "fix" the underlying FS (that is, the NFS code).
But it's extremely hard to do cleanly, because I would need to re-lookup the
name(!), which is not easily available (if at all) below the VFS layer.

> > I think this is exactly what happens :) Actually, I believe, I'm just
> > getting another namecache entry with another vnode/nfsnode/file handle.
> 
> You can't have this for other reasons; specifically, if you have
> the file open at the time of the rename, and it becomes a ".#nfs..."
> file (or whatever) on the server.

I didn't trace the "sillyrename" scenario much, but I believe
nfs_sillyrename() handles it correctly. At least, it uses nfs_lookitup(),
which may actually *update* the file handle, and it also deals with purging
the name cache. So I don't consider it a real problem.

However, for open() for reading/writing the scenario looks quite clear to me.
As I said in my previous message to Don, I'm just trying to eliminate the need
to modify an otherwise generic application so that it immediately retries
open() when the first open() fails with ESTALE, at least for a certain more or
less common situation :)  And I know that the second open() from the userland
application always works for the case I've described.
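
For reference, the kind of userland workaround I would like applications not
to need could be sketched roughly as below. This is only an illustration:
open_retry_estale() is a hypothetical helper name, not an existing libc or
kernel function, and a single retry is an assumption based on the case
described above, where the second open() succeeds.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/*
 * Hypothetical wrapper (not part of any library): call open() and, if it
 * fails with ESTALE, immediately try once more. Any other error, or a
 * second ESTALE, is returned to the caller unchanged.
 */
static int open_retry_estale(const char *path, int flags, mode_t mode)
{
    int fd = open(path, flags, mode);

    if (fd == -1 && errno == ESTALE) {
        /* The first attempt hit a stale file handle; retry the lookup. */
        fd = open(path, flags, mode);
    }
    return fd;
}
```

An application would call open_retry_estale(path, O_RDONLY, 0) wherever it
calls open() today, which is exactly the per-application change I'd rather
avoid.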

> Don points out that Solaris tries to fix this via the "noac" mount
> option for client NFS.

It does bad things to performance, though :)  I'm not trying to uncache
everything. It's safe for me to use the file page cache if open() succeeds.
I'm not trying to achieve absolute shared-file integrity with NFS, believe
me :)

> 	{ A, B, C }
> fd1 open on B
> fd2 open on C
> rename B -> C
> rename A -> B
> 
> ?  With your patch, I think we would potentially convert fd2 to point
> to B when it really *should* be "ESTALE", which is wrong (think in
> terms of 2 or more clients doing the operations).

You didn't specify whether the renames happen on the client or on the server,
though. The result heavily depends on the exact scenario.

With a single client, a new open() for "C" will result in fd2 if the
original "C" is still open (because of sillyrename?).
Without fd2, any new open() for "C" will get a valid file handle for what
originally was "B". And that's the correct behaviour.

If the renames were on the server, then fd1 will be valid until the last
client's close. However, any reference to the original "C" will fail.
Re-opening "C" should result in a new file handle for what originally was "B".

Am I wrong?