From owner-freebsd-fs@FreeBSD.ORG  Sat Jun 25 11:58:28 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id B4F52106564A;
	Sat, 25 Jun 2011 11:58:28 +0000 (UTC) (envelope-from jhb@freebsd.org)
Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42])
	by mx1.freebsd.org (Postfix) with ESMTP id 7524B8FC14;
	Sat, 25 Jun 2011 11:58:28 +0000 (UTC)
Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net
	[66.111.2.69])
	by cyrus.watson.org (Postfix) with ESMTPSA id EA33246B3C;
	Sat, 25 Jun 2011 07:58:27 -0400 (EDT)
Received: from kavik.baldwin.cx (c-68-36-150-83.hsd1.nj.comcast.net
	[68.36.150.83])
	by bigwig.baldwin.cx (Postfix) with ESMTPSA id 77A7E8A01F;
	Sat, 25 Jun 2011 07:58:27 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: freebsd-fs@freebsd.org
Date: Sat, 25 Jun 2011 07:58:23 -0400
User-Agent: KMail/1.13.5 (FreeBSD/8.2-RELEASE-p2; KDE/4.5.5; i386; ; )
References: <1656190156.1051008.1308953344203.JavaMail.root@erie.cs.uoguelph.ca>
	<alpine.GSO.1.10.1106242244170.6818@multics.mit.edu>
In-Reply-To: <alpine.GSO.1.10.1106242244170.6818@multics.mit.edu>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <201106250758.23935.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6
	(bigwig.baldwin.cx); Sat, 25 Jun 2011 07:58:27 -0400 (EDT)
Cc: shadow@gmail.com, Robert Watson <rwatson@freebsd.org>,
	Garance A Drosehn <gad@freebsd.org>
Subject: Re: [rfc] 64-bit inode numbers
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
X-List-Received-Date: Sat, 25 Jun 2011 11:58:28 -0000

On Friday, June 24, 2011 11:38:35 pm Benjamin Kaduk wrote:
> > point. fts(3) and friends will assume that a change in st_dev is a
> > mount point crossing. They will then expect the funny rule that the
> > d_ino in the dirent will not be the same as st_ino to apply.
> > 
> > What I do for NFSv4 is synthesize the mnt_stat.f_fsid value and
> > return that as st_dev for the mounted volume until I see the fsid
> > returned by the server change. Below that point, I return the fsid
> > from the server as st_dev so long as it isn't the same as the
> 
> I think I'm confused.  You're ... walking a directory hierarchy, and
> return a fake st_dev value but hold onto the fsid value from the server,
> then when the fsid from the server changes (due to a ... different NFS
> mount?), start reporting that new fsid and throw away the fake st_dev
> value?  Can you point me at the code that is doing this?

I think he's saying that VOP_GETATTR() for different vnodes in a single NFSv4
"mount" (as in 'struct mount *') can return different st_dev values to
userland, where the st_dev value for a given vnode depends on the remote
fsid of the file on the NFSv4 server.  That is, for NFSv4 it seems that not
all files on a mount use the same value of st_dev (as they would for a
local filesystem); instead, only files from the same logical volume on the
server share an st_dev.  In other words, st_dev is per-vnode rather than
just copied from the mount.  This is done by storing va_fsid in the NFS
attribute cache for each vnode:

int
nfscl_loadattrcache(struct vnode **vpp, struct nfsvattr *nap, void *nvaper,
    void *stuff, int writeattr, int dontshrink)
{
	...
	/*
	 * For NFSv4, if the node's fsid is not equal to the mount point's
	 * fsid, return the low-order 32 bits of the node's fsid. This
	 * allows getcwd(3) to work. There is a chance that the fsid might
	 * be the same as a local fs, but since this is in an NFS mount
	 * point, I don't think that will cause any problems?
	 */
	if (NFSHASNFSV4(nmp) && NFSHASHASSETFSID(nmp) &&
	    (nmp->nm_fsid[0] != np->n_vattr.na_filesid[0] ||
	     nmp->nm_fsid[1] != np->n_vattr.na_filesid[1])) {
		/*
		 * va_fsid needs to be set to some value derived from
		 * np->n_vattr.na_filesid that is not equal to
		 * vp->v_mount->mnt_stat.f_fsid.val[0], so that it changes
		 * from the value used for the top level server volume
		 * in the mounted subtree.
		 */
		if (vp->v_mount->mnt_stat.f_fsid.val[0] !=
		    (uint32_t)np->n_vattr.na_filesid[0])
			vap->va_fsid = (uint32_t)np->n_vattr.na_filesid[0];
		else
			vap->va_fsid = (uint32_t)hash32_buf(
			    np->n_vattr.na_filesid, 2 * sizeof(uint64_t), 0);
	} else
		vap->va_fsid = vp->v_mount->mnt_stat.f_fsid.val[0];
	...
}

VOP_GETATTR() then returns the va_fsid saved in the attribute cache (via
'vap' above) as the vnode's va_fsid, and vn_stat() uses that value to
compute st_dev.

I think the effect here is that 'mount' still shows only a single mountpoint
for NFSv4, but applications that watch for 'st_dev' changes to detect
mountpoint crossings (e.g. find -x) will treat each server volume as a
separate mountpoint.

-- 
John Baldwin