From owner-freebsd-current@FreeBSD.ORG  Fri Aug 19 14:58:22 2011
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: current@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 24628106564A
	for <current@freebsd.org>; Fri, 19 Aug 2011 14:58:22 +0000 (UTC)
	(envelope-from rmacklem@uoguelph.ca)
Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca
	[131.104.91.36])
	by mx1.freebsd.org (Postfix) with ESMTP id D2DE58FC0A
	for <current@freebsd.org>; Fri, 19 Aug 2011 14:58:21 +0000 (UTC)
Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
	([131.104.91.206])
	by esa-annu-pri.mail.uoguelph.ca with ESMTP; 19 Aug 2011 10:29:17 -0400
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
	by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id B6928B3F31;
	Fri, 19 Aug 2011 10:29:17 -0400 (EDT)
Date: Fri, 19 Aug 2011 10:29:17 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Hiroki Sato <hrs@FreeBSD.org>
Message-ID: <1565511281.69213.1313764157732.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <20110819.224310.740411147168584392.hrs@allbsd.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Originating-IP: [172.17.91.202]
X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692)
Cc: pjd@FreeBSD.org, current@FreeBSD.org
Subject: Re: fsid change of ZFS?
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
X-List-Received-Date: Fri, 19 Aug 2011 14:58:22 -0000

Hiroki Sato wrote:
> Hiroki Sato <hrs@freebsd.org> wrote
> in <20110819.002046.908756241495481148.hrs@allbsd.org>:
> 
> hr> Hi,
> hr>
> hr> I have experienced a "Stale NFS file handle" issue when switching
> hr> between oldnfs and newnfs on a CURRENT box (an NFS server exporting
> hr> ZFS mountpoints).  The cause was not in the NFS subsystem itself but
> hr> an fsid change under the following conditions, and I am wondering
> hr> whether this is expected behavior...
> hr>
> hr> First, I tried the following configurations of NFS and ZFS, and
> hr> used statfs(2) to check whether the fsid of the same mountpoint (a
> hr> mounted ZFS dataset) changed:
> hr>
> hr>
> hr>  compile opts           kld module      fsid[0:1]          kld loaded by
> hr>  -----------------------------------------------------------------------
> hr>  NFSSERVER+NFSCLIENT    zfs             865798fa:8346ef02  loader
> hr>
> hr>  NFSSERVER+NFSCLIENT    zfs             865798fa:8346ef07  kldload(8)
> hr>
> hr>  NFSSERVER+NFSCLIENT+
> hr>  NFSD+NFSCL             zfs             865798fa:8346ef03  loader
> hr>
> hr>  NFSSERVER+NFSCLIENT+
> hr>  NFSD+NFSCL             zfs             865798fa:8346ef08  kldload(8)
> hr>
> hr>  NFSSERVER+NFSCLIENT    nfsd+nfscl+zfs  865798fa:8346ef08  loader
> hr>  -----------------------------------------------------------------------
> 
> Ah, I found why this happened:
> 
>     /*
>      * The fsid is 64 bits, composed of an 8-bit fs type, which
>      * separates our fsid from any other filesystem types, and a
>      * 56-bit objset unique ID.  The objset unique ID is unique to
>      * all objsets open on this system, provided by unique_create().
>      * The 8-bit fs type must be put in the low bits of fsid[1]
>      * because that's where other Solaris filesystems put it.
>      */
>     fsid_guid = dmu_objset_fsid_guid(zfsvfs->z_os);
>     ASSERT((fsid_guid & ~((1ULL<<56)-1)) == 0);
>     vfsp->vfs_fsid.val[0] = fsid_guid;
>     vfsp->vfs_fsid.val[1] = ((fsid_guid>>32) << 8) |
>         vfsp->mnt_vfc->vfc_typenum & 0xFF;
> 
> Since vfc_typenum is assigned from a counter that is incremented every
> time a new file system is registered, the loading order of modules that
> call vfs_register() affects ZFS's fsid.
> 
> Anyway, the possibility of an fsid change is troublesome, especially for
> an NFS server with a lot of clients.  Would zeroing the lowest 8 bits of
> vfs_fsid.val[1], or setting them to a fixed value, be harmful?
> 
> -- Hiroki
Well, the problem is that the fsid needs to be unique among all mounts.
The vfc_typenum field is used to try to ensure that it does not end up
with the same value as the fsid of a non-ZFS file system.

(A) I think making that field a fixed constant should be ok, provided the
function checks for a conflict by calling vfs_getvfs().
See vfs_getnewfsid() for how this is done. (There is a mutex lock that
needs to be held while doing it.) Alternatively, if ZFS can call
vfs_getnewfsid() instead of generating its own fsid, that might be nicer?

(B) Another way to fix this would be to modify vfs_register() to look up
file systems in a table (by vfc_name) and use a fixed, assigned value
from the table for vfc_typenum for entries found in the table. Only do
the "maxvfsconf++" when there isn't an entry for the fstype in the table.
(VFS_GENERIC can be set to the size of the table. That's what happened
in the bad old days when vfsconf was a table built at kernel config time.)

If you guys think (B) is preferred, I could come up with a patch. I don't
know enough about ZFS to do (A).

rick