Date:      Wed, 11 Dec 2013 18:21:55 -0500 (EST)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Jason Keltz <jas@cse.yorku.ca>
Cc:        FreeBSD Filesystems <freebsd-fs@freebsd.org>, Steve Dickson <SteveD@redhat.com>
Subject:   Re: mount ZFS snapshot on Linux system
Message-ID:  <116973401.29503791.1386804115064.JavaMail.root@uoguelph.ca>
In-Reply-To: <52A7E53D.8000002@cse.yorku.ca>

Jason Keltz wrote:
> On 10/12/2013 7:21 PM, Rick Macklem wrote:
> > Jason Keltz wrote:
> >> I'm running FreeBSD 9.2 with various ZFS datasets.
> >> I export a dataset to a Linux system (RHEL64), and mount it.  It
> >> works
> >> fine...
> >> When I try to access the ZFS snapshot directory on the Linux NFS
> >> client,
> >> things go weird.
> >>
> >> With NFSv4:
> >>
> >> [jas@archive /]# cd /mnt/.zfs/snapshot
> >> [jas@archive snapshot]# ls
> >> 20131203  20131205  20131206  20131207  20131208  20131209
> >>  20131210
> >> [jas@archive snapshot]# cd 20131210
> >> 20131210: Not a directory.
> >>
> >> huh?
> >>
> >> [jas@archive snapshot]# ls -al
> >> total 77
> >> dr-xr-xr-x   9 root root   9 Dec 10 11:20 .
> >> dr-xr-xr-x   4 root root   4 Nov 28 15:42 ..
> >> drwxr-xr-x 380 root root 380 Dec  2 15:56 20131203
> >> drwxr-xr-x 381 root root 381 Dec  3 11:24 20131205
> >> drwxr-xr-x 381 root root 381 Dec  3 11:24 20131206
> >> drwxr-xr-x 381 root root 381 Dec  3 11:24 20131207
> >> drwxr-xr-x 381 root root 381 Dec  3 11:24 20131208
> >> drwxr-xr-x 381 root root 381 Dec  3 11:24 20131209
> >> drwxr-xr-x 381 root root 381 Dec  3 11:24 20131210
> >> [jas@archive snapshot]# stat *
> >> [jas@archive snapshot]# ls -al
> >> total 292
> >> dr-xr-xr-x 9 root      root         9 Dec 10 11:20 .
> >> dr-xr-xr-x 4 root      root         4 Nov 28 15:42 ..
> >> -rw-r--r-- 1 uax    guest   137647 Mar 17  2010 20131203
> >> -rw-r--r-- 1 uax    guest         865 Jul 31  2009 20131205
> >> -rw-r--r-- 1 uax    guest   137647 Mar 17  2010 20131206
> >> -rw-r--r-- 1 uax    guest         771 Jul 31  2009 20131207
> >> -rw-r--r-- 1 uax    guest         778 Jul 31  2009 20131208
> >> -rw-r--r-- 1 uax     guest       5281 Jul 31  2009 20131209
> >> -rw------- 1 btx      faculty      893 Jul 13 20:21 20131210
> >>
> >> But it gets even more fun..
> >>
> >> # ls -ali
> >> total 205
> >>     2 dr-xr-xr-x   9 root      root       9 Dec 10 11:20 .
> >>     1 dr-xr-xr-x   4 root      root       4 Nov 28 15:42 ..
> >> 863 -rw-r--r--   1 uax     guest 137647 Mar 17  2010 20131203
> >>     4 drwxr-xr-x 381 root      root     381 Dec  3 11:24 20131205
> >>     4 drwxr-xr-x 381 root      root     381 Dec  3 11:24 20131206
> >>     4 drwxr-xr-x 381 root      root     381 Dec  3 11:24 20131207
> >>     4 drwxr-xr-x 381 root      root     381 Dec  3 11:24 20131208
> >>     4 drwxr-xr-x 381 root      root     381 Dec  3 11:24 20131209
> >>     4 drwxr-xr-x 381 root      root     381 Dec  3 11:24 20131210
> >>
> >> This is not a user id mapping issue because all the files in /mnt
> >> have
> >> the proper owner/groups, and I can access them there fine.
> >>
> >> I also tried explicitly exporting .zfs/snapshot.  The result isn't
> >> any
> >> different.
> >>
> >> If I use nfs v3 it "works", but I'm seeing a whole lot of errors
> >> like
> >> these in syslog:
> >>
> >> Dec 10 12:32:28 jungle mountd[49579]: can't delete exports for
> >> /local/backup/home9/.zfs/snapshot/20131203: Invalid argument
> >> Dec 10 12:32:28 jungle mountd[49579]: can't delete exports for
> >> /local/backup/home9/.zfs/snapshot/20131209: Invalid argument
> >> Dec 10 12:32:28 jungle mountd[49579]: can't delete exports for
> >> /local/backup/home9/.zfs/snapshot/20131210: Invalid argument
> >> Dec 10 12:32:28 jungle mountd[49579]: can't delete exports for
> >> /local/backup/home9/.zfs/snapshot/20131207: Invalid argument
> >>
> >> It's not clear to me why this doesn't just "work".
> >>
> >> Can anyone provide any advice on debugging this?
> >>
> > As I think you already know, I know nothing about ZFS and never
> > use it.
> Yup! :)
> > Having said that, I suspect that there are filenos (i-node #s)
> > that are the same in the snapshot as in the parent file system
> > tree.
> >
> > The basic assumptions are:
> > - within a file system, all i-node# are unique (represent one file
> >    object only) and all file objects have the same fsid
> > - when the fsid changes, that indicates a file system boundary and
> >    fileno (i-node#s) can be reused in the subtree with a different
> >    fsid
> >
> > For NFSv3, the server should export single volumes only (all
> > objects
> > have the same fsid and the filenos are unique). This is indicated
> > to
> > the VFS by the use of the NOCROSSMOUNT flag on VOP_LOOKUP() and
> > friends.
> >
> > For NFSv4, the server does export multiple volumes and the boundary
> > is indicated by a change in fsid value.
> >
> > I suspect ZFS snapshots don't obey the above in some way, but that
> > is
> > just a hunch.
> >
> > Now, how to narrow this down...
> > - Do the above tests (both NFSv4 and NFSv3) and capture the
> > packets,
> >    then look at them in wireshark. In particular, look at the
> >    fileid numbers
> >    and fsid values for the various directories under .zfs.
> 
> I gave this a shot, but I haven't used wireshark to capture NFS
> traffic
> before, so if I need to provide additional details, let me know..
> 
> NFSv4:
> 
> For /mnt/.zfs/snapshot/20131203:
> fileid=4
> fsid4.major=1446349656
> fsid4.minor=222
> 
> For /mnt/.zfs/snapshot/20131205:
> fileid=4
> fsid4.major=1845998066
> fsid4.minor=222
> 
> For /mnt/jas:
> fileid=144
> fsid4.major=597946950
> fsid4.minor=222
> 
> For /mnt/jas1:
> fileid=338
> fsid4.major=597946950
> fsid4.minor=222
> 
> So fsid is the same for all the different "data" directories, which
> is
> what I would expect given what you said.  I  guess each snapshot is
> seen
> as a unique filesystem...  but then a repeating inode in different
> filesystems shouldn't be a problem...
> 
Yes, it appears that each snapshot is represented as a different file
system. As such, NFSv4 should work for these, but there is an additional
property of the "root" of each of these (20131203, ...).
When the directory .zfs/snapshot is read, the fileno for 20131203 should
be different than the fileno returned by VOP_GETATTR()/stat() for "20131203".
(The old "mounted-on" vs "root-of-mounted-fs" vnodes which you get for a
 "mount point".)
For NFSv4, the server returns the fileno from the VOP_READDIR() dirent as a
separate attribute called mounted_on_fileid, while the value returned by
VOP_GETATTR() is returned as the fileid attribute.
If these 2 attributes have the same value, it is not a "mount point".

So, maybe you could take another look at the packet capture in wireshark
and see what the fileid and mounted_on_fileid attributes are?
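
To make the comparison concrete, here is a minimal Python sketch of the rule
described above. The function name and the sample values are hypothetical
(the captures so far only report fileid=4 for the snapshot roots, and the
mounted_on_fileid values are not known yet), so treat this as an illustration
of the check and not as what any NFS client actually runs.

```python
def looks_like_mount_point(fileid, mounted_on_fileid):
    """Apply the rule from the text: equal values mean 'not a mount point'."""
    return fileid != mounted_on_fileid


# Hypothetical (fileid, mounted_on_fileid) pairs, for illustration only.
samples = {
    "ordinary directory": (144, 144),
    "root of a mounted file system": (4, 9001),
}

for name, (fileid, mounted_on) in samples.items():
    print(name, "->", looks_like_mount_point(fileid, mounted_on))
```

If the capture shows both attributes set to the same value for 20131203,
this check would say "not a mount point" even though the fsid changes there.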

> NFSv3:
> 
> For /mnt/.zfs/snapshot/20131203:
> fileid=4
> fsid=0x0000000056358b58
> 
> For /mnt/.zfs/snapshot/20131205:
> fileid=4
> fsid=0x000000006e07b1f2
> 
> For /mnt/jas
> fileid=144
> fsid=0x0000000023a3f246
> 
> For /mnt/jas1:
> fileid=338
> fsid=0x0000000023a3f246
> 
> Here, it seems it's the same, even though it's NFSv3... hmm.
> 
> 
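
A quick way to confirm that the two captures agree is to compare the numbers
directly, since wireshark shows the NFSv4 values in decimal and the NFSv3
values in hex. The sketch below uses only the values quoted above. Pairing
fsid4.major with the NFSv3 fsid is an assumption about how the two captures
line up (fsid4.minor=222 is left out of the comparison).

```python
# fsid4.major values from the NFSv4 capture quoted above (decimal).
nfsv4_fsid_major = {
    "/mnt/.zfs/snapshot/20131203": 1446349656,
    "/mnt/.zfs/snapshot/20131205": 1845998066,
    "/mnt/jas": 597946950,
    "/mnt/jas1": 597946950,
}

# fsid values from the NFSv3 capture quoted above (hex).
nfsv3_fsid = {
    "/mnt/.zfs/snapshot/20131203": 0x0000000056358b58,
    "/mnt/.zfs/snapshot/20131205": 0x000000006e07b1f2,
    "/mnt/jas": 0x0000000023a3f246,
    "/mnt/jas1": 0x0000000023a3f246,
}


def fsids_match(v4, v3):
    """Map each path to True when both captures report the same number."""
    return {path: v4[path] == v3[path] for path in v4}


matches = fsids_match(nfsv4_fsid_major, nfsv3_fsid)
for path, same in matches.items():
    print(f"{path}: {'same' if same else 'different'} fsid in v3 and v4")
```

So both protocol versions report the same per-snapshot fsid values, and each
snapshot has an fsid different from the parent dataset.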
> > - Try mounting the individual snapshot directory, like
> >     .zfs/snapshot/20131209 and see if that works (for both NFSv3
> >     and NFSv4).
> 
> Hmm .. I tried this:
> 
> /local/backup/home9/.zfs/snapshot/20131203  -ro archive-mrpriv.cs.yorku.ca
> V4: /
> 
> ... but syslog reports:
> 
> Dec 10 22:28:22 jungle mountd[85405]: can't export
> /local/backup/home9/.zfs/snapshot/20131203
> 
mountd will do a VFS_CHECKEXP(), which seems to fail for
these (which also explains the error messages). To be honest,
with these failing, remote access should fail.

Also, since NFSv3 exported volumes should not cross
"mount points" (anywhere the fsid changes), all that a mount
above .zfs/snapshot/20131203 should see is a bunch of
empty directories called 20131203,...

For example, in the UFS world, with separate
file systems /sub1 and /sub1/sub2, both exported:
- an NFSv3 mount of /sub1 on /mnt would see an empty
  directory "sub2" when looking in /mnt. (Actually it
  isn't necessarily empty. It might have whatever is in
  the directory when /sub1/sub2 is not mounted.)
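
The "do not cross where the fsid changes" rule can be sketched in a few lines
of Python. This is only an illustration, not server code: the function name
is made up, the fsid values are the ones from the NFSv3 capture quoted
earlier, and it assumes the exported volume has the same fsid as /mnt/jas,
which the capture does not state directly.

```python
def covered_by_v3_export(export_fsid, objects):
    """Keep only the paths whose fsid equals the exported volume's fsid."""
    return [path for path, fsid in objects if fsid == export_fsid]


# (path, fsid) pairs as reported in the NFSv3 capture quoted in this thread.
objects = [
    ("/mnt/jas", 0x23a3f246),
    ("/mnt/jas1", 0x23a3f246),
    ("/mnt/.zfs/snapshot/20131203", 0x56358b58),
    ("/mnt/.zfs/snapshot/20131205", 0x6e07b1f2),
]

# Assumption: the exported volume shares the fsid seen for /mnt/jas.
inside = covered_by_v3_export(0x23a3f246, objects)
print(inside)
```

By this rule the snapshot roots fall outside the single exported volume,
which is why an NFSv3 client would be expected to see only empty directories
there.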

This seems pretty obviously broken for ZFS, but I think
it needs to be fixed in ZFS and I have no idea how to do
that, since I don't know if snapshots are real mount points, etc.

> ... and of course I can't mount from either v3/v4.
> 
> On the other hand, I kept it as:
> 
> /local/backup/home9 -ro archive-mrpriv.cs.yorku.ca
> V4:/
> 
> ... and was able to NFSv4 mount
> /local/backup/home9/.zfs/snapshot/20131203, and this does indeed
> work.
> 
Yes, although technically it should not work unless 20131203 is
exported.

However, it is probably the easiest workaround until this is fixed
someday.
So, just to make sure I am clear on this...
An NFSv4 mount of the snapshot works ok, even for a Linux client mount.

> > - Try doing the mounts with a FreeBSD client and see if you get the
> > same
> >    behaviour?
> I found this:
> http://forums.freenas.org/threads/mounting-snapshot-directory-using-nfs-from-linux-broken.6060/
> .. implies it will work from FreeBSD/Nexenta, just not Linux.

I suspect this might be the mounted_on_fileid vs fileid issue.
(i.e., the Linux client needs this to be done correctly, but the other
 clients figure it out.)

One case that might break for FreeBSD would be to cd into a snapshot
and then do a pwd with the debug.disablecwd sysctl set to 1.

Hopefully the ZFS wizards are reading this, rick

> Found this as well:
> https://groups.google.com/a/zfsonlinux.org/forum/#!topic/zfs-discuss/lKyfYsjPMNM
> 
> Jason.
> 
> 


