Date: Thu, 12 Dec 2013 15:15:33 -0500
From: Jason Keltz <jas@cse.yorku.ca>
To: Rick Macklem <rmacklem@uoguelph.ca>
Cc: FreeBSD Filesystems <freebsd-fs@freebsd.org>, Steve Dickson <SteveD@redhat.com>
Subject: Re: mount ZFS snapshot on Linux system
Message-ID: <52AA1965.9080709@cse.yorku.ca>
In-Reply-To: <116973401.29503791.1386804115064.JavaMail.root@uoguelph.ca>
References: <116973401.29503791.1386804115064.JavaMail.root@uoguelph.ca>

On 12/11/2013 06:21 PM, Rick Macklem wrote:
> Jason Keltz wrote:
>> On 10/12/2013 7:21 PM, Rick Macklem wrote:
>>> Jason Keltz wrote:
>>>> I'm running FreeBSD 9.2 with various ZFS datasets.
>>>> I export a dataset to a Linux system (RHEL64), and mount it. It works
>>>> fine...
>>>> When I try to access the ZFS snapshot directory on the Linux NFS client,
>>>> things go weird.
>>>>
>>>> With NFSv4:
>>>>
>>>> [jas@archive /]# cd /mnt/.zfs/snapshot
>>>> [jas@archive snapshot]# ls
>>>> 20131203 20131205 20131206 20131207 20131208 20131209 20131210
>>>> [jas@archive snapshot]# cd 20131210
>>>> 20131210: Not a directory.
>>>>
>>>> huh?
>>>>
>>>> [jas@archive snapshot]# ls -al
>>>> total 77
>>>> dr-xr-xr-x   9 root root   9 Dec 10 11:20 .
>>>> dr-xr-xr-x   4 root root   4 Nov 28 15:42 ..
>>>> drwxr-xr-x 380 root root 380 Dec  2 15:56 20131203
>>>> drwxr-xr-x 381 root root 381 Dec  3 11:24 20131205
>>>> drwxr-xr-x 381 root root 381 Dec  3 11:24 20131206
>>>> drwxr-xr-x 381 root root 381 Dec  3 11:24 20131207
>>>> drwxr-xr-x 381 root root 381 Dec  3 11:24 20131208
>>>> drwxr-xr-x 381 root root 381 Dec  3 11:24 20131209
>>>> drwxr-xr-x 381 root root 381 Dec  3 11:24 20131210
>>>> [jas@archive snapshot]# stat *
>>>> [jas@archive snapshot]# ls -al
>>>> total 292
>>>> dr-xr-xr-x   9 root root         9 Dec 10 11:20 .
>>>> dr-xr-xr-x   4 root root         4 Nov 28 15:42 ..
>>>> -rw-r--r--   1 uax  guest   137647 Mar 17  2010 20131203
>>>> -rw-r--r--   1 uax  guest      865 Jul 31  2009 20131205
>>>> -rw-r--r--   1 uax  guest   137647 Mar 17  2010 20131206
>>>> -rw-r--r--   1 uax  guest      771 Jul 31  2009 20131207
>>>> -rw-r--r--   1 uax  guest      778 Jul 31  2009 20131208
>>>> -rw-r--r--   1 uax  guest     5281 Jul 31  2009 20131209
>>>> -rw-------   1 btx  faculty    893 Jul 13 20:21 20131210
>>>>
>>>> But it gets even more fun..
>>>>
>>>> # ls -ali
>>>> total 205
>>>>   2 dr-xr-xr-x   9 root root       9 Dec 10 11:20 .
>>>>   1 dr-xr-xr-x   4 root root       4 Nov 28 15:42 ..
>>>> 863 -rw-r--r--   1 uax  guest 137647 Mar 17  2010 20131203
>>>>   4 drwxr-xr-x 381 root root     381 Dec  3 11:24 20131205
>>>>   4 drwxr-xr-x 381 root root     381 Dec  3 11:24 20131206
>>>>   4 drwxr-xr-x 381 root root     381 Dec  3 11:24 20131207
>>>>   4 drwxr-xr-x 381 root root     381 Dec  3 11:24 20131208
>>>>   4 drwxr-xr-x 381 root root     381 Dec  3 11:24 20131209
>>>>   4 drwxr-xr-x 381 root root     381 Dec  3 11:24 20131210
>>>>
>>>> This is not a user id mapping issue because all the files in /mnt have
>>>> the proper owner/groups, and I can access them there fine.
>>>>
>>>> I also tried explicitly exporting .zfs/snapshot. The result isn't any
>>>> different.
>>>>
>>>> If I use nfs v3 it "works", but I'm seeing a whole lot of errors like
>>>> these in syslog:
>>>>
>>>> Dec 10 12:32:28 jungle mountd[49579]: can't delete exports for
>>>> /local/backup/home9/.zfs/snapshot/20131203: Invalid argument
>>>> Dec 10 12:32:28 jungle mountd[49579]: can't delete exports for
>>>> /local/backup/home9/.zfs/snapshot/20131209: Invalid argument
>>>> Dec 10 12:32:28 jungle mountd[49579]: can't delete exports for
>>>> /local/backup/home9/.zfs/snapshot/20131210: Invalid argument
>>>> Dec 10 12:32:28 jungle mountd[49579]: can't delete exports for
>>>> /local/backup/home9/.zfs/snapshot/20131207: Invalid argument
>>>>
>>>> It's not clear to me why this doesn't just "work".
>>>>
>>>> Can anyone provide any advice on debugging this?
>>>>
>>> As I think you already know, I know nothing about ZFS and never
>>> use it.
>> Yup! :)
>>> Having said that, I suspect that there are filenos (i-node #s)
>>> that are the same in the snapshot as in the parent file system
>>> tree.
>>>
>>> The basic assumptions are:
>>> - within a file system, all i-node# are unique (represent one file
>>>   object only) and all file objects have the same fsid
>>> - when the fsid changes, that indicates a file system boundary and
>>>   fileno (i-node#s) can be reused in the subtree with a different
>>>   fsid
>>>
>>> For NFSv3, the server should export single volumes only (all objects
>>> have the same fsid and the filenos are unique). This is indicated to
>>> the VFS by the use of the NOCROSSMOUNT flag on VOP_LOOKUP() and
>>> friends.
>>>
>>> For NFSv4, the server does export multiple volumes and the boundary
>>> is indicated by a change in fsid value.
>>>
>>> I suspect ZFS snapshots don't obey the above in some way, but that is
>>> just a hunch.
>>>
>>> Now, how to narrow this down...
>>> - Do the above tests (both NFSv4 and NFSv3) and capture the packets,
>>>   then look at them in wireshark. In particular, look at the
>>>   fileid numbers and fsid values for the various directories under .zfs.
>> I gave this a shot, but I haven't used wireshark to capture NFS traffic
>> before, so if I need to provide additional details, let me know..
>>
>> NFSv4:
>>
>> For /mnt/.zfs/snapshot/20131203:
>> fileid=4
>> fsid4.major=1446349656
>> fsid4.minor=222
>>
>> For /mnt/.zfs/snapshot/20131205:
>> fileid=4
>> fsid4.major=1845998066
>> fsid4.minor=222
>>
>> For /mnt/jas:
>> fileid=144
>> fsid4.major=597946950
>> fsid4.minor=222
>>
>> For /mnt/jas1:
>> fileid=338
>> fsid4.major=597946950
>> fsid4.minor=222
>>
>> So fsid is the same for all the different "data" directories, which is
>> what I would expect given what you said. I guess each snapshot is seen
>> as a unique filesystem... but then a repeating inode in different
>> filesystems shouldn't be a problem...
>>
> Yes, it appears that each snapshot is represented as a different file
> system.
> As such, NFSv4 should work for these, but there is an additional
> property of the "root" of each of these (20131203, ...).
> When the directory .zfs/snapshot is read, the fileno for 20131203 should
> be different than the fileno returned by VOP_GETATTR()/stat() for "20131203".
> (The old "mounted-on" vs "root-of-mounted-fs" vnodes which you get for a
> "mount point".)
> For NFSv4, the server returns the fileno in the VOP_READDIR() dirent as a
> separate attribute called mounted_on_fileid vs the value returned by VOP_GETATTR()
> as the fileid attribute.
> If the value of these 2 attributes is the same, it is not a "mount point".
>
> So, maybe you could take another look at the packet capture in wireshark
> and see what the fileid and mounted_on_fileid attributes are?

Unfortunately, I didn't save the log, but it was easy enough to
regenerate.

But before we go there, I've spent a lot of time experimenting with
this, so I can say...

If I NFSv4 mount nfs-server:/local/backup/home9 to /mnt, then I:

cd /mnt/.zfs/snapshot/20131203

... it works great! I can change into any user directory, list files,
etc.

If I then:

cd /mnt/.zfs/snapshot/20131205

.. it also works great!

But... if I cd into /mnt/.zfs/snapshot, the free ride is over... all the
snapshot directories appear as files and the problem is there.

... unless I unmount and remount, in which case I can repeat.

I also found that a change of kernel from 2.6.32-358.14.1.el6 (the
kernel I was running with RHEL6.4) to 2.6.32-431.el6 (the kernel that
comes with RHEL6.5) does actually change something important....

If I mount nfs-server:/local/backup/home9 and try to change into
"/mnt/.zfs/snapshot" with the new kernel, I still have the problem.

Likewise, if I try to mount nfs-server:/local/backup/home9/.zfs, and
change into "/mnt/snapshot", I also have the problem.

If I mount nfs-server:/local/backup/home9/.zfs/snapshot and change into
"/mnt", I still have the older problem with the RH 6.4 kernel in place.
However, if I do the same mount with the newer kernel, it now works. I
can "ls" and see the snapshot directories. I can change into any of
them, then "cd .." and change into another one. I tested this on two
systems - one where I just installed the entire 6.5 upgrade, and the
other where I just installed the kernel from 6.5 on the 6.4 system so it
seems related to the kernel.

It's still not clear why I can't just mount
nfs-server:/local/backup/home9 on RHEL6.5, and the NFSv4 server figures
it out. I did try from another FreeBSD client, and I can mount the tree
at any point, and the NFS server is happy. This makes me believe it's
probably a RHEL NFSv4 bug.

Here's the numbers..

NFSv4:

So, if I try to access the snapshot path directly, on the way ...

.zfs:
V4 LOOKUP
fsid.major: 597946950
fileid: 1
fattr owner/group are root - correct

snapshot:
V4 LOOKUP
fsid.major: 597946950
fileid: 2
fattr owner/group are root - correct

If I access /.zfs/snapshot/20131203 directly...:

20131203:
V4 LOOKUP
fsid.major: 1446349656
fileid: 4
fattr owner/group are root - correct

V4 READDIR
snapshot, 20121203 entry:
fsid.major: 597946950 <-- ????
fattr4_fileid: 863
fattr4_owner/group refers to a group on our system (the one displayed in
ls sometimes)..
FATTR4_MOUNTED_ON_FILEID: 0x000000000000035f

But if I ls /mnt/.zfs/snapshot:

V4 LOOKUP:
201203:
fsid.major: 597946950
fileid: 4

V4 READDIR:
fsid4.major: 597946950
fattr4_fileid: 863
fattr4_mounted_on_fileid: 0x000000000000035f

>> NFSv3:
>>
>> For /mnt/.zfs/snapshot/20131203:
>> fileid=4
>> fsid=0x0000000056358b58
>>
>> For /mnt/.zfs/snapshot/20131205:
>> fileid=4
>> fsid=0x000000006e07b1f2
>>
>> For /mnt/jas
>> fileid=144
>> fsid=0x0000000023a3f246
>>
>> For /mnt/jas1:
>> fileid=338
>> fsid=0x0000000023a3f246
>>
>> Here, it seems it's the same, even though it's NFSv3... hmm.
>>
>>
>>> - Try mounting the individual snapshot directory, like
>>>   .zfs/snapshot/20131209 and see if that works (for both NFSv3
>>>   and NFSv4).
>> Hmm ..
>> I tried this:
>>
>> /local/backup/home9/.zfs/snapshot/20131203 -ro archive-mrpriv.cs.yorku.ca
>> V4: /
>>
>> ... but syslog reports:
>>
>> Dec 10 22:28:22 jungle mountd[85405]: can't export
>> /local/backup/home9/.zfs/snapshot/20131203
>>
> mountd will do a VFS_CHECKEXP(), which seems to fail for
> these (which also explains the error messages). To be honest,
> with these failing, remote access should fail.
>
> Also, since NFSv3 exported volumes should not cross
> "mount points" (anywhere the fsid changes), all a mount
> above .zfs/snapshot/20131203 should get are a bunch of
> empty directories called 20131203,...

I tried again just in case I missed something...

nfs-server:/local/backup/home9 on /mnt type nfs (ro,vers=3,addr=172.16.2.26)

I can change into /mnt/.zfs/snapshot/20131203/jas and list the
directory, or less a file.

> For example, if in the UFS world with a separate
> file systems /sub1 and /sub1/sub2 with both exported:
> - an NFSv3 mount of /sub1 on /mnt would see an empty
>   directory "sub2" when looking in /mnt. (Actually it
>   isn't necessarily empty. It might have whatever is in
>   the directory when /sub1/sub2 is not mounted.)
>
> This seems pretty obviously broken for ZFS, but I think
> it needs to be fixed in ZFS and I have no idea how to do
> that, since I don't know if snapshots are real mount points, etc.
>
>> ... and of course I can't mount from either v3/v4.
>>
>> On the other hand, I kept it as:
>>
>> /local/backup/home9 -ro archive-mrpriv.cs.yorku.ca
>> V4:/
>>
>> ... and was able to NFSv4 mount
>> /local/backup/home9/.zfs/snapshot/20131203, and this does indeed
>> work.
>>
> Yes, although technically it should not work unless 20131203 is
> exported.

Hmm.. I thought that this line in the exports man page meant that it was
okay:

"Because NFSv4 does not use the mount protocol, the ``administrative
controls'' are not applied.
Thus, all the above export line(s) should be considered to have the
-alldirs flag, even if the line is specified without it."

> However, it is probably the easiest work around until this is fixed
> someday.
> So, just to make sure I am clear on this...
> A NFSv4 mount of the snapshot works ok, even for a Linux client mount.

Yes. Although with the new kernel, I can mount
nfs-server:/local/backup/home9/.zfs/snapshot now as well... which is
neat because it solves the problem I was trying to solve.. I wanted
users to be able to view their own snapshots, but not the snapshots of
other users... Now, on the archive server, I can mount the snapshot dir
via NFSv4, then, through autofs I am able to run a shell script that
bind mounts the users' own individual snapshot directories from the
NFSv4 mount into one directory. I then provide chrooted sftp access to
that directory for users to get at their files. A user now sees
"20131203 20131204..." when they sftp in..

>>> - Try doing the mounts with a FreeBSD client and see if you get the
>>>   same behaviour?
>> I found this:
>> http://forums.freenas.org/threads/mounting-snapshot-directory-using-nfs-from-linux-broken.6060/
>> .. implies it will work from FreeBSD/Nexenta, just not Linux.
> I suspect this might be the mounted_on_fileid vs fileid issue.
> (ie, The Linux client needs this to be done correctly, but the other
> clients figure it out.)
>
> One case that might break for FreeBSD would be to cd into a snapshot
> and then do a pwd with the debug.disablecwd sysctl set to 1.
>
> Hopefully the ZFS wizards are reading this, rick

Me too!

Jason.
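
The mount-point rule quoted above (equal fileid and mounted_on_fileid
means "not a mount point") can be applied to the capture numbers in this
message. What follows is only a minimal Python sketch: the helper names
are hypothetical and not part of any NFS tool, the values are copied by
hand from the capture, and the reading given in the comments is an
assumption rather than a conclusion reached in the thread.

```python
def looks_like_mount_point(fileid, mounted_on_fileid):
    """Rule quoted in the thread: equal values mean 'not a mount point'."""
    return fileid != mounted_on_fileid


def crosses_fs_boundary(parent_fsid_major, child_fsid_major):
    """A change in fsid is what marks a file system boundary for NFSv4."""
    return parent_fsid_major != child_fsid_major


# Values copied from the NFSv4 capture in this message.
readdir_fileid = 863                 # fattr4_fileid in the READDIR entry
readdir_mounted_on_fileid = 0x35F    # fattr4_mounted_on_fileid, same entry
parent_fsid_major = 597946950        # fsid.major of .zfs/snapshot
lookup_fsid_major = 1446349656       # fsid.major from the direct LOOKUP of 20131203

# READDIR: both attributes are equal, so under the quoted rule the entry
# does not look like a mount point.
print(looks_like_mount_point(readdir_fileid, readdir_mounted_on_fileid))

# Direct LOOKUP: the fsid changes, so a boundary is being reported.
print(crosses_fs_boundary(parent_fsid_major, lookup_fsid_major))

# The NFSv3 fsid for the same snapshot is the same number written in hex.
print(0x0000000056358B58 == lookup_fsid_major)
```

With these numbers the READDIR entry fails the mount-point test while
the direct LOOKUP reports a different fsid.major. That would be
consistent with the suspected mounted_on_fileid vs fileid issue, but it
does not show where the fault lies.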