Date: Fri, 7 Aug 2015 20:28:20 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Julian Elischer <julian@freebsd.org>
Cc: freebsd-fs@freebsd.org
Subject: Re: changes to export whole FS hierarhy to mount it with one command on client? [changed subject]
Message-ID: <1970606860.13124005.1438993700838.JavaMail.zimbra@uoguelph.ca>
In-Reply-To: <55C037D0.1000606@freebsd.org>
References: <795246861.20150801140429@serebryakov.spb.ru>
 <1363497421.7238055.1438428070047.JavaMail.zimbra@uoguelph.ca>
 <1593307781.20150801143052@serebryakov.spb.ru>
 <55BEE668.3080303@freebsd.org>
 <67101638.8226696.1438604713620.JavaMail.zimbra@uoguelph.ca>
 <55BFC58C.6030802@freebsd.org>
 <987522757.8576059.1438636467059.JavaMail.zimbra@uoguelph.ca>
 <55C037D0.1000606@freebsd.org>
Julian Elischer wrote:
> On 8/4/15 5:14 AM, Rick Macklem wrote:
> > Julian Elischer wrote:
> >> On 8/3/15 8:25 PM, Rick Macklem wrote:
> >>> Julian Elischer wrote:
> >>>> On 8/1/15 7:30 PM, Lev Serebryakov wrote:
> >>>>> Hello Rick,
> >>>>>
> >>>>> Saturday, August 1, 2015, 2:21:10 PM, you wrote:
> >>>>>
> >>>>>> To mount multiple file systems as one mount, you'll need to use
> >>>>>> NFSv4. I believe you will have to have a separate export entry in
> >>>>>> the server for each of the file systems.
> >>>>> So, /etc/exports needs to have BOTH v3-style exports & a V4: root
> >>>>> of tree line?
> >>>> OR you can have a non-standard patch that pjd wrote to do recursive
> >>>> mounts of sub-filesystems. It is not supposed to happen according
> >>>> to the standard, but we have found it useful. Unfortunately it is
> >>>> written against the old NFS server.
> >>>>
> >>>> Rick, if I gave you the original pjd patch for the old server, could
> >>>> you integrate it into the new server as an option?
> >>>>
> >>> A patch like this basically inserts the file system volume identifier
> >>> in the high-order bits of the fileid# (inode# if you prefer), so that
> >>> duplicate fileid#s don't show up in a "consolidated file system" (for
> >>> want of a better term). It also replies with the same "fake" fsid for
> >>> all volumes involved.
> >>>
> >>> I see certain issues w.r.t. this:
> >>> 1 - What happens when the exported volumes are disjoint and don't
> >>>     form one tree? (I think any such option should be restricted to
> >>>     volumes that form a tree, but I don't know an easy way to enforce
> >>>     that restriction.)
> >>> 2 - It would be fine at this point to use the high-order bits of the
> >>>     fileid#, since NFSv3 defines it as 64 bits and FreeBSD's ino_t is
> >>>     32 bits. However, I believe FreeBSD is going to have to increase
> >>>     ino_t to 64 bits soon. (I hope such a patch will be in FreeBSD 11.)
> >>>     Once ino_t is 64 bits, this option would have to assume that
> >>>     some # of the high-order bits of the fileid# are always 0.
> >>>     Something like "the high-order 24 bits are always 0" would work
> >>>     ok for a while; then someone would build a file system large
> >>>     enough to overflow the 40-bit field (I know that's a lot, but
> >>>     some are already exceeding 32 bits for # of fileids) and cause
> >>>     trouble.
> >>> 3 - You could get weird behaviour when the tree includes exports
> >>>     with different export options. This discussion includes just
> >>>     that, and NFSv3 clients don't expect things to change within a
> >>>     mount. (An example would be having part of this consolidated
> >>>     tree require Kerberos authentication. Another might be having
> >>>     parts of the consolidated tree use different uid mapping for
> >>>     AUTH_SYS.)
> >>> 4 - Some file systems (msdosfs, i.e. FAT) have limited capabilities
> >>>     w.r.t. what the NFS server can do to the file system. If one of
> >>>     these was embedded in the consolidated tree, then it could cause
> >>>     confusion similar to #3.
> >>>
> >>> All in all, the "hack" is relatively easy to do, if:
> >>> you use one kind of file system (for example ZFS) and make
> >>> everything you are exporting one directory tree which is all
> >>> exported in a compatible way. You also "know" that all the fileid#s
> >>> in the underlying file systems will fit in the low-order K bits of
> >>> the 64-bit fileid#.
> >>>
> >>> My biggest concern is #2, once ino_t becomes 64 bits.
> >>>
> >>> If the collective thinks this is a good idea despite the issues
> >>> above and can propose a good way to do it, fine. (Maybe an export
> >>> flag for all the volumes that will participate in the "consolidated
> >>> file system"? The exports(5) man page could then try to clearly
> >>> explain the limitations of its use, etc.
> >>> Even with that, I suspect some would misuse the option and cause
> >>> themselves grief.)
> >>>
> >>> Personally, since NFSv4 does this correctly, I don't see a need to
> >>> "hack it" for NFSv3, but I'll leave it up to the collective.
> >>>
> >>> rick
> >>> ps: Julian, you might want to repost this under a separate subject
> >>>     line, so people not interested in how ZFS can export multiple
> >>>     volumes without separate entries will read it.
> >>>
> >> In our environment we need to export V3 (and maybe even V2) in a
> >> single hierarchy, even though it's multiple ZFS filesystems.
> >> It's not dissimilar to having a separate ZFS for each user, except
> >> in this case it's a separate ZFS for each site.
> >> The "modified ZFS" filesystems have very special characteristics. We
> >> are only having our very first nibbles (questions) about NFSv4.
> >> Until now it's all NFSv3. Possibly we'd only have to support it for
> >> NFSv3 if V4 can use its native mechanisms.
> >>
> > Sure. You have a particular environment where it is useful and you
> > understand how to use it in that situation. I could do it here in
> > about 10 minutes and would do so if I needed it myself. The trick is
> > that I understand what is going on and the limitations w.r.t. doing
> > it.
> >
> > If you know your file systems are all in one directory hierarchy
> > (tree), all are ZFS, none of them ever generate fileid#s that don't
> > fit in 32 bits, and you are exporting them all in the same way, it's
> > pretty easy to do. Unfortunately, that isn't what generic NFS server
> > support for FreeBSD does. (If this is done, I think it has to be
> > somehow restricted to the above, or at least documented that it only
> > works for the above cases.)
> >
> > Since an NFSv2 fileid# is 32 bits, I don't believe this is practical
> > for NFSv2, and I don't think anyone would care.
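[Editor's note: the fileid# packing Rick describes above can be sketched as follows. This is an illustrative sketch only, not the pjd patch itself; the function and constant names are invented for the example.]

```python
# Illustrative sketch (NOT the actual patch): build a unique 64-bit NFSv3
# fileid by packing a small per-filesystem index into the high-order bits
# and the filesystem's native 32-bit ino_t into the low-order bits.

INO_BITS = 32                  # FreeBSD's ino_t width at the time
INO_MASK = (1 << INO_BITS) - 1

def consolidated_fileid(fs_index, ino):
    """Return a 64-bit fileid unique across the consolidated export."""
    if ino != (ino & INO_MASK):
        # The check Rick suggests: complain loudly if the underlying
        # filesystem ever hands back a fileid with high-order bits set.
        raise ValueError("fileid %#x does not fit in %d bits" % (ino, INO_BITS))
    return (fs_index << INO_BITS) | ino

# Two member volumes can reuse the same inode number, yet the packed
# fileids stay distinct across the "consolidated file system".
id_a = consolidated_fileid(0, 12345)
id_b = consolidated_fileid(1, 12345)
```

This is also where Rick's issue #2 shows up: once ino_t itself is 64 bits, there are no spare high-order bits left unless you assume the underlying filesystem never uses them.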
> > Since NFSv4 does this "out of the box", I think the question is
> > whether or not it should be done for NFSv3?
> >
> > The challenge would be to put it in FreeBSD in a way that people who
> > don't necessarily understand what is "behind the curtain" can use it
> > effectively and not run into problems. (An example being the previous
> > thread, where the different file systems are being created with
> > different characteristics for different users. That could be
> > confusing if the sysadmin thought it was "one volume".)
> > I'll leave whether or not to do this up to the collective. (This is
> > yet another one of those "easy to code" but hard to tell if it is the
> > correct thing to do situations.)
> > If done, I'd suggest:
> > - Restricted to one file system type (ZFS or UFS or ...). The code
> >   would probably have file system specifics in it. The correct way
> >   to do this will be different for ZFS than UFS, I think?
> > - A check for any fileid# that has high-order bits set, which would
> >   syslog an error.
> > - Enabled by an export option, so it doesn't automatically apply to
> >   all file systems on the server. This also provides a place for it
> >   to be documented, including limitations.
Oh, I remembered another issue related to this:
- Until FreeBSD changes ino_t to 64 bits, any "virtual consolidated
  file system" from an NFSv3 server needs to provide fileid#s that are
  unique in the low-order 32 bits so that it won't break the FreeBSD
  client.
  --> I think this implies that it shouldn't go into head until after
  ino_t becomes 64 bits, at the earliest. Even then, this option would
  break older FreeBSD clients. (The documentation for the export option
  would need to mention this.) Without 64-bit ino_t, the fileid the
  server puts on the wire would need to be something like 24 bits from
  the file system on the server + 8 bits for which file system it is
  (to uniquify the 32-bit value).
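[Editor's note: the 32-bit-safe layout Rick outlines (24 bits of fileid + 8 bits of filesystem index) can be illustrated like this. The constants and function name are invented for the example, not taken from any patch.]

```python
# Hedged illustration: until clients have 64-bit ino_t, the on-the-wire
# fileid must stay unique within its low-order 32 bits, e.g. 24 bits
# from the underlying filesystem plus an 8-bit index identifying which
# member filesystem it came from.

FILEID_BITS = 24   # fileid bits taken from the underlying filesystem
FS_BITS = 8        # uniquifies at most 256 member volumes

def wire_fileid32(fs_index, ino):
    """Pack a 32-bit on-the-wire fileid, failing loudly on overflow."""
    if ino >= (1 << FILEID_BITS):
        # Exactly the restriction Rick calls too tight below: the
        # underlying fileid must fit in only 24 bits.
        raise OverflowError("fileid needs more than 24 bits")
    if fs_index >= (1 << FS_BITS):
        raise OverflowError("more than 256 member filesystems")
    return (fs_index << FILEID_BITS) | ino
```

The 24-bit ceiling is the weak point: a filesystem with more than ~16.7 million fileids overflows the scheme, which motivates Rick's objection immediately below.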
Expecting the fileid# to fit in 24 bits on the underlying file system
seems too restrictive to me.

rick

> Well obviously I would like it, because we need it and I don't want to
> have to maintain patches going forward.
> If it is in the tree YOU work on, then it would automatically get
> updated as needed. The mount option is nice, but at the moment we just
> have it wired on, and only export a single (PZFS) hierarchy. (PZFS is
> our own heavily modified version of ZFS that uses Amazon storage(*) as
> a block backend in parallel with the local drives (which are more a
> cache.. the cloud is authoritative).
>
> (*) a gross simplification.
>
> Different parts of the hierarchy may actually be different cloud
> 'buckets', (e.g. theoretically some could be Amazon and some could be
> Google cloud storage). These sub-filesystems are unified as a
> hierarchy of ZFS filesystems into a single storage hierarchy via PZFS
> and exported to the user via NFS and CIFS/Samba.
>
> If I need to maintain a separate set of changes for the option then
> that's life, but it's of course preferable to me to have it upstreamed.
>
> p.s. to any filesystem types.. yes, we are hiring FreeBSD filesystem
> people..
> http://panzura.com/company/careers-panzura/senior-software-engineer/..
> resumes via me for fast track :-) ..
>
> > Anyhow, if anyone has an opinion on whether or not this should be in
> > FreeBSD, please post, rick
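[Editor's note: for readers following Lev's question at the top of the thread, the answer (both per-filesystem v3-style lines and a single V4: root line) looks roughly like this in /etc/exports. Paths and the network are hypothetical, for illustration only.]

```
# Hypothetical /etc/exports fragment.
# NFSv3: one export line per server file system:
/export/site1 -ro -network 192.168.1.0 -mask 255.255.255.0
/export/site2 -ro -network 192.168.1.0 -mask 255.255.255.0
# NFSv4: the V4: line sets the root of the tree; NFSv4 clients can then
# mount /export once and cross into both file systems:
V4: /export -network 192.168.1.0 -mask 255.255.255.0
```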