From: Rick Macklem <rmacklem@uoguelph.ca>
To: Julian Elischer
Cc: freebsd-fs@freebsd.org
Date: Fri, 7 Aug 2015 20:28:20 -0400 (EDT)
Message-ID: <1970606860.13124005.1438993700838.JavaMail.zimbra@uoguelph.ca>
Subject: Re: changes to export whole FS hierarchy to mount it with one command on client? [changed subject]

Julian Elischer wrote:
> On 8/4/15 5:14 AM, Rick Macklem wrote:
> > Julian Elischer wrote:
> >> On 8/3/15 8:25 PM, Rick Macklem wrote:
> >>> Julian Elischer wrote:
> >>>> On 8/1/15 7:30 PM, Lev Serebryakov wrote:
> >>>>> Hello Rick,
> >>>>>
> >>>>> Saturday, August 1, 2015, 2:21:10 PM, you wrote:
> >>>>>
> >>>>>> To mount multiple file systems as one mount, you'll need to use
> >>>>>> NFSv4. I believe you will have to have a separate export entry
> >>>>>> in the server for each of the file systems.
> >>>>> So, /etc/exports needs to have BOTH v3-style exports & a V4: root
> >>>>> of tree line?
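[For concreteness, a hypothetical /etc/exports along those lines might look like this. The paths and network are made up for illustration; see exports(5) for the exact option syntax:]

```
# NFSv3-style entry for each exported file system:
/export/vol1 -maproot=root -network 192.168.1.0 -mask 255.255.255.0
/export/vol2 -maproot=root -network 192.168.1.0 -mask 255.255.255.0

# NFSv4 root of the exported tree; v4 clients then mount paths
# relative to /export with a single mount:
V4: /export -sec=sys -network 192.168.1.0 -mask 255.255.255.0
```

[With entries like these, mountd(8) hands NFSv3 clients one mount per exported file system, while an NFSv4 client can mount the whole tree under /export in one command.]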
> >>>> OR you can have a non-standard patch that pjd wrote to do
> >>>> recursive mounts of sub-filesystems. It is not supposed to happen
> >>>> according to the standard, but we have found it useful.
> >>>> Unfortunately it is written against the old NFS server.
> >>>>
> >>>> Rick, if I gave you the original pjd patch for the old server,
> >>>> could you integrate it into the new server as an option?
> >>>>
> >>> A patch like this basically inserts the file system volume
> >>> identifier in the high-order bits of the fileid# (inode# if you
> >>> prefer), so that duplicate fileid#s don't show up in a
> >>> "consolidated file system" (for want of a better term). It also
> >>> replies with the same "fake" fsid for all volumes involved.
> >>>
> >>> I see certain issues w.r.t. this:
> >>> 1 - What happens when the exported volumes are disjoint and don't
> >>>     form one tree? (I think any such option should be restricted
> >>>     to volumes that form a tree, but I don't know an easy way to
> >>>     enforce that restriction.)
> >>> 2 - It would be fine at this point to use the high-order bits of
> >>>     the fileid#, since NFSv3 defines it as 64 bits and FreeBSD's
> >>>     ino_t is 32 bits. However, I believe FreeBSD is going to have
> >>>     to increase ino_t to 64 bits soon. (I hope such a patch will
> >>>     be in FreeBSD 11.) Once ino_t is 64 bits, this option would
> >>>     have to assume that some number of the high-order bits of the
> >>>     fileid# are always 0. Something like "the high-order 24 bits
> >>>     are always 0" would work ok for a while, then someone would
> >>>     build a file system large enough to overflow the 40-bit field
> >>>     (I know that's a lot, but some file systems already exceed
> >>>     32 bits for # of fileids) and cause trouble.
> >>> 3 - You could get weird behaviour when the tree includes exports
> >>>     with different export options. This discussion includes just
> >>>     that, and NFSv3 clients don't expect things to change within
> >>>     a mount.
> >>>     (An example would be having part of this consolidated tree
> >>>     require Kerberos authentication. Another might be having parts
> >>>     of the consolidated tree use different uid mapping for
> >>>     AUTH_SYS.)
> >>> 4 - Some file systems (msdosfs, i.e. FAT) have limited
> >>>     capabilities w.r.t. what the NFS server can do to the file
> >>>     system. If one of these was embedded in the consolidated tree,
> >>>     it could cause confusion similar to #3.
> >>>
> >>> All in all, the "hack" is relatively easy to do, if:
> >>> You use one kind of file system (for example ZFS) and make
> >>> everything you are exporting one directory tree which is all
> >>> exported in a compatible way. You also "know" that all the
> >>> fileid#s in the underlying file systems will fit in the low-order
> >>> K bits of the 64-bit fileid#.
> >>>
> >>> My biggest concern is #2, once ino_t becomes 64 bits.
> >>>
> >>> If the collective thinks this is a good idea despite the issues
> >>> above, it can propose a good way to do it. (Maybe an export flag
> >>> for all the volumes that will participate in the "consolidated
> >>> file system"? The exports(5) man page could then try to clearly
> >>> explain the limitations of its use, etc. Even with that, I suspect
> >>> some would misuse the option and cause themselves grief.)
> >>>
> >>> Personally, since NFSv4 does this correctly, I don't see a need to
> >>> "hack it" for NFSv3, but I'll leave it up to the collective.
> >>>
> >>> rick
> >>> ps: Julian, you might want to repost this under a separate subject
> >>>     line, so people not interested in how ZFS can export multiple
> >>>     volumes without separate entries will read it.
> >>>
> >> In our environment we need to export V3 (and maybe even V2) in a
> >> single hierarchy, even though it's multiple ZFS filesystems.
> >> It's not dissimilar to having a separate ZFS for each user, except
> >> in this case it's a separate ZFS for each site. The "modified ZFS"
> >> filesystems have very special characteristics. We are only having
> >> our very first nibbles (questions) about NFSv4. Until now it's all
> >> NFSv3. Possibly we'd only have to support it for NFSv3 if V4 can
> >> use its native mechanisms.
> >>
> > Sure. You have a particular environment where it is useful and you
> > understand how to use it in that situation. I could do it here in
> > about 10 minutes and would do so if I needed it myself. The trick is
> > I understand what is going on and the limitations w.r.t. doing it.
> >
> > If you know your file systems are all in one directory hierarchy
> > (tree), all are ZFS, none of them ever generate fileid#s that don't
> > fit in 32 bits, and you are exporting them all in the same way, it's
> > pretty easy to do. Unfortunately, that isn't what generic NFS server
> > support for FreeBSD does. (If this is done, I think it has to be
> > somehow restricted to the above, or at least documented that it only
> > works for the above cases.)
> >
> > Since an NFSv2 fileid# is 32 bits, I don't believe this is practical
> > for NFSv2, and I don't think anyone would care. Since NFSv4 does
> > this "out of the box", I think the question is whether or not it
> > should be done for NFSv3?
> >
> > The challenge would be to put it in FreeBSD in a way that people who
> > don't necessarily understand what is "behind the curtain" can use it
> > effectively and not run into problems. (An example being the
> > previous thread where the different file systems are being created
> > with different characteristics for different users. That could be
> > confusing if the sysadmin thought it was "one volume".)
> > I'll leave whether or not to do this up to the collective.
> > (This is yet another one of those "easy to code" but hard to tell
> > if it's the correct thing to do situations.)
> > If done, I'd suggest:
> > - Restricted to one file system type (ZFS or UFS or ...). The code
> >   would probably have file system specifics in it. The correct way
> >   to do this will be different for ZFS than UFS, I think?
> > - A check for any fileid# that has high-order bits set, which would
> >   syslog an error.
> > - Enabled by an export option, so it doesn't automatically apply to
> >   all file systems on the server. This also provides a place for it
> >   to be documented, including limitations.

Oh, I remembered another issue related to this:
- Until FreeBSD changes ino_t to 64 bits, any "virtual consolidated
  file system" from an NFSv3 server needs to provide fileid#s that are
  unique in the low-order 32 bits, so that it won't break the FreeBSD
  client.
  --> I think this implies that it shouldn't go into head until after
      ino_t becomes 64 bits, at the earliest. Even then this option
      would break older FreeBSD clients. (The documentation for the
      export option would need to mention this.)
Without a 64-bit ino_t, the fileid the server puts on the wire would
need to be something like 24 bits from the file system on the server
+ 8 bits for which file system it is (to uniquify the 32-bit value).
Expecting the fileid# to fit in 24 bits on the underlying file system
seems too restrictive to me.

rick

> Well, obviously I would like it, because we need it and I don't want
> to have to maintain patches going forward. If it is in the tree YOU
> work on, then it would automatically get updated as needed. The mount
> option is nice, but at the moment we just have it wired on, and only
> export a single (PZFS) hierarchy. (PZFS is our own heavily modified
> version of ZFS that uses Amazon storage(*) as a block backend in
> parallel with the local drives (which are more a cache... the cloud
> is authoritative).)
>
> (*) a gross simplification.
> Different parts of the hierarchy are actually different cloud
> 'buckets' (e.g. theoretically some could be Amazon and some could be
> Google cloud storage). These sub-filesystems are unified as a
> hierarchy of ZFS filesystems into a single storage hierarchy via PZFS
> and exported to the user via NFS and CIFS/Samba.
>
> If I need to maintain a separate set of changes for the option then
> that's life, but it's of course preferable to me to have it
> upstreamed.
>
> P.S. To any filesystem types: yes, we are hiring FreeBSD filesystem
> people.
> http://panzura.com/company/careers-panzura/senior-software-engineer/
> Resumes via me for fast track :-)
>
> > Anyhow, if anyone has an opinion on whether or not this should be
> > in FreeBSD, please post, rick