From: Rick Macklem
To: Lawrence Stewart
Cc: freebsd-fs@freebsd.org
Date: Sat, 19 Feb 2011 16:20:24 -0500 (EST)
Subject: Re: Mounting NFSv4 as root fs

> On 02/19/11 08:38, Rick Macklem wrote:
> >> Hi Rick,
> >>
> >> I've set up an NFS server to pxeboot a set of testbed clients from.
> >> The server filesystem tree the client needs to use as its root has
> >> nullfs mounted directories in it.
> >> Therefore, NFSv4 is the only useful way to mount it on the client,
> >> because of the mount-point-crossing capabilities built into v4. I've
> >> verified that I can "mount_nfs -o nfsv4 ..." on the command line and
> >> see all the files in the tree, so I have things working fine on the
> >> server side.
> >>
> >> I was aware our pxeboot only supports NFSv3, but hoped that by
> >> specifying "newnfs" and "nfsv4" in the fstype and options fields,
> >> respectively, in fstab, things might just work when the mount root
> >> step happens after the kernel boots. As I found out, it doesn't,
> >> because of two issues:
> >>
> >> 1. I believe there is a bug in the newnfs code. nfs_diskless.c wasn't
> >> copied from the old nfsclient and suitably modified for use with
> >> newnfs. As a result, during boot the ncl_mountroot() function in
> >> nfs_clvfsops.c calls nfs_setup_diskless(), which calls into the old
> >> NFS code, and badness happens from there on. I have a patch which
> >> fixes this issue, though it may be completely the wrong way to do
> >> things, as I'm very new (as in 24 hours new) to the NFS code.
> >>
> > Yep. I didn't see an easy way to set up the diskless root so that it
> > would work for both clients concurrently, so I was planning on
> > switching it if/when "newnfs" becomes the default client. (You can
> > switch fairly easily. Just crib the code across, as it sounds like you
> > have, and then make sure the xxx_mountroot() in "newnfs" gets called
> > instead of nfs_mountroot() in the other one.)
>
> Yes, that's exactly what I did.
>
> > However, that will just get a "newnfs" NFSv3 root mount to work.
>
> Yup, confirmed working as expected (mount output shows "newnfs" for /,
> whereas before it would fall back to "nfs" after the newnfs code
> crapped out).
>
> >> 2. pxeboot stores the file handle and file handle length it used to
> >> grab the kernel via NFS in the kernel's environment, and after the
> >> kernel has booted, it looks for these variables and reuses them,
> >> i.e. at no point in the process does the code attempt to upgrade to
> >> NFSv4 if the bootstrap uses NFSv3 to grab the kernel.
> >>
> >> For my particular use case, I'm quite happy for the kernel to be
> >> pulled via NFSv3, but I can't boot the client without somehow getting
> >> the client to switch to NFSv4 at the point where it mounts root after
> >> the kernel has finished booting.
> >>
> >> I tried a very hacky test in mountnfs() in nfs_clvfsops.c to see if I
> >> could set the NFSV4 flag, unset the V3 flag and tell the code to
> >> forget about the cached file handle set by the loader, just to see if
> >> the code would try to renegotiate using v4... it crashed and burned.
> >>
> > The same file handle should work for NFSv4 (at least a FreeBSD server
> > generates the same FH for a v3 vs. v4 mount).
>
> Ah, interesting and good to know, thanks. So, assuming the server is v4
> capable, you can just start issuing v4 RPCs to the handle established
> by pxeboot and things should keep working?
>
"Should" might be too strong a word, but at least for the FreeBSD server,
yes. (I don't know about other servers; I just suspect that the FHs will
be the same. The change in FH from NFSv2 to NFSv3 was caused by NFSv3
making it variable-sized. Most servers fill in the same FH for NFSv2 and
then just pad it to 32 bytes.)

> >> So, before I spend any more time on this, I hope to get your (or
> >> anyone else reading, for that matter) thoughts on how best to
> >> proceed. Some questions:
> >>
> >> - Could you guesstimate how much work is involved to get v4 support
> >> into libstand so that pxeboot can talk v4 natively?
> >> I spent quite some time poking at libstand's code last night, but
> >> don't understand the NFSv4 RPC mechanism well enough to attempt
> >> writing the basic code to do it yet. The RFC explains the ordering of
> >> OPs needed quite well, but I don't quite grok how the data structures
> >> for interpreting responses work.
> >>
> > Lots. It will be easier to get the kernel to use v4 after pxeboot has
> > loaded it via v3.
>
> ACK.
>
> >> - Can you think of a hacky, simple way to force my client to
> >> renegotiate the mount as v4 at the time mount root happens?
> >>
> > If you are willing to spend man-weeks on this, you can probably get
> > something to work for your lab (useless for others, because you'll
> > have to hard-wire a bunch of stuff into the kernel, like your DNS
> > domain name...).
> >
> > I have never intended to try and make an NFSv4 root mount work.
> > (Someone said NFSv4 is NFS in name only:-)
> >
> > One of the most difficult parts will be the uid/gid<->name mapping.
> > You would have to hack this enough so that it worked without
> > nfsuserd: something like hard-wiring mappings into the kernel cache
> > for enough entries that the root works. (Note that names look like
> > root@cis.uoguelph.ca, so it needs to know the DNS domain as well as
> > "root" == uid 0.) Then hopefully you don't need other mappings to
> > work, because it would have to work both without nfsuserd running and
> > with nfsuserd running (in the root fs).
> >
> > Short answer: a severely hacked kernel might work for your lab, but a
> > generic solution for FreeBSD would be very difficult.
>
> Thanks heaps for the brain dump, it really helps put things in
> perspective. It's sounding like a much bigger job than I thought it
> would be, even for a hacked-up lab-only solution.
>
> > If you could move the "nullfs" mounts down a level, so the NFSv4 mount
> > was below an NFSv3 root fs, that would be much easier.
>
> Agreed.
> The issue is we're using the ezjail management script from ports to
> manage the bootable client filesystems on the server, and it uses
> nullfs mounts between a base filesystem and the client filesystems to
> avoid duplicating all the utilities/libs in /bin, /sbin, /lib and
> /libexec multiple times. Works well, but not for this use case... oh
> well.
>
> I guess it will be significantly easier to hack ezjail to just copy
> the dirs from the basejail into each client than to try to get the
> all-singing, all-dancing NFSv4 option going.
>
> Thanks again for your insights.
>
> Cheers,
> Lawrence