Date: Sat, 19 Feb 2011 16:20:24 -0500 (EST)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Lawrence Stewart <lstewart@freebsd.org>
Cc: freebsd-fs@freebsd.org
Subject: Re: Mounting NFSv4 as root fs
Message-ID: <1194340518.139022.1298150424303.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <4D5F0825.9010607@freebsd.org>

> On 02/19/11 08:38, Rick Macklem wrote:
> >> Hi Rick,
> >>
> >> I've set up an NFS server to pxeboot a set of testbed clients from.
> >> The server filesystem tree the client needs to use as its root has
> >> nullfs mounted directories in it. Therefore, NFSv4 is the only
> >> useful way to mount it on the client because of the cross mount
> >> point traversing capabilities built into v4. I've verified that I
> >> can "mount_nfs -o nfsv4 ..." on the command line and see all the
> >> files in the tree so I have things working fine on the server side.
> >>
> >> I was aware our pxeboot only supports NFSv3, but hoped that by
> >> specifying "newnfs" and "nfsv4" in the fstype and options fields
> >> respectively in fstab that things might just work when the mount
> >> root step after the kernel boot happens. It doesn't as I found out,
> >> because of two issues:
> >>
> >> 1. I believe there is a bug in the newnfs code. nfs_diskless.c
> >> wasn't copied from the old nfsclient and suitably modified for use
> >> with newnfs. As a result during boot, the ncl_mountroot() function
> >> in nfs_clvfsops.c calls nfs_setup_diskless() which calls into the
> >> old nfs code and badness happens from there on in. I have a patch
> >> which fixes this issue, though it may be completely the wrong way
> >> to do things as I'm very new (as in 24 hours new) to the NFS code.
> >>
> > Yep. I didn't see an easy way to set up the diskless root so that it
> > would work for both clients concurrently, so I was planning on
> > switching it if/when "newnfs" becomes the default client. (You can
> > switch fairly easily. Just crib the code across, as it sounds like
> > you have and then make sure the xxx_mountroot() in "newnfs" gets
> > called instead of nfs_mountroot() in the other one.)
>
> Yes that's exactly what I did.
>
> > However, that will just get a "newnfs" NFSv3 root mount to work.
>
> Yup, confirmed working as expected (mount output shows "newnfs" for /
> whereas before it would fall back to "nfs" after the newnfs code
> crapped out).
>
> >> 2. pxeboot stores the filehandle and filehandle length it used to
> >> grab the kernel via NFS in the kernel's env and after the kernel
> >> has booted, it looks for these variables and reuses them, i.e. at
> >> no point in the process does the code attempt to upgrade to NFSv4
> >> if the bootstrap uses NFSv3 to grab the kernel.
> >>
> >> For my particular use case, I'm quite happy for the kernel to be
> >> pulled via NFSv3, but can't boot the client without somehow getting
> >> the client to switch to NFSv4 at the point where it mounts root
> >> after the kernel has finished booting.
> >>
> >> I tried a very hacky test in mountnfs() in nfs_clvfsops.c to see if
> >> I could set the NFSV4 flag, unset the V3 flag and tell the code to
> >> forget about the cached file handle set by the loader just to see
> >> if the code would try to renegotiate using v4... it crashed and
> >> burned.
> >>
> > The same file handle should work for NFSv4 (at least a FreeBSD
> > server generates the same FH for a v3 vs v4 mount).
>
> Ah, interesting and good to know, thanks. So assuming the server is v4
> capable, you can just start issuing v4 RPCs to the handle established
> by pxeboot and things should keep working?
>
"Should" might be too strong a word, but at least for the FreeBSD
server, yes. (I don't know about other servers. I just suspect that the
FHs will be the same. The change from NFSv2 -> NFSv3 was caused by
NFSv3 making it a variable size. Most servers fill the same FH in for
NFSv2 and then just pad it to 32 bytes.)

> >> So, before I spend any more time on this, I hope to get your (or
> >> anyone else reading for that matter) thoughts on how best to proceed.
> >> Some questions:
> >>
> >> - Could you guesstimate how much work is involved to get v4 support
> >> into libstand so that pxeboot can talk v4 natively? I spent quite
> >> some time poking at libstand's code last night but don't understand
> >> the NFSv4 RPC mechanism enough to attempt writing the basic code to
> >> do it yet. The RFC explains the ordering of OPs needed quite well
> >> but I don't quite grok how the data structures for interpreting
> >> responses work.
> >>
> > Lots. It will be easier to get the kernel to use v4 after pxeboot
> > has loaded it via v3.
>
> ACK.
>
> >> - Can you think of a hacky simple way to force my client to
> >> renegotiate the mount as v4 at the time mount root happens?
> >>
> > If you are willing to spend man weeks on this, you can probably get
> > something to work for your lab (useless for others, because you'll
> > have to hard wire a bunch of stuff into the kernel like your DNS
> > domain name...).
> >
> > I have never intended to try and make an NFSv4 root mount work.
> > (Someone said NFSv4 is NFS in name only:-)
> >
> > One of the most difficult parts will be the uid/gid<->name mapping.
> > You would have to hack this enough so that it worked without
> > nfsuserd. Something like hard wiring mappings into the kernel cache
> > for enough entries that the root works. (Note that names look like
> > root@cis.uoguelph.ca, so it needs to know the DNS domain as well as
> > "root" == uid 0.) Then hopefully you don't need other mappings to
> > work, because it would have to work without nfsuserd running and
> > with nfsuserd running (in the root fs).
> >
> > Short answer. A severely hacked kernel might work for your lab, but
> > a generic solution for FreeBSD would be very difficult.
>
> Thanks heaps for the brain dump, it really helps put things in
> perspective. It's sounding like a much bigger job than I thought it
> would be, even for a hacked up lab-only solution.
>
> > If you could move the "nullfs" mounts down a level, so the NFSv4
> > mount was below an NFSv3 root fs, that would be much easier.
>
> Agreed. The issue is we're using the ezjail management script from
> ports to manage the bootable client filesystems on the server, and it
> uses nullfs mounts between a base filesystem and the client
> filesystems to avoid duplicating all the utilities/libs in /bin,
> /sbin, /lib and /libexec multiple times. Works well but not for this
> use case... oh well.
>
> I guess it will be significantly easier to hack ezjail to just copy
> the dirs from the basejail into each client rather than try to get the
> all singing all dancing NFSv4 option going.
>
> Thanks again for your insights.
>
> Cheers,
> Lawrence
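
A rough sketch of the hard-wired uid/gid<->name mapping idea from the
quoted text above, in Python purely for illustration (a real change would
be C code in the kernel). Only the root@cis.uoguelph.ca name form comes
from this thread. The table contents, the function names, the fallback
uid of 65534 and the handling of bare numeric strings are assumptions
made up for the example, not what the FreeBSD client does.

```python
# Hypothetical static table standing in for what nfsuserd would normally
# supply. Every entry here is illustrative only.
NFS4_DOMAIN = "cis.uoguelph.ca"   # example domain taken from the thread
STATIC_USERS = {0: "root"}        # assumed: only uid 0 is hard wired
NOBODY_UID = 65534                # assumed fallback for unknown names


def uid_to_owner(uid, users=STATIC_USERS, domain=NFS4_DOMAIN):
    """Turn a numeric uid into a name@domain owner string."""
    name = users.get(uid)
    if name is None:
        # Assumption: fall back to the bare decimal uid when no mapping
        # is hard wired.
        return str(uid)
    return f"{name}@{domain}"


def owner_to_uid(owner, users=STATIC_USERS, domain=NFS4_DOMAIN):
    """Turn a name@domain owner string back into a numeric uid."""
    name, sep, dom = owner.partition("@")
    if sep and dom.lower() == domain.lower():
        for uid, known in users.items():
            if known == name:
                return uid
    if not sep and name.isdigit():
        # Bare numeric string: treat it as the uid itself.
        return int(name)
    # Unknown name or foreign domain: map to the fallback uid.
    return NOBODY_UID


if __name__ == "__main__":
    print(uid_to_owner(0))
    print(owner_to_uid("root@cis.uoguelph.ca"))
```

The point of the sketch is that both the name table and the DNS domain
have to be known before any mapping daemon is running, which is why such
a setup would only suit a single lab.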