From owner-freebsd-hackers@FreeBSD.ORG Thu Jan 3 13:38:06 2008
Message-ID: <477CE528.7070404@freebsd.org>
Date: Thu, 03 Jan 2008 07:37:44 -0600
From: Eric Anderson
To: Danny Braniss
Cc: freebsd-hackers@freebsd.org
Subject: Re: nfs v2/v3 and diskless boot problem
List-Id: Technical Discussions relating to FreeBSD

Danny Braniss wrote:
>> Danny Braniss wrote:
>>>> Danny Braniss wrote:
>>>>>> Danny Braniss wrote:
>>>>>>> there is an undocumented option:
>>>>>>>     boot-nfsroot-options
>>>>>>> that the diskless boot can use. I tried
>>>>>>>     boot-nfsroot-options = "nfsv3"
>>>>>>> since the pxeboot does the initial mount via nfsv2, and this has at least
>>>>>>> one problem: removing a file from the readonly / will hang the system.
>>>>>>>
>>>>>>> so, the remount to v3 works in the case that the root is served by a FreeBSD
>>>>>>> nfs server, but fails if it's NetApp. The reason is that the v2 filehandle
>>>>>>> is 32 bytes, and when switching to V3 it becomes 28 bytes - sizeof(fhandle_t).
>>>>>>> This is not liked by the NetApp, which correctly gives error 10001: BADHANDLE
>>>>>>> :-)
>>>>>>>
>>>>>>> While I'm trying to come up with a solution, I am wondering if someone
>>>>>>> can shed some light:
>>>>>>> - is sizeof(fhandle_t) == 28 bytes mystical, or will changing it to
>>>>>>>   32 bytes start WW3?
>>>>>> NFSv3 file handles (by spec) can be up to 64 bytes.
>>>>> true, but in freebsd, look at sys/nfs/nfsproto.h
>>>>> #define NFSX_V2FH 32
>>>>> #define NFSX_V3FH (sizeof (fhandle_t))
>>>>> #define NFSX_V4FH 128
>>>>>
>>>>> so for v3 it's 28 bytes. (fhandle_t is defined in sys/mount.h)
>>>>>
>>>>>
>>>>>> I'm not 100% sure what is happening, but it sounds like the file handle
>>>>>> for the mount point or maybe one of the directories is not getting reset
>>>>>> on remount.
>>>>>>
>>>>>> When do you get the BADHANDLE error? Can you capture a
>>>>>> tshark/wireshark/tcpdump of the remount and error?
>>>>> I did, and if you look in sys/nfsclient/nfs_vfsops.c, nfs_convert_diskless is responsible
>>>>> for chopping off the 4 extra bytes. BTW, I tried to change the bcopy count to NFSX_V2FH/32, and
>>>>> it panics the kernel :-(
>>>>>
>>>>> danny
>>>> oh - looks like this says it all:
>>>> http://fxr.googlebit.com/source/sys/nfsclient/nfsdiskless.h?v=8-CURRENT#L51
>>>>
>>> that's where the boot-nfsroot-options comes from :-)
>>> if you notice, the filehandle for v3 is 64 bytes, but
>>> only 28 are used.
>>>
>>> but as I mentioned initially, this ONLY works when the server is FreeBSD, and
>>> breaks for other servers, i.e. NetApp. AND the initial question stands:
>>> what's in a filehandle, or can it be > 28 bytes.
>>
>> Yea, FreeBSD is making the assumption that all NFS servers will use the
>> same size FH for NFSv3. That is just wrong.
>>
> careful, I think this is the case only if FreeBSD is the server, it will 'probably'
> accept
> filehandles of other sizes from other servers.

I'm talking about the diskless root mounting code only at this point.

>> The FH is a server created opaque handle that it can create however it
>> wishes. Most servers use information like inode, generation, fsid, etc
>> to create it, but it's something that you can't necessarily decode.
>>
> yes, but the FH has information that the server can/must use to figure out
> which local filesystem it refers to - remember that v2/v3 are stateless.

Right, see my list right above your comment: inode, generation, fsid.
Those three can uniquely identify a file on a filesystem on a server.
There can be anything the server wants to stuff in the FH, or the FH can
be a random number assigned to that file, etc.

>> I've created a patch that might fix this, but I'm still testing and QEMU
>> (which I use for my testing) keeps making my system either panic or lock
>> up, so hopefully I should have something for you to try tonight.
>>
>> Also - can you tell me the exact 'mount' command you tried to do the
>> remount/update?
>>
> it's only in the diskless boot, where setting
>     boot-nfsroot-options = "nfsv3"
> in /boot/loader.conf will do the remount.

Ok - I'll do a little more testing on my patch tonight and let you know.

Eric
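
For illustration only, the size mismatch discussed in this thread can be sketched in a few lines of userland C. This is not the patch mentioned above and not the kernel code: struct sketch_fh, sketch_convert_fixed() and sketch_convert_keep_len() are hypothetical names, SKETCH_V3FH simply hard-codes the 28-byte sizeof(fhandle_t) reported in the thread, and NFS3_FHSIZE is the 64-byte maximum from the NFSv3 spec. The first function copies a fixed 28 bytes out of a 32-byte handle, which is the truncation described for nfs_convert_diskless; the second keeps whatever length it was given, up to the spec maximum.

```c
#include <assert.h>
#include <string.h>

#define NFSX_V2FH    32   /* v2 handle size, as quoted from nfsproto.h in the thread */
#define SKETCH_V3FH  28   /* assumption: sizeof(fhandle_t) as reported in the thread */
#define NFS3_FHSIZE  64   /* NFSv3 spec maximum handle size */

/* Hypothetical variable-length handle; not a FreeBSD kernel structure. */
struct sketch_fh {
    unsigned int  len;
    unsigned char data[NFS3_FHSIZE];
};

/* Fixed-size copy: always keeps 28 bytes, so a 32-byte handle loses its tail. */
void sketch_convert_fixed(const unsigned char *v2fh, struct sketch_fh *out)
{
    assert(v2fh != NULL && out != NULL);
    memset(out, 0, sizeof(*out));
    out->len = SKETCH_V3FH;
    memcpy(out->data, v2fh, SKETCH_V3FH);
}

/* Length-preserving copy: keeps the handle opaque, rejects oversized input. */
int sketch_convert_keep_len(const unsigned char *fh, unsigned int len,
                            struct sketch_fh *out)
{
    assert(fh != NULL && out != NULL);
    if (len > NFS3_FHSIZE)
        return -1;
    memset(out, 0, sizeof(*out));
    out->len = len;
    memcpy(out->data, fh, len);
    return 0;
}
```

Feeding the same 32-byte buffer to both functions gives a 28-byte handle from the first and a 32-byte handle from the second. A server that treats the handle as opaque and expects all 32 bytes back would only recognize the second one.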