From owner-freebsd-hackers@FreeBSD.ORG Thu Jan 3 12:46:49 2008
Message-ID: <477CD92A.9090906@freebsd.org>
Date: Thu, 03 Jan 2008 06:46:34 -0600
From: Eric Anderson
To: Danny Braniss
Cc: freebsd-hackers@freebsd.org
Subject: Re: nfs v2/v3 and diskless boot problem
List-Id: Technical Discussions relating to FreeBSD

Danny Braniss wrote:
>> Danny Braniss wrote:
>>>> Danny Braniss wrote:
>>>>> there is an undocumented option:
>>>>> 	boot-nfsroot-options
>>>>> that the diskless boot can use.
>>>>> I tried
>>>>> 	boot-nfsroot-options = "nfsv3"
>>>>> since the pxeboot does the initial mount via nfsv2, and this has at least
>>>>> one problem: removing a file from the readonly / will hang the system.
>>>>>
>>>>> so, the remount to v3 works in the case that the root is served by a FreeBSD
>>>>> nfs server, but fails if it's NetApp. The reason is that the v2 filehandle
>>>>> is 32 bytes, and when switching to v3 it becomes 28 bytes - sizeof(fhandle_t).
>>>>> This is not liked by the NetApp, which correctly gives error 10001: BADHANDLE
>>>>> :-)
>>>>>
>>>>> While I'm trying to come up with a solution, I am wondering if someone
>>>>> can shed some light:
>>>>> - is sizeof(fhandle_t) == 28 bytes mystical, or will changing it to
>>>>>   32 bytes start WW3?
>>>> NFSv3 file handles (by spec) can be up to 64 bytes.
>>> true, but in freebsd, look at sys/nfs/nfsproto.h
>>> 	#define NFSX_V2FH	32
>>> 	#define NFSX_V3FH	(sizeof (fhandle_t))
>>> 	#define NFSX_V4FH	128
>>>
>>> so for v3 it's 28 bytes. (fhandle_t is defined in sys/mount.h)
>>>
>>>
>>>> I'm not 100% sure what is happening, but it sounds like the file handle
>>>> for the mount point or maybe one of the directories is not getting reset
>>>> on remount.
>>>>
>>>> When do you get the BADHANDLE error? Can you capture a
>>>> tshark/wireshark/tcpdump of the remount and error?
>>> I did, and if you look in sys/nfsclient/nfs_vfsops.c, nfs_convert_diskless
>>> is responsible for chopping off the 4 extra bytes. BTW, I tried to change
>>> the bcopy count to NFSX_V2FH/32, and it panics the kernel :-(
>>>
>>> 	danny
>>
>> oh - looks like this says it all:
>> http://fxr.googlebit.com/source/sys/nfsclient/nfsdiskless.h?v=8-CURRENT#L51
>>
> that's where the boot-nfsroot-options comes from :-)
> if you notice, the filehandle for v3 is 64 bytes, but
> only 28 are used.
>
> but as I mentioned initially, this ONLY works when the server is FreeBSD, and
> breaks for other servers, i.e. NetApp.
> AND the initial question stands:
> what's in a filehandle, or can it be > 28bytes.

Yeah, FreeBSD assumes that every NFS server uses the same FH size for
NFSv3. That is just wrong. The FH is an opaque handle created by the
server, which can build it however it wishes. Most servers build it from
information like inode, generation, fsid, etc., but it's not something
the client can necessarily decode.

I've created a patch that might fix this, but I'm still testing, and QEMU
(which I use for my testing) keeps making my system either panic or lock
up. Hopefully I'll have something for you to try tonight.

Also - can you tell me the exact 'mount' command you used for the
remount/update?

Eric
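
To make the point about opaque, variable-length handles concrete, here is
a minimal C sketch. It is not the patch mentioned above and it is not
FreeBSD kernel code: struct nfs3_fh, nfs3_fh_set() and nfs3_fh_equal() are
hypothetical names made up for illustration. The only facts it relies on
are the protocol limits: an NFSv2 handle is a fixed 32 bytes, and an NFSv3
handle is variable-length, up to 64 bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NFS2_FHSIZE 32	/* NFSv2: fixed-size handle */
#define NFS3_FHSIZE 64	/* NFSv3: maximum handle size */

/* Hypothetical client-side container; NOT the kernel's fhandle_t. */
struct nfs3_fh {
	uint32_t len;			/* bytes actually used in data[] */
	uint8_t  data[NFS3_FHSIZE];	/* opaque bytes, never interpreted */
};

/*
 * Store a handle exactly as the server sent it.
 * Returns 0 on success, -1 if len is outside 1..NFS3_FHSIZE.
 */
int
nfs3_fh_set(struct nfs3_fh *fh, const void *src, size_t len)
{
	if (len == 0 || len > NFS3_FHSIZE)
		return (-1);
	memset(fh, 0, sizeof(*fh));
	memcpy(fh->data, src, len);
	fh->len = (uint32_t)len;
	return (0);
}

/* Two handles match only if both the length and every byte match. */
int
nfs3_fh_equal(const struct nfs3_fh *a, const struct nfs3_fh *b)
{
	return (a->len == b->len && memcmp(a->data, b->data, a->len) == 0);
}
```

If the server hands out a 32-byte handle and the client keeps only the
first 28 bytes, nfs3_fh_equal() reports the two as different handles,
which matches the BADHANDLE a strict server returns. Carrying the length
alongside the bytes avoids baking any one server's handle size into the
client.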