From owner-freebsd-stable@FreeBSD.ORG Mon Mar 13 17:16:17 2006
Date: Mon, 13 Mar 2006 17:15:59 GMT
From: Miguel Lopes Santos Ramos
Message-Id: <200603131715.k2DHFxOe044623@compaq.anjos.strangled.net>
To: kris@obsecurity.org
In-Reply-To: <20060310220452.GA33878@xor.obsecurity.org>
Cc: kuriyama@imgsrc.co.jp, freebsd-stable@freebsd.org
Subject: Re: rpc.lockd brokenness (2)
List-Id: Production branch of FreeBSD source code

> I did some further testing and it turns out that rpc.lockd is broken
> in some cases when operating over NFSv2 (this is the default for nfs
> root mounts).
>
> Tracing the lock traffic I see the client making a request, the server
> replying but the client never acting on the reply (or never receiving
> it), so it just retransmits every 20 seconds forever.

That is what I saw Friday, using the debug log mainly.
I was expecting to find some error message, but I only saw the
repetition of something that seemed to be ok.
I tried in vain to look into rpc.lockd, but it's not at all simple.
But my greatest frustration was that when I started rpc.lockd with -d
on the client, the problem never occurred.

It didn't occur to me that the difference between this and other
clients was that the diskless mount is NFSv2.
I can't be 100% sure that the problem I have is the same one you
observed, but locking works on this client on other mounts (home
directories through amd, NFSv3). It really seems to be an
NFSv2-specific issue.

> I'm not yet sure whether this is a regression in 6.x or another case
> that was broken forever.

I didn't have problems in 5. I just compiled a 6.0-RELEASE kernel, and
it is also broken.

> Unfortunately there's currently no option to use NFSv3 for nfs root
> mounts to work around this (unless you're using bootp), but it should
> just be a trivial matter of adding "| NFSMNT_NFSV3" to the flags in
> nfsclient/nfs_diskless.c:nfs_setup_diskless():
>
>     nd->root_args.flags = (NFSMNT_WSIZE | NFSMNT_RSIZE | NFSMNT_RESVPORT);

It was only today that I could try your suggestion. But... I get a
kernel panic, it can't find init...
Looking in nfsclient/bootp_subr.c, it looks like there's a little more
to do when mounting via NFSv3.

Well, this doesn't work, but thanks to your suggestion, by looking in
nfs_diskless.c, I found a loader option to disable lockd,
boot.nfsroot.options=lockd. This option is new (it doesn't exist on
6.0).

Now I can lock any file, not only on /var but also on /etc, etc.
(remember that this option in fstab wasn't honored for the root mount).
Everything works. Locking in shared home directories also works,
because they're NFSv3 mounts (I tried it already...).

So, I finally have it working, and all I needed was having this in
loader.conf: boot.nfsroot.options=lockd.

I'm quite tired of this issue, so, as far as I'm concerned, I'm done.

Is the NFSv2/rpc.lockd issue reported?
Is there any more information that I can provide?
I'm available for further information and testing if anyone can't
reproduce the bug. I'm glad you could, no daemons on my machine...
I failed to find a way to reproduce it on other machines using
mount_nfs -2, so the developers may need additional assistance.

If the problem is reported and no further information is needed from
me, then I can only thank you and congratulate you for your great
effort in understanding what was wrong and pointing out a way to work
around it.

Thank you, Kris,
Miguel