From: Don Lewis
To: current@FreeBSD.org
Date: Wed, 7 May 2003 12:36:22 -0700 (PDT)
Subject: CFR: NFS server vnode locking patch

I managed to find some time to take a closer look at vnode locking in the NFS server code and found that the situation was worse than I initially thought. I've put together a patch that seems to fix all the bugs that I found. With this patch, the code passes the simple tests that I wrote. It also survives NFS mounting a local directory on /usr/obj and running "make -j10 buildworld" (after I cranked up vfs.hirunningspace and vfs.lorunningspace by 50x to avoid the wdrain bio deadlock I mentioned yesterday). All of this testing was done with the DEBUG_VFS_LOCKS kernel option enabled.

The NFS server code was already in bad shape from being hacked on too many times before I touched it. It looks like it has accumulated some historical baggage, and my changes certainly don't help.
I attempted to match the existing style and control flow because I wanted to keep the changes minimal for now to avoid introducing new bugs, but this meant that I had to duplicate some code in a number of places.

I saw two possible ways of getting the initial dirp attributes. One was to set LOCKPARENT on the first lookup() call in nfs_namei() and call VOP_GETATTR() at that point. I chose the other, which was to temporarily lock the dirp and call VOP_GETATTR() before the loop, because that change was simpler.

The NFS server code badly needs a rewrite by someone who understands it well. I'm hoping to get enough review and testing of this patch that I can get approval from re (release engineering) to fix vnode locking in the NFS server code for 5.1.
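The dirp approach chosen above (temporarily lock, call VOP_GETATTR(), unlock, then enter the loop) can be modeled in user space. This is a minimal sketch, not code from the patch: `mock_vnode`, `mock_vattr`, `mock_getattr()`, `mock_lock()`, `mock_unlock()` and `get_initial_dir_attrs()` are hypothetical stand-ins for the kernel's vnode, vattr, VOP_GETATTR() and vnode lock calls. It assumes, as "temporarily lock" implies, that dirp is held unlocked going into the lookup loop. The mock getattr refuses to run on an unlocked vnode, roughly the kind of lock violation that a DEBUG_VFS_LOCKS kernel reports.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins; these are NOT the FreeBSD kernel
 * structures or the real VOP_GETATTR()/vn_lock() interfaces. */
struct mock_vattr {
    long va_size;
};

struct mock_vnode {
    bool locked;    /* models whether the vnode lock is held */
    long size;      /* attribute data that getattr reports */
};

static void mock_lock(struct mock_vnode *vp)   { vp->locked = true; }
static void mock_unlock(struct mock_vnode *vp) { vp->locked = false; }

/* Models a VOP that requires the vnode to be locked. Where a
 * DEBUG_VFS_LOCKS kernel would complain, this returns -1 instead. */
static int mock_getattr(struct mock_vnode *vp, struct mock_vattr *vap)
{
    if (!vp->locked)
        return -1;
    vap->va_size = vp->size;
    return 0;
}

/* Assumed shape of the chosen fix: lock dirp only for the duration of
 * the getattr call, so dirp enters the lookup loop in the same
 * unlocked state it had before. */
static int get_initial_dir_attrs(struct mock_vnode *dirp,
                                 struct mock_vattr *vap)
{
    int error;

    mock_lock(dirp);
    error = mock_getattr(dirp, vap);
    mock_unlock(dirp);
    return error;
}
```

Because dirp leaves the helper in the same unlocked state it entered, the loop that follows needs no changes, which is presumably part of why this option was simpler than setting LOCKPARENT on the first lookup().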