From: Don Lewis
To: current@FreeBSD.org
Date: Fri, 25 Apr 2003 01:44:38 -0700 (PDT)
Subject: vnode lockings bug in -current NFS server code

I just exercised the NFS server code on my -current box for the first time and stumbled across some vnode locking bugs that were caught by the DEBUG_VFS_LOCKS configuration option. I found problems in nfsrv_lookup() and nfsrv_create().

There are a number of places in the NFS server code that call VOP_GETATTR() on the vnode returned through the retdirp parameter to nfs_namei(). VOP_GETATTR() wants the vnode to be locked, but nfs_namei() does not explicitly lock this vnode. This is the directory used by nfs_namei() as the starting point of its lookup.
For normal NFS lookups, I believe it will be the same as the parent directory of the filesystem object being looked up, because normal NFS lookups only process one pathname component at a time and the server doesn't follow symlinks. This is not true of WebNFS.

The vnode may end up being locked if the LOCKPARENT flag has been passed by the caller and retdirp ends up pointing to the parent vnode returned by nfs_namei(), or possibly if nfs_namei() follows a symlink in the WebNFS case and retdirp and the leaf object are the same vnode. Because of this, it is not safe for the code that calls nfs_namei() to just call VN_LOCK() before calling VOP_GETATTR(). It is also unsafe because the caller would be acquiring that lock while still holding the other vnode locks, and another process could be attempting to lock the same vnodes in a different order at the same time, causing a deadlock.

It appears that it may be possible to rearrange the code to defer the call to VOP_GETATTR() until after the other vnodes have been unlocked, at which point it would be safe to just unconditionally lock the starting directory vnode.

This code is a maze of twisty little passages, and a proper fix would take more time than I can devote to it at present. If someone is feeling bored ...
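To make the suggested rearrangement concrete, here is a minimal user-space sketch of the ordering, not kernel code. Everything in it is hypothetical: `struct model_vnode`, `model_lock()`, `model_getattr()` and `model_serve_lookup()` are illustrative stand-ins, not the FreeBSD vnode, vn_lock or VOP_GETATTR() interfaces. The lock is modeled as non-recursive, so taking it twice trips an assertion, and `model_getattr()` asserts the vnode is locked, roughly the way DEBUG_VFS_LOCKS complains. The sketch is single-threaded, so it only shows that deferring the directory lock avoids re-locking an aliased vnode and never holds two locks at once. It does not exercise the deadlock itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a vnode: a non-recursive lock plus one attribute. */
struct model_vnode {
    bool locked;   /* exclusive, non-recursive lock */
    long size;     /* stand-in for the attributes VOP_GETATTR() returns */
};

static void model_lock(struct model_vnode *vp)
{
    /* Locking a vnode we already hold would recurse; model that as a failure. */
    assert(!vp->locked);
    vp->locked = true;
}

static void model_unlock(struct model_vnode *vp)
{
    assert(vp->locked);
    vp->locked = false;
}

/* Stand-in for VOP_GETATTR(): insists the vnode is locked. */
static long model_getattr(const struct model_vnode *vp)
{
    assert(vp->locked);
    return vp->size;
}

/*
 * Hypothetical server operation. dirp plays the role of *retdirp and may
 * alias parent (normal lookups) or leaf (the WebNFS symlink case).
 * Returns the start directory's attributes; stores the leaf's in *leaf_size.
 */
static long model_serve_lookup(struct model_vnode *dirp,
                               struct model_vnode *parent,
                               struct model_vnode *leaf, long *leaf_size)
{
    /* The lookup step hands back parent and leaf locked (LOCKPARENT-like). */
    model_lock(parent);
    if (leaf != parent)
        model_lock(leaf);

    /* Use the leaf while it is still locked. */
    *leaf_size = model_getattr(leaf);

    /* Release everything the lookup step locked. */
    if (leaf != parent)
        model_unlock(leaf);
    model_unlock(parent);

    /*
     * Deferred step: nothing is held now, so locking dirp unconditionally
     * cannot recurse even if dirp aliases parent or leaf, and no second
     * lock is held that could be taken in a conflicting order.
     */
    model_lock(dirp);
    long dir_size = model_getattr(dirp);
    model_unlock(dirp);
    return dir_size;
}
```

Calling `model_serve_lookup(&dir, &dir, &obj, &n)` models the normal case where retdirp is the parent, and `model_serve_lookup(&obj, &dir, &obj, &n)` models retdirp aliasing the leaf. Both complete without the recursion assertion firing. Locking dirp right after the lookup step, before the unlocks, would trip it in either case.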