From: Don Lewis
To: arch@FreeBSD.org
Date: Sat, 14 Jun 2003 09:43:07 -0700 (PDT)
Subject: vnode/buf locking deadlock between nfsiod and getblk()
Message-Id: <200306141643.h5EGh7M7044570@gw.catspoiler.org>

The one remaining vnode locking issue in the NFS client code that I'm aware of is that nfs_doio() does stuff with the vnode associated with the buf passed to it that requires the vnode lock to be held, but the vnode is not locked when nfs_nfsiod() calls nfs_doio().
I've made a couple of attempts to fix nfs_nfsiod() by locking the vnode, and I've always run into deadlocks like this:

43 c63f9b58 e4b3c000 0 0 0 0000204 [SLP]nfs 0xc687f794] nfsiod 0
570 c6728790 e6ddb000 1001 563 570 0004002 [SLP]getblk 0xd28980a4] ls

mi_switch(c61c6000,50,c051cc8e,cd,0) at mi_switch+0x210
msleep(c687f794,c05dffc4,50,c05299ce,0) at msleep+0x484
acquire(e4b0ec6c,1000000,600,f1,c61c6000) at acquire+0x9e
lockmgr(c687f794,1010002,c687f6d8,c61c6000,d2897fd8) at lockmgr+0x387
vop_sharedlock(e4b0ec9c,0,c0524105,360,e4b0ecb0) at vop_sharedlock+0x84
vn_lock(c687f6d8,20002,c61c6000,c0529f78,0) at vn_lock+0xe9
nfssvc_iod(c060d6e0,e4b0ed48,c051a213,30e,0) at nfssvc_iod+0x12a
fork_exit(c03e3e10,c060d6e0,e4b0ed48) at fork_exit+0xc0
fork_trampoline() at fork_trampoline+0x1a

mi_switch(c6729390,50,c0328ad0,c6729390,0) at mi_switch+0x210
msleep(d28980a4,c05e06ac,50,c0522430,c8) at msleep+0x484
acquire(e6dbc9e8,2000020,600,f1,c6729390) at acquire+0x9e
lockmgr(d28980a4,2090022,c687f6d8,c6729390,c687f6d8) at lockmgr+0x387
BUF_TIMELOCK(d2897fd8,10022,c687f6d8,c0522430,0) at BUF_TIMELOCK+0x80
getblk(c687f6d8,1,0,1000,0) at getblk+0x141
nfs_getcacheblk(c687f6d8,1,0,1000,c6729390) at nfs_getcacheblk+0xc9
nfs_bioread(c687f6d8,e6dbccb4,0,c66f2d00,165) at nfs_bioread+0x87a
nfs_readdir(e6dbcc34,c05158ca,c05b6c20,c687f6d8,e6dbccb4) at nfs_readdir+0xd4
VOP_READDIR(c687f6d8,e6dbccb4,c66f2d00,e6dbcc84,0) at VOP_READDIR+0x67
getdirentries(c6729390,e6dbcd10,c0537b17,3fd,4) at getdirentries+0x11d
syscall(2f,2f,2f,80e2600,80d9040) at syscall+0x26e
Xint0x80_syscall() at Xint0x80_syscall+0x1d

In this case, 'ls' had a vnode locked and was trying to lock a buf, and 'nfsiod' was waiting to obtain a lock on the same vnode.

I finally dug around in the code and discovered that the problem is fairly fundamental.
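The two stacks can be read as a wait-for cycle. The following is only a toy model in userland C, not kernel code: the enum names, struct wait_state, trace_state() and has_deadlock() are all made up for illustration, and it assumes that the buf 'ls' sleeps on can only be released once nfsiod has finished the queued I/O.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the two threads and two locks in the trace. */
enum party { P_LS, P_NFSIOD, P_NONE };
enum lock_id { L_VNODE, L_BUF, L_NLOCKS };

struct wait_state {
    enum party releaser[L_NLOCKS]; /* who has to run before the lock is freed */
    int waits_for[P_NONE];         /* lock each party sleeps on, or -1 */
};

/* Build the situation from the trace.  iod_locks_vnode selects between the
 * attempted fix (nfsiod takes the vnode lock) and the current behavior. */
static struct wait_state trace_state(bool iod_locks_vnode)
{
    struct wait_state ws;

    ws.releaser[L_VNODE] = P_LS;    /* 'ls' holds the vnode lock */
    ws.releaser[L_BUF] = P_NFSIOD;  /* assumption: only nfsiod frees the buf */
    ws.waits_for[P_LS] = L_BUF;     /* 'ls' sleeps in getblk() */
    ws.waits_for[P_NFSIOD] = iod_locks_vnode ? L_VNODE : -1;
    return ws;
}

/* Follow "sleeps on a lock that X must release" edges from start.
 * Coming back to start means the threads are deadlocked. */
static bool has_deadlock(const struct wait_state *ws, enum party start)
{
    enum party p = start;

    assert(start < P_NONE);
    for (int steps = 0; steps < P_NONE; steps++) {
        int l = ws->waits_for[p];

        if (l < 0)
            return false;           /* p is runnable, the chain ends here */
        p = ws->releaser[l];
        if (p == P_NONE)
            return false;
        if (p == start)
            return true;
    }
    return false;
}
```

With trace_state(true) the chain from 'ls' leads back to 'ls'; with trace_state(false) nfsiod is runnable and the chain ends, which matches the current behavior of running nfs_doio() without the vnode lock.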
If a thread calls VOP_STRATEGY() to do asynchronous I/O on an NFS mounted filesystem, or if it calls nfs_bioread() which decides to do readahead, the request is handled by nfs_asyncio(), which uses BUF_KERNPROC() to transfer ownership of the buf lock to the system, and queues the buf on nmp->nm_bufq for nfsiod to handle later. Everything is fine if nfsiod is able to service the request before another thread requests the buf.

The problem occurs when another thread attempts to do I/O on the file, grabs the vnode lock and then tries to grab the buf lock before nfsiod has gotten around to servicing the request. The thread requesting the I/O can't proceed until it gets the buf lock, which won't happen until the queued request has been serviced, and nfsiod can't handle the I/O request in the buf because it can't obtain the vnode lock.

The only reason that we don't see this failure is that nfsiod is not requesting the vnode lock and is allowing nfs_doio() to play with an unlocked vnode (or one locked by another thread).

I came up with three possible ways of fixing this, none of which sound very appealing:

1. Fix nfs_doio() so that it and the functions that it calls don't touch any vnode fields that require the vnode lock.

2. When attempting to lock a buf whose current lockholder is LK_KERNPROC, back off by dropping the vnode lock and retrying.

3. When attempting to lock a buf whose current lockholder is LK_KERNPROC, steal the buf back and do the requested I/O synchronously before proceeding if the previously requested I/O was not already in progress.

Comments? Suggestions?
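To make the second option (back off and retry) a bit more concrete, here is a rough userland sketch using pthreads. It is not kernel code and does not use lockmgr: struct toy_lock, struct toy_file, toy_iod(), read_with_backoff() and run_backoff_demo() are hypothetical stand-ins. The toy lock may be released by a thread other than the one that acquired it, which is how it imitates a buf lock handed to LK_KERNPROC.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Toy sleep lock; unlike a pthread mutex, any thread may release it. */
struct toy_lock {
    pthread_mutex_t mtx;
    pthread_cond_t cv;
    bool held;
};

static void toy_init(struct toy_lock *l)
{
    pthread_mutex_init(&l->mtx, NULL);
    pthread_cond_init(&l->cv, NULL);
    l->held = false;
}

static void toy_acquire(struct toy_lock *l)
{
    pthread_mutex_lock(&l->mtx);
    while (l->held)
        pthread_cond_wait(&l->cv, &l->mtx);
    l->held = true;
    pthread_mutex_unlock(&l->mtx);
}

/* Non-blocking attempt; returns true if the lock was obtained. */
static bool toy_try(struct toy_lock *l)
{
    pthread_mutex_lock(&l->mtx);
    bool ok = !l->held;
    if (ok)
        l->held = true;
    pthread_mutex_unlock(&l->mtx);
    return ok;
}

static void toy_release(struct toy_lock *l)
{
    pthread_mutex_lock(&l->mtx);
    l->held = false;
    pthread_cond_broadcast(&l->cv);
    pthread_mutex_unlock(&l->mtx);
}

struct toy_file {
    struct toy_lock vnode_lock;
    struct toy_lock buf_lock;
    bool io_done;
};

/* Stand-in for nfsiod with the vnode locking fix applied. */
static void *toy_iod(void *arg)
{
    struct toy_file *f = arg;

    toy_acquire(&f->vnode_lock);
    f->io_done = true;              /* pretend nfs_doio() ran */
    toy_release(&f->vnode_lock);
    toy_release(&f->buf_lock);      /* I/O finished, buf becomes available */
    return NULL;
}

/* Called with the vnode lock held.  Drops and retakes it whenever the buf
 * is busy.  Returns the number of back-offs. */
static int read_with_backoff(struct toy_file *f)
{
    const struct timespec pause = { 0, 1000000 };   /* 1 ms */
    int backoffs = 0;

    while (!toy_try(&f->buf_lock)) {
        toy_release(&f->vnode_lock);    /* let the iod thread make progress */
        nanosleep(&pause, NULL);
        toy_acquire(&f->vnode_lock);
        backoffs++;
    }
    return backoffs;
}

static int run_backoff_demo(bool *io_done)
{
    struct toy_file f;
    pthread_t iod;

    toy_init(&f.vnode_lock);
    toy_init(&f.buf_lock);
    f.io_done = false;
    toy_acquire(&f.buf_lock);       /* async request queued, lock handed off */
    toy_acquire(&f.vnode_lock);     /* second request takes the vnode first */
    int rc = pthread_create(&iod, NULL, toy_iod, &f);
    assert(rc == 0);

    int backoffs = read_with_backoff(&f);
    *io_done = f.io_done;
    toy_release(&f.buf_lock);
    toy_release(&f.vnode_lock);
    pthread_join(iod, NULL);
    return backoffs;
}
```

Calling run_backoff_demo() from a small main() should return at least one back-off, since the first attempt on the buf cannot succeed while the caller still holds the vnode lock. One thing the toy glosses over: in the kernel, anything the caller examined under the vnode lock before backing off would presumably have to be revalidated after the lock is retaken.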