From owner-freebsd-arch@FreeBSD.ORG  Sat Jun 14 09:43:15 2003
Message-Id: <200306141643.h5EGh7M7044570@gw.catspoiler.org>
Date: Sat, 14 Jun 2003 09:43:07 -0700 (PDT)
From: Don Lewis <truckman@FreeBSD.org>
To: arch@FreeBSD.org
MIME-Version: 1.0
Content-Type: TEXT/plain; charset=us-ascii
Subject: vnode/buf locking deadlock between nfsiod and getblk()
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>

The one remaining vnode locking issue in the NFS client code that I'm
aware of is in nfs_doio().  It operates on the vnode associated with the
buf passed to it in ways that require the vnode lock to be held, but the
vnode is not locked when nfs_nfsiod() calls nfs_doio().

I've made a couple of attempts to fix nfs_nfsiod() by locking the vnode,
and I've always run into deadlocks like this:

   43 c63f9b58 e4b3c000    0     0     0 0000204 [SLP]nfs 0xc687f794] nfsiod 0

  570 c6728790 e6ddb000 1001   563   570 0004002 [SLP]getblk 0xd28980a4] ls

mi_switch(c61c6000,50,c051cc8e,cd,0) at mi_switch+0x210
msleep(c687f794,c05dffc4,50,c05299ce,0) at msleep+0x484
acquire(e4b0ec6c,1000000,600,f1,c61c6000) at acquire+0x9e
lockmgr(c687f794,1010002,c687f6d8,c61c6000,d2897fd8) at lockmgr+0x387
vop_sharedlock(e4b0ec9c,0,c0524105,360,e4b0ecb0) at vop_sharedlock+0x84
vn_lock(c687f6d8,20002,c61c6000,c0529f78,0) at vn_lock+0xe9
nfssvc_iod(c060d6e0,e4b0ed48,c051a213,30e,0) at nfssvc_iod+0x12a
fork_exit(c03e3e10,c060d6e0,e4b0ed48) at fork_exit+0xc0
fork_trampoline() at fork_trampoline+0x1a

mi_switch(c6729390,50,c0328ad0,c6729390,0) at mi_switch+0x210
msleep(d28980a4,c05e06ac,50,c0522430,c8) at msleep+0x484
acquire(e6dbc9e8,2000020,600,f1,c6729390) at acquire+0x9e
lockmgr(d28980a4,2090022,c687f6d8,c6729390,c687f6d8) at lockmgr+0x387
BUF_TIMELOCK(d2897fd8,10022,c687f6d8,c0522430,0) at BUF_TIMELOCK+0x80
getblk(c687f6d8,1,0,1000,0) at getblk+0x141
nfs_getcacheblk(c687f6d8,1,0,1000,c6729390) at nfs_getcacheblk+0xc9
nfs_bioread(c687f6d8,e6dbccb4,0,c66f2d00,165) at nfs_bioread+0x87a
nfs_readdir(e6dbcc34,c05158ca,c05b6c20,c687f6d8,e6dbccb4) at nfs_readdir+0xd4
VOP_READDIR(c687f6d8,e6dbccb4,c66f2d00,e6dbcc84,0) at VOP_READDIR+0x67
getdirentries(c6729390,e6dbcd10,c0537b17,3fd,4) at getdirentries+0x11d
syscall(2f,2f,2f,80e2600,80d9040) at syscall+0x26e
Xint0x80_syscall() at Xint0x80_syscall+0x1d

In this case, 'ls' had a vnode locked and was trying to lock a buf, and
'nfsiod' was waiting to obtain a lock on the same vnode.


I finally dug around in the code and discovered that the problem is
fairly fundamental.  If a thread calls VOP_STRATEGY() to perform
asynchronous I/O on an NFS-mounted filesystem, or if it calls
nfs_bioread() and that decides to do readahead, the request is handled
by nfs_asyncio(), which uses BUF_KERNPROC() to transfer ownership of the
buf lock to the system and queues the buf on nmp->nm_bufq for nfsiod to
handle later.
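
To make the handoff concrete, here is a rough user-space sketch in C of
the sequence just described.  It is only a model, not the kernel code:
struct buf_model, struct bufq_model, OWNER_KERNPROC, async_handoff() and
iod_complete() are hypothetical stand-ins for struct buf, nmp->nm_bufq,
LK_KERNPROC, nfs_asyncio() and the nfsiod completion path.  The point it
illustrates is that the buf lock stays held across the handoff; only the
owner changes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lock owners; OWNER_KERNPROC stands in for LK_KERNPROC. */
enum owner { OWNER_NONE, OWNER_THREAD, OWNER_KERNPROC };

struct buf_model {
	enum owner lock_owner;	/* who currently holds the buf lock */
	int queued;		/* nonzero while waiting for an nfsiod */
};

#define BUFQ_MAX 8

struct bufq_model {		/* stand-in for nmp->nm_bufq */
	struct buf_model *slot[BUFQ_MAX];
	size_t count;
};

/*
 * Model of the async handoff.  The caller must own the buf lock.
 * Ownership moves to the "kernel" and the buf is queued; the lock is
 * NOT released.  Returns 0 on success, -1 if the caller does not own
 * the lock or the queue is full.
 */
static int
async_handoff(struct bufq_model *q, struct buf_model *bp)
{
	if (bp->lock_owner != OWNER_THREAD || q->count == BUFQ_MAX)
		return (-1);
	bp->lock_owner = OWNER_KERNPROC;
	bp->queued = 1;
	q->slot[q->count++] = bp;
	return (0);
}

/*
 * Model of an nfsiod finishing a request: dequeue the oldest buf and
 * release its lock.  Returns the buf, or NULL if the queue is empty.
 */
static struct buf_model *
iod_complete(struct bufq_model *q)
{
	struct buf_model *bp;
	size_t i;

	if (q->count == 0)
		return (NULL);
	bp = q->slot[0];
	for (i = 1; i < q->count; i++)
		q->slot[i - 1] = q->slot[i];
	q->count--;
	bp->queued = 0;
	bp->lock_owner = OWNER_NONE;
	return (bp);
}
```

Between async_handoff() and iod_complete() the buf is locked but no
runnable thread owns it, which is the window that matters here.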

Everything is fine if nfsiod is able to service the request before
another thread requests the buf.  The problem occurs when another thread
attempts to do I/O on the file, grabs the vnode lock, and then tries to
grab the buf lock before nfsiod has gotten around to servicing the
request.  The thread requesting the I/O can't proceed until it gets the
buf lock, which won't happen until the queued request has been serviced,
and nfsiod can't handle the I/O request in the buf because it can't
obtain the vnode lock.  The only reason that we don't see this failure
today is that nfsiod does not request the vnode lock and allows
nfs_doio() to play with an unlocked vnode (or one locked by another
thread).
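
The circular wait can be written down as a tiny state model.  This is
only an illustrative sketch in user-space C: struct state, the actor
names and the three predicates are hypothetical, and the vnode lock is
treated as a simple exclusive lock, which is an assumption of the model.
The iod_locks_vnode flag selects between nfsiod locking the vnode before
calling nfs_doio() and nfsiod running with the vnode unlocked.

```c
#include <assert.h>

/* Hypothetical lock holders for the model. */
enum actor { NOBODY, LS, NFSIOD, KERNPROC };

struct state {
	enum actor vnode_holder;	/* holder of the vnode lock */
	enum actor buf_holder;		/* holder of the buf lock */
	int iod_locks_vnode;		/* nfsiod takes the vnode lock first */
};

/* nfsiod can service the queued buf only if it can get the vnode. */
static int
iod_can_run(const struct state *s)
{
	if (!s->iod_locks_vnode)
		return (1);	/* runs against an unlocked vnode */
	return (s->vnode_holder == NOBODY || s->vnode_holder == NFSIOD);
}

/* ls can proceed only once the buf lock is free (or already its own). */
static int
ls_can_run(const struct state *s)
{
	return (s->buf_holder == NOBODY || s->buf_holder == LS);
}

/*
 * Deadlock: ls waits for the buf, which only nfsiod will release, and
 * nfsiod waits for the vnode, which only ls will release.
 */
static int
is_deadlocked(const struct state *s)
{
	return (!ls_can_run(s) && !iod_can_run(s));
}
```

With ls holding the vnode and the buf owned by KERNPROC, is_deadlocked()
is true only when iod_locks_vnode is set, which matches the traces
above.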

I came up with three possible ways of fixing this, none of which sound
very appealing:

	Fix nfs_doio() so that it and the functions that it calls don't
        touch any vnode fields that require the vnode lock.

	When attempting to lock a buf whose current lockholder is
        LK_KERNPROC, back off by dropping the vnode lock and retrying.

	When attempting to lock a buf whose current lockholder is
        LK_KERNPROC, steal the buf back and do the requested I/O
        synchronously before proceeding if the previously requested I/O
        was not already in progress.
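
For discussion, here is a rough single-threaded sketch in C of how the
second and third options might behave.  Nothing in it is kernel code:
struct model, iod_service(), lock_buf_backoff() and lock_buf_steal() are
hypothetical names, the vnode lock is modeled as exclusive, and nfsiod
is simulated by calling iod_service() at the point where a real thread
would get a chance to run.

```c
#include <assert.h>

/* Hypothetical lock holders for the model. */
enum actor { NOBODY, CALLER, NFSIOD, KERNPROC };

struct model {
	enum actor vnode_holder;
	enum actor buf_holder;
	int io_in_progress;	/* nfsiod already started the queued I/O */
	int io_done;		/* the queued I/O has completed */
};

/* nfsiod services the queued buf if it can get the vnode lock. */
static void
iod_service(struct model *m)
{
	if (m->buf_holder != KERNPROC || m->vnode_holder != NOBODY)
		return;		/* nothing queued, or vnode is busy */
	m->vnode_holder = NFSIOD;
	m->io_done = 1;		/* stands in for nfs_doio() */
	m->buf_holder = NOBODY;
	m->vnode_holder = NOBODY;
}

/*
 * Option 2: if the buf is owned by KERNPROC, drop the vnode lock, let
 * nfsiod run, and retry.  Returns the number of attempts needed, or -1
 * if max_tries is exhausted or the buf is held by someone else (a case
 * this model does not cover).
 */
static int
lock_buf_backoff(struct model *m, int max_tries)
{
	int tries;

	for (tries = 1; tries <= max_tries; tries++) {
		m->vnode_holder = CALLER;	/* (re)take the vnode lock */
		if (m->buf_holder == NOBODY) {
			m->buf_holder = CALLER;
			return (tries);
		}
		if (m->buf_holder != KERNPROC)
			return (-1);
		m->vnode_holder = NOBODY;	/* back off */
		iod_service(m);			/* nfsiod gets its turn */
	}
	return (-1);
}

/*
 * Option 3: the caller keeps the vnode lock and steals the buf back,
 * doing the queued I/O itself.  Returns 1 if the buf was stolen, 0 if
 * it could not be (not owned by KERNPROC, or I/O already in progress).
 */
static int
lock_buf_steal(struct model *m)
{
	if (m->buf_holder != KERNPROC || m->io_in_progress)
		return (0);
	m->buf_holder = CALLER;	/* take the buf back from the queue */
	m->io_done = 1;		/* do the queued I/O synchronously */
	return (1);
}
```

In the model, the back-off variant succeeds on the second attempt once
nfsiod has had its turn, while the steal variant never drops the vnode
lock but has to give up when the I/O is already in progress.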

Comments?  Suggestions?