From: Doug Rabson <dfr@rabson.org>
To: current@freebsd.org
Date: Sat, 7 Jul 2007 14:26:17 +0100
Subject: ZFS leaking vnodes (sort of)
Message-Id: <200707071426.18202.dfr@rabson.org>

I've been testing ZFS recently and I noticed some performance issues while
doing large-scale port builds on a ZFS-mounted /usr/ports tree. Eventually I
realised that virtually nothing ever ended up on the vnode free list. This
meant that when the system reached its maximum vnode limit, it had to resort
to reclaiming vnodes from the various filesystems' active vnode lists (via
vlrureclaim). Since those lists are not sorted in LRU order, this led to
pessimal cache performance once the system got into that state.

I looked a bit closer at the ZFS code and poked around with DDB, and I think
the problem was caused by a couple of extraneous calls to vhold() when
creating a new ZFS vnode. On FreeBSD, getnewvnode() returns a vnode which is
already held (i.e. not on the free list), so there is no need to call vhold()
again.
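Roughly speaking, the pattern looks like this (a sketch only - the helper
name is made up and the body is simplified, but the hold-count behaviour is
the point):

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/mount.h>
#include <sys/vnode.h>

/*
 * Sketch: zfs_new_vnode_sketch() is a hypothetical name, not the actual
 * ZFS code; it only shows where the extra hold crept in.
 */
static int
zfs_new_vnode_sketch(struct mount *mp, struct vop_vector *vops, void *data,
    struct vnode **vpp)
{
	struct vnode *vp;
	int error;

	/*
	 * getnewvnode() hands back a vnode that is already held
	 * (v_holdcnt == 1), i.e. it is not on the free list.
	 */
	error = getnewvnode("zfs", mp, vops, &vp);
	if (error != 0)
		return (error);
	vp->v_data = data;

	/*
	 * Calling vhold() here as well means the hold count can never
	 * drop back to zero through the normal vrele()/vdrop() path, so
	 * the vnode never reaches the free list and can only be recycled
	 * by vlrureclaim() walking the mount's (unsorted) vnode list.
	 */
	/* vhold(vp);		XXX extraneous - this is the bug */

	*vpp = vp;
	return (0);
}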
This patch appears to fix the problem (only very lightly tested):

Index: zfs_vnops.c
===================================================================
RCS file: /home/ncvs/src/sys/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c,v
retrieving revision 1.22
diff -u -r1.22 zfs_vnops.c
--- zfs_vnops.c	28 May 2007 02:37:43 -0000	1.22
+++ zfs_vnops.c	7 Jul 2007 13:01:41 -0000
@@ -3493,7 +3493,7 @@
 		rele = 0;
 	vp->v_data = NULL;
 	ASSERT(vp->v_holdcnt > 1);
-	vdropl(vp);
+	VI_UNLOCK(vp);
 	if (!zp->z_unlinked && rele)
 		VFS_RELE(zfsvfs->z_vfs);
 	return (0);
Index: zfs_znode.c
===================================================================
RCS file: /home/ncvs/src/sys/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c,v
retrieving revision 1.8
diff -u -r1.8 zfs_znode.c
--- zfs_znode.c	6 May 2007 19:05:37 -0000	1.8
+++ zfs_znode.c	7 Jul 2007 13:17:32 -0000
@@ -115,7 +115,6 @@
 		ASSERT(error == 0);
 		zp->z_vnode = vp;
 		vp->v_data = (caddr_t)zp;
-		vhold(vp);
 		vp->v_vnlock->lk_flags |= LK_CANRECURSE;
 		vp->v_vnlock->lk_flags &= ~LK_NOSHARE;
 	} else {
@@ -601,7 +600,6 @@
 	ASSERT(err == 0);
 	vp = ZTOV(zp);
 	vp->v_data = (caddr_t)zp;
-	vhold(vp);
 	vp->v_vnlock->lk_flags |= LK_CANRECURSE;
 	vp->v_vnlock->lk_flags &= ~LK_NOSHARE;
 	vp->v_type = IFTOVT((mode_t)zp->z_phys->zp_mode);
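The zfs_vnops.c change is the counterpart of dropping the extra vhold()
calls: with no spare hold left for the reclaim path to release, the
vdropl() (which drops a hold and then releases the vnode interlock) becomes
a plain VI_UNLOCK(), and the remaining hold is dropped by the caller on the
normal vgone()/vdrop() path. That's my reading of it, at least.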