From owner-freebsd-fs@FreeBSD.ORG Mon Mar 22 15:10:03 2010
Date: Mon, 22 Mar 2010 15:10:03 GMT
Message-Id: <201003221510.o2MFA3Ft024915@freefall.freebsd.org>
From: Daniel Braniss
To: freebsd-fs@FreeBSD.org
Subject: Re: kern/144330: [nfs] mbuf leakage in nfsd with zfs
List-Id: Filesystems

The following reply was made to PR kern/144330; it has been noted by GNATS.

From: Daniel Braniss
To: Rick Macklem
Cc: Mikolaj Golub, Jeremy Chadwick, freebsd-fs@FreeBSD.org, Kai Kockro,
    bug-followup@FreeBSD.org, gerrit@pmp.uni-hannover.de
Subject: Re: kern/144330: [nfs] mbuf leakage in nfsd with zfs
Date: Mon, 22 Mar 2010 17:04:40 +0200

> On Mon, 22 Mar 2010, Daniel Braniss wrote:
>
> > well, it's much better!, but no cookies yet :-)
> >
> Well, that's good news. I'll try and get dfr to review it and then
> commit it. Thanks, Mikolaj, for finding this.
>
> > from comparing the graphs in
> > ftp://ftp.cs.huji.ac.il/users/danny/freebsd/mbuf-leak/
> >
> > store-01-e.ps: a production server running new-nfsd - now up almost 20 days;
> >    notice that the average number of mbufs in use stays below 1000!
> >
> > store-02.ps: kernel without the last patch, classic nfsd;
> >    the leak is huge.
> >
> > store-02++.ps: with the latest patch;
> >    the leak is much smaller, but I see 2 issues:
> >    - the initial leap to over 2000, then a smaller leak.
>
> The initial leap doesn't worry me. That's just a design constraint.

yes, but new-nfsd does it better.

> A slow leak after that is still a problem. (I might have seen the
> slow leak in testing here. I'll poke at it and see if I can reproduce
> that.)

all I do is mount over udp on a client and start a write process.

> > could someone explain replay_prune() to me?
> I just looked at it and I think it does the following:
> - when it thinks the cache is too big (either too many entries or too
>   much mbuf data) it loops around until:
>      - it is no longer too big, or it can't free any more
>        (when an entry is freed, rc_size and rc_count are reduced)
>        (the loop runs from the end of the tailq, so it frees the
>        least recently used entries)
>      - the test for rce_repmsg.rm_xid != 0 avoids freeing entries
>        that are still in progress, since rce_repmsg is all zeroed
>        until the reply has been generated

thanks for the information, that's what I thought, but the coding made it
look as if something else could happen - why else restart the search of
the queue after each match?

> I did notice that the call to replay_prune() from replay_setsize() does
> not lock the mutex before calling it, so it doesn't look SMP safe to me
> for this case, but I doubt that would cause a slow leak. (I think this
> is only called when the number of mbuf clusters in the kernel changes,
> and it might cause a kernel crash if the tailq wasn't in a consistent
> state as it rattled through the list in the loop.)
>

there seems to be an NFSLOCK involved before calling replay_setsize ...
well, the server is a 2-CPU quad-core Nehalem, so maybe I should try
several clients ...

> rick
>

btw, new-nfsd has been running on a production server for almost 20 days
and all seems fine.

anyways, things are looking better,
	cheers,
		danny
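
To make Rick's description above concrete, here is a minimal userland
sketch of the pruning logic as he describes it. It is NOT the actual
sys/rpc/replay.c source: the struct layouts, the list head name and the
replay_free_entry() helper are invented for the example; only the names
replay_prune(), rc_count, rc_size and rce_repmsg.rm_xid come from the
thread.

/*
 * Minimal userland sketch of the pruning logic described above -- not the
 * actual sys/rpc/replay.c code.  Layouts and helper names are invented.
 */
#include <sys/queue.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct rpc_msg_stub {                   /* stands in for struct rpc_msg */
        uint32_t rm_xid;                /* stays 0 until the reply has been built */
};

struct replay_cache_entry {
        TAILQ_ENTRY(replay_cache_entry) rce_link;
        struct rpc_msg_stub rce_repmsg;
        size_t rce_size;                /* bytes this entry accounts for */
        void *rce_reply;                /* would be the cached reply mbuf chain */
};

TAILQ_HEAD(replay_cache_list, replay_cache_entry);

struct replay_cache {
        struct replay_cache_list rc_cache;  /* MRU at the head, LRU at the tail */
        int rc_count;                   /* number of cached entries */
        size_t rc_size;                 /* total bytes cached */
        int rc_maxcount;
        size_t rc_maxsize;
};

/* Drop one completed entry and give back what it accounted for. */
static void
replay_free_entry(struct replay_cache *rc, struct replay_cache_entry *rce)
{
        TAILQ_REMOVE(&rc->rc_cache, rce, rce_link);
        rc->rc_count--;
        rc->rc_size -= rce->rce_size;
        free(rce->rce_reply);           /* the kernel would m_freem() the mbufs */
        free(rce);
}

/*
 * While the cache is over its count or size limit, walk from the tail
 * (least recently used end), skip entries whose rm_xid is still 0 (the
 * reply has not been generated yet), and free the first completed entry
 * found.  The walk then restarts so the limits are re-checked.
 */
static void
replay_prune(struct replay_cache *rc)
{
        struct replay_cache_entry *rce;
        int freed;

        while (rc->rc_count >= rc->rc_maxcount || rc->rc_size > rc->rc_maxsize) {
                freed = 0;
                TAILQ_FOREACH_REVERSE(rce, &rc->rc_cache, replay_cache_list,
                    rce_link) {
                        if (rce->rce_repmsg.rm_xid == 0)
                                continue;
                        replay_free_entry(rc, rce);
                        freed = 1;
                        break;
                }
                if (!freed)
                        break;          /* only in-progress entries are left */
        }
}

The restart after every freed entry - the part that looked odd above - is
just so the count/size limits get re-checked before the walk continues;
nothing else happens on a match.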
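
Continuing the same sketch, this is roughly what a serialized
replay_setsize() could look like if the prune really does need the cache
lock. The rc_lock field and the prototype are assumptions; the real kernel
code would use the replay cache's own mutex (mtx(9)) rather than pthreads,
and, as noted above, an NFS-level lock may already cover this path.

/*
 * Sketch only: serializing replay_setsize() against the RPC path.
 * The lock field and prototype are assumptions made for the example.
 */
#include <pthread.h>

struct replay_cache_locked {
        struct replay_cache rc;         /* the model from the previous sketch */
        pthread_mutex_t rc_lock;        /* protects rc_cache, rc_count, rc_size */
};

void
replay_setsize(struct replay_cache_locked *rcl, size_t newmaxsize)
{
        pthread_mutex_lock(&rcl->rc_lock);
        rcl->rc.rc_maxsize = newmaxsize;
        /*
         * Prune while holding the lock; an unlocked walk could see the
         * tailq in an inconsistent state if another thread is inserting
         * or pruning at the same time -- the crash scenario Rick mentions.
         */
        replay_prune(&rcl->rc);
        pthread_mutex_unlock(&rcl->rc_lock);
}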