From owner-freebsd-stable@FreeBSD.ORG  Mon Apr 30 09:54:05 2012
Return-Path: 
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id DB26D1065674;
	Mon, 30 Apr 2012 09:54:05 +0000 (UTC)
	(envelope-from danny@cs.huji.ac.il)
Received: from kabab.cs.huji.ac.il (kabab.cs.huji.ac.il [132.65.16.84])
	by mx1.freebsd.org (Postfix) with ESMTP id 4E57E8FC12;
	Mon, 30 Apr 2012 09:54:05 +0000 (UTC)
Received: from pampa.cs.huji.ac.il ([132.65.80.32])
	by kabab.cs.huji.ac.il with esmtp id 1SOnIu-000Dhp-Tf;
	Mon, 30 Apr 2012 12:54:01 +0300
X-Mailer: exmh version 2.7.2 01/07/2005 with nmh-1.3
To: Rick Macklem
In-reply-to: <99719742.103996.1335732516952.JavaMail.root@erie.cs.uoguelph.ca>
References: <99719742.103996.1335732516952.JavaMail.root@erie.cs.uoguelph.ca>
Comments: In-reply-to Rick Macklem message dated "Sun, 29 Apr 2012 16:48:36 -0400."
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Mon, 30 Apr 2012 12:54:00 +0300
From: Daniel Braniss
Message-ID: 
Cc: killing@multiplay.co.uk, Hiroki Sato, freebsd-stable@FreeBSD.org, ob@e-Gitt.NET
Subject: Re: 9-STABLE, ZFS, NFS, ggatec - suspected memory leak
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code
List-Unsubscribe: 
List-Archive: 
List-Post: 
List-Help: 
List-Subscribe: 
X-List-Received-Date: Mon, 30 Apr 2012 09:54:05 -0000

> Daniel Braniss wrote:
> > > Daniel Braniss wrote:
> > > > > Daniel Braniss wrote:
> > > > > > > ----Security_Multipart(Fri_Apr_27_13_35_56_2012_748)--
> > > > > > > Content-Type: Text/Plain; charset=us-ascii
> > > > > > > Content-Transfer-Encoding: 7bit
> > > > > > >
> > > > > > > Rick Macklem wrote in
> > > > > > > <1527622626.3418715.1335445225510.JavaMail.root@erie.cs.uoguelph.ca>:
> > > > > > >
> > > > > > > rm> Steven Hartland wrote:
> > > > > > > rm> > ---- Original Message -----
> > > > > > > rm> > From: "Rick Macklem"
> > > > > > > rm> > > At a glance, it looks to me like 8.x is affected. Note that
> > > > > > > rm> > > the bug only affects the new NFS server (the experimental
> > > > > > > rm> > > one for 8.x) when exporting ZFS volumes. (UFS exported
> > > > > > > rm> > > volumes don't leak)
> > > > > > > rm> > >
> > > > > > > rm> > > If you are running a server that might be affected, just:
> > > > > > > rm> > > # vmstat -z | fgrep -i namei
> > > > > > > rm> > > on the server and see if the 3rd number shown is increasing.
> > > > > > > rm> >
> > > > > > > rm> > Many thanks Rick wasnt aware we had anything experimental
> > > > > > > rm> > enabled but I think that would be a yes looking at these
> > > > > > > rm> > number:-
> > > > > > > rm> >
> > > > > > > rm> > vmstat -z | fgrep -i namei
> > > > > > > rm> > NAMEI: 1024, 0, 1, 1483, 25285086096, 0
> > > > > > > rm> > vmstat -z | fgrep -i namei
> > > > > > > rm> > NAMEI: 1024, 0, 0, 1484, 25285945725, 0
> > > > > > > rm>                     ^
> > > > > > > rm> I don't think so, since the 3rd number (USED) is 0 here.
> > > > > > > rm> If that # is increasing over time, you have the leak. You are
> > > > > > > rm> probably running the old (default in 8.x) NFS server.
> > > > > > >
> > > > > > > Just a report, I confirmed it affected 8.x servers running newnfs.
> > > > > > >
> > > > > > > Actually I have been suffered from memory starvation symptom on
> > > > > > > that server (24GB RAM) for a long time and watching vmstat -z
> > > > > > > periodically. It stopped working once a week. I investigated the
> > > > > > > vmstat log again and found the amount of NAMEI leak was 11,543,956
> > > > > > > (about 11GB!) just before the locked-up.
> > > > > > > After applying the patch, the leak disappeared. Thank you for
> > > > > > > fixing it!
> > > > > > >
> > > > > > > -- Hiroki
> > > > > And thanks Hiroki for testing it on 8.x.
> > > > >
> > > > > > this is on 8.2-STABLE/amd64 from around August:
> > > > > > same here, this zfs+newnfs has been hanging every few months, and I
> > > > > > can see now the leak, it's slowly increasing:
> > > > > > NAMEI: 1024, 0, 122975, 529, 15417248, 0
> > > > > > NAMEI: 1024, 0, 122984, 520, 15421772, 0
> > > > > > NAMEI: 1024, 0, 123002, 502, 15424743, 0
> > > > > > NAMEI: 1024, 0, 123008, 496, 15425464, 0
> > > > > >
> > > > > > cheers,
> > > > > > 	danny
> > > > > Maybe you could try the patch, too.
> > > > >
> > > > > It's at:
> > > > > http://people.freebsd.org/~rmacklem/namei-leak.patch
> > > > >
> > > > > I'll commit it to head soon with a 1 month MFC, so that hopefully
> > > > > Oliver will have a chance to try it on his production server before
> > > > > the MFC.
> > > > >
> > > > > Thanks everyone, for your help with this, rick
> > > >
> > > > I haven't applied the patch yet, but in the meantime I have been
> > > > running some experiments on a zfs/nfs server running 8.3-STABLE, and
> > > > don't see any leaks. What triggers the leak?
> > > >
> > > Fortunately Oliver isolated this. It should leak when you do a
> > > successful "rm" or "rmdir" while running the new/experimental server.
> > >
> > but that's what I did, I'm running the new/experimental nfs server
> > (or so I think :-), and did a huge rm -rf and nothing, nada, no leak.
> > To check the patch, I have to upgrade the production server, the one
> > with the leak, but I wanted to test it on a non-production one first.
> > Anyways, I'll patch the kernel and try it on the leaking production
> > server tomorrow.
> >
> Well, I think the patch should be harmless.
> You can check which server you are running by doing:
> # nfsstat -e -s
> - and see if the numbers are increasing
> if they're zero or not increasing, you are running the old (default on
> 8.x) server

I was running the wrong nfsd; now all is ok, and the patch works
(obviously :-)

BTW, if the experimental server is not running, then:

	# nfsstat -e -s
	nfsstat: experimental client/server not loaded
	#

danny
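P.S. For anyone else wanting to run Rick's check, here is a minimal sh sketch of it: watch whether the 3rd (USED) column of the NAMEI zone grows between samples. The two sample lines are hard-coded from the numbers quoted above so the script is self-contained; on a live server you would instead capture them a few minutes apart with `vmstat -z | fgrep -i namei`.

```shell
#!/bin/sh
# Sketch: detect whether the USED column (3rd number) of the NAMEI zone
# is growing between two samples of `vmstat -z | fgrep -i namei`.
# The samples below are hard-coded from the figures quoted in this
# thread; on a real server, capture two snapshots some minutes apart.
sample1="NAMEI:  1024,  0,  122975,  529,  15417248,  0"
sample2="NAMEI:  1024,  0,  123008,  496,  15425464,  0"

# After the zone name, the fields are comma-separated:
# SIZE, LIMIT, USED, FREE, REQUESTS, FAILURES
used1=$(printf '%s\n' "$sample1" | awk -F',' '{gsub(/ /, "", $3); print $3}')
used2=$(printf '%s\n' "$sample2" | awk -F',' '{gsub(/ /, "", $3); print $3}')

echo "NAMEI USED: $used1 -> $used2 (delta $((used2 - used1)))"
if [ "$used2" -gt "$used1" ]; then
    echo "USED is increasing: possible namei leak"
fi

# Output:
#   NAMEI USED: 122975 -> 123008 (delta 33)
#   USED is increasing: possible namei leak
```

A steadily growing USED with the leak means roughly SIZE (1024 bytes here) lost per leaked entry, which is how Hiroki's 11,543,956 stuck entries added up to about 11GB.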