From owner-freebsd-stable@FreeBSD.ORG  Wed Sep  1 21:53:47 2004
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id C58B416A4CE
	for <freebsd-stable@freebsd.org>;
	Wed,  1 Sep 2004 21:53:47 +0000 (GMT)
Received: from ganymede.hub.org (blk-222-46-91.eastlink.ca [24.222.46.91])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 5D79543D31
	for <freebsd-stable@freebsd.org>;
	Wed,  1 Sep 2004 21:53:47 +0000 (GMT)	(envelope-from scrappy@hub.org)
Received: by ganymede.hub.org (Postfix, from userid 1000)
	id BFBB136F64; Wed,  1 Sep 2004 18:53:47 -0300 (ADT)
Received: from localhost (localhost [127.0.0.1])
	by ganymede.hub.org (Postfix) with ESMTP id B26513676D;
	Wed,  1 Sep 2004 18:53:47 -0300 (ADT)
Date: Wed, 1 Sep 2004 18:53:47 -0300 (ADT)
From: "Marc G. Fournier" <scrappy@hub.org>
To: Allan Fields <bsd@afields.ca>
In-Reply-To: <20040901214006.GD34157@afields.ca>
Message-ID: <20040901184826.M47186@ganymede.hub.org>
References: <20040831205907.O31538@ganymede.hub.org>
	<20040901214006.GD34157@afields.ca>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
cc: freebsd-stable@freebsd.org
Subject: Re: vnodes - is there a leak?  where are they going?

On Wed, 1 Sep 2004, Allan Fields wrote:

> On Tue, Aug 31, 2004 at 09:21:09PM -0300, Marc G. Fournier wrote:
>>
>> I have two servers, both running 4.10 from within a few days of each
>> other (Aug 5 for venus, Aug 7 for neptune) ... both running jail
>> environments ... one with ~60 running, the other with ~80 ... the one
>> with 60 has been running for ~25 days now, and is on the verge of
>> running out of vnodes:
>>
>> Aug 31 20:58:00 venus root: debug.numvnodes: 519920 - debug.freevnodes:
>> 11058 - debug.vnlru_nowhere: 256463 - vlrup
>> Aug 31 20:59:01 venus root: debug.numvnodes: 519920 - debug.freevnodes:
>> 13155 - debug.vnlru_nowhere: 256482 - vlrup
>> Aug 31 21:00:03 venus root: debug.numvnodes: 519920 - debug.freevnodes:
>> 13092 - debug.vnlru_nowhere: 256482 - vlruwt
>>
>> [..]
>>
>> I've tried shutting down all of the VMs on venus, and umount'd all of the
>> unionfs mounts, as well as the one nfs mount we have ... the above #s are
>> after the VMs (and mounts) are recreated ...
>>
>> Now, my understanding of the vnodes is that for every file opened, a vnode
>> is created ... in my case, since I'm using unionfs, there are two vnodes
>> per file ... is it possible that there are 'stale' vnodes that aren't
>> being freed up?  Is there some way of 'viewing' the vnode structure?
>>
>> For instance, fstat shows:
>>
>> venus# fstat | wc -l
>>    19531
>
> You can also try pstat -f|more from the user side.

pstat -f shows even fewer entries than fstat:

venus# fstat | wc -l; pstat -f | wc -l
    20930
     6555

> You might want to set up remote kernel debugging and peek around the 
> system / further examine vnode structures.  (If you have physical access 
> to two machines you can set up a null modem cable.)

Unfortunately, I'm working with a remote server here, so am quite limited 
right now in what I can do ... anything I can, I will though ...

>> So, where else are the vnodes going?  Is there a 'leak'?  What can I look
>> at to try and narrow this down / provide more information?
>
> If the use count isn't decremented (to zero) vnodes won't
> be placed on the freelist.  Perhaps something isn't
> calling vrele() where it should in unionfs?  You should check the
> reference counts: v_usecount and v_holdcnt on some of the suspect
> vnodes.

How do I do that?  I'm at the limit of my current knowledge right now ... 
willing to do the legwork, just don't know which direction to take from 
here :(

> Any specific things you might suspect as possible cause?

Nothing specific, no ...

> Any messages preceding the ones you listed above?

The above is a script that I put together over a year ago to generate some 
simple reports that I could look at after a crash ...
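
The script itself is not shown in this message. A minimal sketch of something that would produce lines in that format, assuming FreeBSD's sysctl(8), ps(1) and logger(1), might look like the following. The function names and the way the vnlru wait channel is looked up are assumptions for illustration only:

```shell
#!/bin/sh
# Format one report line in the style of the log entries quoted above.
# Arguments: numvnodes, freevnodes, vnlru_nowhere, vnlru wait channel.
format_vnode_report() {
    printf 'debug.numvnodes: %s - debug.freevnodes: %s - debug.vnlru_nowhere: %s - %s\n' \
        "$1" "$2" "$3" "$4"
}

# Collect live values and send one line to syslog.
# Assumption: the vnlru kernel process appears in ps output and its wait
# channel is the first column of "ps -axo wchan,command".
# This function is only defined here; run it from cron once a minute.
log_vnode_report() {
    wchan=$(ps -axo wchan,command | awk '$2 ~ /vnlru/ { print $1; exit }')
    format_vnode_report \
        "$(sysctl -n debug.numvnodes)" \
        "$(sysctl -n debug.freevnodes)" \
        "$(sysctl -n debug.vnlru_nowhere)" \
        "$wchan" | logger
}
```

Keeping the formatting separate from the collection makes it easy to check the output format without touching a live system.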

>> Even some way of determining a specific process that is sucking back a lot
>> of them, to move that to a different machine ... ?
>
> While this only works for open file entries you can get a top 10
> by using:
>
> fstat|perl -ane '
>  $sum{$F[1]}++;
>  END{print "$_: $sum{$_}\n" for sort {$sum{$b}<=>$sum{$a}} keys %sum}
> '|head -10

sh /tmp/t
httpd: 7416
master: 6618
syslogd: 1117
qmgr: 780
pickup: 779
smtpd: 609
sshd: 503
cron: 495
perl: 279
trivial-rewrite: 274
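
For reference, the same per-command tally can be sketched without Perl. This is an illustrative awk version, assuming the command name is the second column of fstat output; unlike the one-liner above it skips the header row, and the function name top_commands is made up here:

```shell
#!/bin/sh
# Count open file entries per command from fstat-style input on stdin.
# $1: number of commands to show (defaults to 10).
top_commands() {
    awk 'NR > 1 { count[$2]++ }
         END { for (cmd in count) print count[cmd], cmd }' |
        sort -rn |
        head -n "${1:-10}" |
        awk '{ print $2 ": " $1 }'
}

# Usage on a live system: fstat | top_commands 10
```

Like the Perl version, it only sees open file entries, not every vnode the kernel is holding.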

but, again, those are known/open files ... fstat | wc -l only accounts for 
~20k or so of the ~520k vnodes in use :(
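
To put a number on that gap, here is a rough back-of-the-envelope calculation using the figures quoted in this thread. It is only an estimate: fstat lines are not unique vnodes, and unionfs may use two vnodes per file, so the function name and the subtraction itself are a simplifying assumption:

```shell
#!/bin/sh
# Rough estimate of vnodes not explained by open file entries.
# Arguments: total vnodes, free vnodes, open file entries (fstat | wc -l).
unaccounted_vnodes() {
    numvnodes=$1
    freevnodes=$2
    open_entries=$3
    # Vnodes in use = allocated vnodes minus those on the free list.
    in_use=$((numvnodes - freevnodes))
    # Subtract the open file entries to see how many remain unexplained.
    echo $((in_use - open_entries))
}

# Figures from the 21:00:03 log sample and the later fstat count.
unaccounted_vnodes 519920 13092 20930
```

With those inputs the result is 485898, so open files explain only a small fraction of the vnodes in use.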


----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org           Yahoo!: yscrappy              ICQ: 7615664