Date: Wed, 1 Sep 2004 18:53:47 -0300 (ADT)
From: "Marc G. Fournier"
To: Allan Fields
cc: freebsd-stable@freebsd.org
Subject: Re: vnodes - is there a leak? where are they going?
Message-ID: <20040901184826.M47186@ganymede.hub.org>
In-Reply-To: <20040901214006.GD34157@afields.ca>
References: <20040831205907.O31538@ganymede.hub.org> <20040901214006.GD34157@afields.ca>

On Wed, 1 Sep 2004, Allan Fields wrote:

> On Tue, Aug 31, 2004 at 09:21:09PM -0300, Marc G. Fournier wrote:
>>
>> I have two servers, both running 4.10 of within a few days (Aug 5 for
>> venus, Aug 7 for neptune) ... both running jail environments ... one with
>> ~60 running, the other with ~80 ...
>> the one with 60 has been running for
>> ~25 days now, and is at the border of running out of vnodes:
>>
>> Aug 31 20:58:00 venus root: debug.numvnodes: 519920 - debug.freevnodes: 11058 - debug.vnlru_nowhere: 256463 - vlrup
>> Aug 31 20:59:01 venus root: debug.numvnodes: 519920 - debug.freevnodes: 13155 - debug.vnlru_nowhere: 256482 - vlrup
>> Aug 31 21:00:03 venus root: debug.numvnodes: 519920 - debug.freevnodes: 13092 - debug.vnlru_nowhere: 256482 - vlruwt
>>
>> [..]
>>
>> I've tried shutting down all of the VMs on venus, and umount'd all of the
>> unionfs mounts, as well as the one nfs mount we have ... the above #s are
>> after the VMs (and mounts are recreated ...
>>
>> Now, my understanding of the vnodes is that for every file opened, a vnode
>> is created ... in my case, since I'm using unionfs, there are two vnodes
>> per file ... if it possible that there are 'stale' vnodes that aren't
>> being freed up?  Is there some way of 'viewing' the vnode structure?
>>
>> For instance, fstat shows:
>>
>> venus# fstat | wc -l
>>    19531
>
> You can also try pstat -f|more from the user side.

Even less:

venus# fstat | wc -l; pstat -f | wc -l
   20930
    6555

> You might want to setup for remote kernel debugging and peek around the
> system / further examine vnode structures.  (If you have physical access
> to two machines you can setup a null modem cable.)

Unfortunately, I'm working with a remote server here, so am quite limited
right now in what I can do ... anything I can, I will though ...

>> So, where else are the vnodes going?  Is there a 'leak'?  What can I look
>> at to try and narrow this down / provide more information?
>
> If the use count isn't decremented (to zero) vnodes wont
> be placed on the freelist.  Perhaps something isn't
> calling vrele() where it should in unionfs?  You should check the
> reference counts: v_usecount and v_holdcnt on some of the suspect
> vnodes.

How do I do that?  I'm at the limit of my current knowledge right now ...
willing to do the footwork, just don't know the directions to take from
here :(

> Any specific things you might suspect as possible cause?

Nothing specific, no ...

> Any messages preceeding the ones you listed above?

The above comes from a script that I put together over a year ago to
generate some simple reports that I could look at after a crash ...

>> Even some way of determining a specific process that is sucking back alot
>> of them, to move that to a different machine ... ?
>
> While this only works for open file entries you can get a top 10
> by using:
>
> fstat|perl -ane '
> $sum{$F[1]}++;
> END{print "$_: $sum{$_}\n" for sort {$sum{$b}<=>$sum{$a}} keys %sum}
> '|head -10

sh /tmp/t
httpd: 7416
master: 6618
syslogd: 1117
qmgr: 780
pickup: 779
smtpd: 609
sshd: 503
cron: 495
perl: 279
trivial-rewrite: 274

but, again, those are known/open files ... fstat | wc -l only accounts for
~20k or so of that list :(

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org           Yahoo!: yscrappy              ICQ: 7615664
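
To put a number on the gap described above, here is a rough sketch that reads log lines in the format quoted earlier and prints how many vnodes are in use (numvnodes minus freevnodes) and how many of those are not explained by open file entries. The function name vnode_gap and its single argument (an open-file count such as the one from fstat | wc -l) are made up for illustration, and the subtraction only gives a ballpark figure: fstat prints a header line, and an open file entry does not map one-to-one onto a vnode.

```shell
#!/bin/sh
# Hypothetical helper (not part of FreeBSD): vnode_gap OPEN_FILES
# Reads log lines on stdin that look like
#   ... debug.numvnodes: N - debug.freevnodes: F - debug.vnlru_nowhere: W - state
# and prints, for each such line, the vnodes in use (N - F) and how many of
# them are not accounted for by OPEN_FILES open file entries.
vnode_gap() {
    awk -v nopen="$1" '
        {
            n = ""; f = ""
            # Find the value that follows each counter name on the line.
            for (i = 1; i < NF; i++) {
                if ($i == "debug.numvnodes:")  n = $(i + 1)
                if ($i == "debug.freevnodes:") f = $(i + 1)
            }
            # Lines that lack either counter are skipped silently.
            if (n != "" && f != "")
                printf "in_use=%d unaccounted=%d\n", n - f, n - f - nopen
        }'
}
```

Fed the first log line quoted above together with the 19531 from fstat, as in

    echo "Aug 31 20:58:00 venus root: debug.numvnodes: 519920 - debug.freevnodes: 11058 - debug.vnlru_nowhere: 256463 - vlrup" | vnode_gap 19531

it prints in_use=508862 unaccounted=489331, which is the size of the discrepancy in question.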