Date: Sat, 4 Sep 2004 00:49:05 -0300 (ADT)
From: "Marc G. Fournier" <scrappy@hub.org>
To: Allan Fields
Cc: freebsd-current@freebsd.org, freebsd-stable@freebsd.org
In-Reply-To: <20040901224632.O72978@ganymede.hub.org>
Message-ID: <20040904004706.O812@ganymede.hub.org>
Subject: Re: vnode leak in FFS code ... ?

Just as a followup to this ... the server crashed on Thursday night around
22:00 ADT and only just came back up after a very long fsck ... with all 62
VMs started up and 1008 processes running, the vnodes currently look like:

Sep 4 00:43:00 venus root: debug.numvnodes: 58370 - debug.freevnodes: 824 - debug.vnlru_nowhere: 0 - vlruwt
Sep 4 00:44:00 venus root: debug.numvnodes: 58370 - debug.freevnodes: 782 - debug.vnlru_nowhere: 0 - vlruwt
Sep 4 00:45:01 venus root: debug.numvnodes: 58370 - debug.freevnodes: 977 - debug.vnlru_nowhere: 0 - vlruwt
Sep 4 00:46:00 venus root: debug.numvnodes: 58370 - debug.freevnodes: 582 - debug.vnlru_nowhere: 0 - vlruwt

venus# ps aux | wc -l
    1008
venus#

about a tenth of what they were when the server crashed :(
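(Aside: the log lines above come out of a little script run from cron once a
minute.  The script itself never made it into the thread, so what follows is
only a minimal sketch of what it plausibly looks like; the ps/awk parsing of
the vnlru wchan and the default logger tag are assumptions, built from the
sysctls and the 'vlruwt'/'vlrup' states already quoted here.)

#!/bin/sh
# Sketch of a per-minute vnode monitor.  The real script was never
# posted; treat the details here as assumptions.

num=$(sysctl -n debug.numvnodes)          # vnodes currently allocated
free=$(sysctl -n debug.freevnodes)        # vnodes sitting on the free list
nowhere=$(sysctl -n debug.vnlru_nowhere)  # times vnlru failed to free any

# wchan of the vnlru kernel process: 'vlruwt' while it sleeps waiting
# for work, 'vlrup' while it runs through its lists freeing vnodes.
state=$(ps -axo wchan,comm | awk '/vnlru/ { print $1 }')

# logger(1) tags the line with the invoking user, which is where the
# "root:" in the syslog lines above would come from.
logger "debug.numvnodes: ${num} - debug.freevnodes: ${free} - debug.vnlru_nowhere: ${nowhere} - ${state}"

Dropped into root's crontab with a '* * * * *' entry, something along those
lines produces syslog entries of the shape shown above.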
On Wed, 1 Sep 2004, Marc G. Fournier wrote:

> On Wed, 1 Sep 2004, Allan Fields wrote:
>
>>> It's really hard to tell if there is a vnode leak here.  The vnode
>>> pool is fairly fluid and has nothing to do with the number of files
>>> that are actually 'open'.  Vnodes get created when the VFS layer
>>> wants to access an object that isn't already in the cache, and only
>>> get destroyed when the object is destroyed.  A vnode that represents
>>> a file that was opened will stay 'active' in the system long after
>>> the file has been closed, because it's cheaper to keep it active in
>>> the cache than it is to discard it and then risk having to go through
>>> the pain of a namei() and VOP_LOOKUP() again later.  Only if the
>>> maxvnodes limit is hit will old vnodes start getting recycled to
>>> represent other objects. [...]
>>>
>>> So you've obviously bumped up kern.maxvnodes well above the limits
>>> that are normally generated by the auto-tuner.  Why did you do that,
>>> if not because you knew that you'd have a large working set of
>>> referenced (but maybe not open all at once) filesystem objects? [...]
>>
>> There was a previous thread I've found which also helps explain this
>> further:
>> http://lists.freebsd.org/pipermail/freebsd-stable/2003-May/001266.html
>>
>> Is it really the same issue now as then?
>
> I'm not getting the hangs now; it is freeing up vnodes ... but it's
> having to work very hard to do so, or so it seems:
>
> venus# ps aux | grep vnlru
> root   7  3.0  0.0     0    0  ??  DL   5Aug04 606:34.54 (vnlru)
>
> I started up the script for monitoring this on Aug 29th ... since then
> there have been 4331 entries in the log file, of which 1927 are in
> 'vlrup', which, if I recall the code correctly, is vnlru running
> through its lists trying to find vnodes it can free up:
>
> venus# grep vnode /var/log/syswatch | wc -l
>     4331
> venus# grep vnode /var/log/syswatch | grep vlrup | wc -l
>     1927
>
> and this is based on a check every minute ...
>
> The other server, running ~19 more VMs (~100 more processes) but only
> up 2 days now, seems to be faring better:
>
> debug.numvnodes: 344062 - debug.freevnodes: 168285 - debug.vnlru_nowhere: 0 - vlruwt
>
> I've scheduled 'maintenance' on that server for Saturday ... I'm going
> to shut down all 'non-host server' processes and unmount the large
> file system (the one all the VMs run off of) ... and see if that
> cleans up any of the vnodes without having to do a reboot ...
>
> If that doesn't work, I could force a panic and have it dump core, if
> that would provide for easier/better debugging ... ?
>
> I have limited flexibility with the server, but it is a 'real' server
> without a fake load on it, and as solid as I've always considered
> FreeBSD to be, I seem to have a knack for pushing it and breaking it
> :( ... so whatever data I can provide to make it that much more solid,
> even if it involves a little bit of downtime to get a good core dump,
> I'm willing to do ...
>
> ----
> Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
> Email: scrappy@hub.org           Yahoo!: yscrappy              ICQ: 7615664

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org           Yahoo!: yscrappy              ICQ: 7615664
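P.S. For anyone watching their own box for the same thing: a quick way to
compare the tuned ceiling against the live counters (just the sysctls
already discussed in this thread, nothing new) is:

venus# sysctl kern.maxvnodes debug.numvnodes debug.freevnodes

Per the explanation quoted above, vnlru only has to start recycling once
debug.numvnodes runs up against kern.maxvnodes.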