Date: Wed, 1 Sep 2004 23:07:46 -0300 (ADT)
From: "Marc G. Fournier" <scrappy@hub.org>
To: Allan Fields <bsd@afields.ca>
Cc: freebsd-current@freebsd.org
Subject: Re: vnode leak in FFS code ... ?
Message-ID: <20040901224632.O72978@ganymede.hub.org>
In-Reply-To: <20040902013534.GD9327@afields.ca>
References: <20040901151405.G47186@ganymede.hub.org> <20040901200257.GA92717@afields.ca> <41365746.2030605@samsco.org> <20040902013534.GD9327@afields.ca>

On Wed, 1 Sep 2004, Allan Fields wrote:

>> It's really hard to tell if there is a vnode leak here.  The vnode pool
>> is fairly fluid and has nothing to do with the number of files that are
>> actually 'open'.  Vnodes get created when the VFS layer wants to access
>> an object that isn't already in the cache, and only get destroyed when
>> the object is destroyed.  A vnode that represents a file that was opened
>> will stay 'active' in the system long after the file has been closed,
>> because it's cheaper to keep it active in the cache than it is to
>> discard it and then risk having to go through the pain of a namei()
>> and VOP_LOOKUP() again later.  Only if the maxvnode limit is hit will
>> old vnodes start getting recycled to represent other objects. [...]
>>
>> So you've obviously bumped up kern.maxvnodes well above the limits that
>> are normally generated from the auto-tuner.  Why did you do that, if not
>> because you knew that you'd have a large working set of referenced (but
>> maybe not open all at once) filesystem objects? [...]
>
> There was a previous thread I've found which also helps explain
> this further:
> http://lists.freebsd.org/pipermail/freebsd-stable/2003-May/001266.html
>
> Really the same issue now as then?

I'm not getting the hangs now, it is freeing up vnodes ... but it's having
to work very hard to do so, or so it seems:

venus# ps aux | grep vnlru
root 7 3.0 0.0 0 0 ?? DL 5Aug04 606:34.54 (vnlru)

I started up the script for monitoring this on Aug 29th ... since then,
there have been 4331 entries in the log file, of which 1927 are in 'vlrup',
which I believe is vnlru running through its lists trying to find some to
free up, if I recall the code ... ?

venus# grep vnode /var/log/syswatch | wc -l
4331
venus# grep vnode /var/log/syswatch | grep vlrup | wc -l
1927

and this is based on a check every minute ...

The other server, running ~19 more VMs (~100 more processes), only up 2
days now, seems to be faring better:

debug.numvnodes: 344062 - debug.freevnodes: 168285 - debug.vnlru_nowhere: 0 - vlruwt

I've scheduled 'maintenance' on that server for Saturday ... am going to
shut down all 'non-host server' processes, and unmount the large file
system (where all the VMs run off of) ... see if that cleans up any of the
vnodes without having to do a reboot ...

If that doesn't work, I could cause a panic and have it dump core, if that
would provide for easier/better debugging ... ?

I have limited flexibility with the server, but it is a 'real' server
without a fake load on it, and as solid as I've always considered FreeBSD
to be, I seem to have a knack for pushing it and breaking it :( ... so
whatever data I can provide to make it that much more solid, even if it
involves a little bit of downtime to get a good core dump, I'm willing to
do ...

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org           Yahoo!: yscrappy              ICQ: 7615664
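
The two grep counts quoted in the message (1927 of 4331 samples, roughly 44%) can be produced in a single pass over the log. What follows is only a sketch, not the monitoring script used above: the function name `vlrup_share` is hypothetical, and it assumes every sample line contains "vnode" and ends with the wait channel as its last whitespace-separated field, as in the one sample line shown in the message.

```shell
#!/bin/sh
# Hypothetical helper: report how many logged samples caught vnlru in 'vlrup'.
# Assumption: each sample line contains "vnode" and its last field is the
# wait channel (for example "vlrup" or "vlruwt").
vlrup_share() {
    awk '
        # Count every sample line, and separately those ending in vlrup.
        /vnode/ { total++; if ($NF == "vlrup") busy++ }
        END {
            if (total == 0) { print "no samples"; exit }
            printf "%d of %d samples in vlrup (%.1f%%)\n", busy, total, 100 * busy / total
        }
    '
}

# Usage (log path taken from the message):
#   vlrup_share < /var/log/syswatch
```

Reading the log once keeps the two counts consistent even while the monitoring script is still appending to the file.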