Date: Sun, 17 Jul 2005 12:10:33 -0300 (ADT)
From: "Marc G. Fournier"
To: freebsd-stable@freebsd.org
In-Reply-To: <20050715120008.H66818@ganymede.hub.org>
Message-ID: <20050717120926.R66818@ganymede.hub.org>
References: <20050715120008.H66818@ganymede.hub.org>
Subject: vnode leak in NFS (Was: Re: 4.11-STABLE leaks vnodes worse then 4.x from Feb 13th ... ?)

Wow, now this was unexpected ... figured I'd try a quick theory this
morning ... debug.freevnodes was down to:

debug.freevnodes: 103987

umount /nfs and ...

# sysctl debug.freevnodes
debug.freevnodes: 332106

The vnode leak isn't in the unionfs code ... it's in the nfs code :(

On Fri, 15 Jul 2005, Marc G. Fournier wrote:

>
> Recently, I started having problems with one of my newest servers ...
> figuring that it might have something to do with the fact that I went SATA
> for this one (all others are SCSI), I figured it might be a driver issue
> causing the problems, since everything else is the same as the other 5
> servers on our network ...
>
> Today, I'm starting to wonder if I've been just looking at the "most obvious"
> cause, instead of looking deeper ...
>
> The problem that manifests itself is similar to the old 'ran out of vnode'
> issue I used to experience under 4.x ... the server would still run, be
> totally pingable, and you could even get the motd when you tried to ssh in,
> but you couldn't get a prompt, and all processes were hung ...
>
> I just upgraded the kernel on this machine (mercury) on the 13th of July, and
> it's been running 1 day, 12 hrs now ... there is hardly anything running on
> this machine (10 jails), and vnode usage is:
>
> debug.numvnodes: 336460 - debug.freevnodes: 5275 - debug.vnlru_nowhere: 0 - vlruwt
>
> One of my older servers (neptune), running kernels from Feb 13th of this
> year, and with 81 jails running on it, is using up *significantly fewer*
> vnodes (uptime: 1 day, 10 hours):
>
> debug.numvnodes: 279710 - debug.freevnodes: 91442 - debug.vnlru_nowhere: 0 - vlruwt
>
> Now, compared to neptune, mercury isn't running anything special ... several
> apache 1 processes, postfix, cyrus-imapd and that's it ... neptune on the
> other hand, is running the full gamut ... aolserver, java, apache 1 and 2,
> postfix, etc ...
>
> So, I'm starting to think that the problem isn't "hardware related", but the
> kernel itself ... the latest 4.11-STABLE kernel seems to have brought in new
> vnode leakage, or ... vnlru isn't working as it should be to free up vnodes
> ...
>
> Looking at that process on mercury:
>
> # ps aux | grep vnlru
> root 7 0.0 0.0 0 0 ?? DL Wed11PM 0:00.65 (vnlru)
>
> whereas on neptune:
>
> # ps aux | grep vnlru
> root 9 0.0 0.0 0 0 ?? DL Thu01AM 0:00.79 (vnlru)
>
> so about the same amount of CPU time being expended ... a bit more on the more
> loaded server, but not a major amount ...
>
> I'd like to try and debug this, but don't know where to start ... I realize
> that 4.x isn't being pushed anymore, but there are a lot of us that haven't
> moved to 5.x yet (am working on that for our next server, but it's going to
> take me several months before I can convert all our existing servers up) ...
>
> I do have a serial console on this server, if that helps to debug things ...
>
> I've heard that there was some work done on 5.x to clean up some of the vnode
> leaks ... not sure if that is fact or just rumor ... but, if so, would any of
> them be MFCable to 4.x?
>
> Thanks ...
>

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org           Yahoo!: yscrappy              ICQ: 7615664
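
For anyone wanting to repeat the before/after comparison from this message, here is a minimal sketch in sh. The helper names (sysctl_value, vnodes_in_use, freed_by) are hypothetical and not from the original post, and treating numvnodes minus freevnodes as a rough "in use" count is an assumption. The numbers are the ones quoted in the thread, so the script does not call sysctl itself.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not from the original post.

# Extract the integer from a "name: value" line as printed by sysctl.
sysctl_value() {
    printf '%s\n' "$1" | awk -F': ' '{ print $2 }'
}

# Rough in-use count (assumption): total vnodes minus free vnodes.
vnodes_in_use() {
    echo $(( $1 - $2 ))
}

# Free vnodes gained between a "before" and an "after" reading.
freed_by() {
    echo $(( $2 - $1 ))
}

# Readings quoted in the thread, before and after "umount /nfs".
before=$(sysctl_value "debug.freevnodes: 103987")
after=$(sysctl_value "debug.freevnodes: 332106")

echo "freed by umount /nfs: $(freed_by "$before" "$after")"
echo "mercury in use: $(vnodes_in_use 336460 5275)"
echo "neptune in use: $(vnodes_in_use 279710 91442)"
```

On a live system the two readings would presumably come from running "sysctl debug.freevnodes" immediately before and after the unmount, and each output line would be passed to sysctl_value.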