From: "Marc G. Fournier"
To: freebsd-stable@freebsd.org
Date: Fri, 15 Jul 2005 12:34:22 -0300 (ADT)
Message-ID: <20050715120008.H66818@ganymede.hub.org>
List-Id: Production branch of FreeBSD source code
Subject: 4.11-STABLE leaks vnodes worse then 4.x from Feb 13th ... ?

Recently, I started having problems with one of my newest servers ...
since I went SATA for this one (all others are SCSI), I figured it might be a driver issue causing the problems, as everything else is the same as on the other 5 servers on our network ...

Today, I'm starting to wonder if I've just been looking at the "most obvious" cause, instead of looking deeper ...

The problem that manifests itself is similar to the old 'ran out of vnode' issue I used to experience under 4.x ... the server would still run, be totally pingable, and you could even get the motd when you tried to ssh in, but you couldn't get a prompt, and all processes were hung ...

I just upgraded the kernel on this machine (mercury) on the 13th of July, and it's been running 1 day, 12 hrs now ... there is hardly anything running on this machine (10 jails), and vnode usage is:

    debug.numvnodes: 336460 - debug.freevnodes: 5275 - debug.vnlru_nowhere: 0 - vlruwt

One of my older servers (neptune), running kernels from Feb 13th of this year, and with 81 jails running on it, is using up *significantly fewer* vnodes (uptime: 1 day, 10 hours):

    debug.numvnodes: 279710 - debug.freevnodes: 91442 - debug.vnlru_nowhere: 0 - vlruwt

Now, compared to neptune, mercury isn't running anything special ... several apache 1 processes, postfix, cyrus-imapd and that's it ... neptune, on the other hand, is running the full gamut ... aolserver, java, apache 1 and 2, postfix, etc ...

So, I'm starting to think that the problem isn't "hardware related", but the kernel itself ... the latest 4.11-STABLE kernel seems to have brought in new vnode leakage, or ... vnlru isn't working as it should to free up vnodes ...

Looking at that process on mercury:

    # ps aux | grep vnlru
    root 7 0.0 0.0 0 0 ?? DL Wed11PM 0:00.65 (vnlru)

whereas on neptune:

    # ps aux | grep vnlru
    root 9 0.0 0.0 0 0 ?? DL Thu01AM 0:00.79 (vnlru)

so about the same amount of CPU time being expended ... a bit more on the more loaded server, but not a major amount ...
I'd like to try and debug this, but don't know where to start ... I realize that 4.x isn't being pushed anymore, but there are a lot of us that haven't moved to 5.x yet (am working on that for our next server, but it's going to take me several months before I can convert all our existing servers) ...

I do have a serial console on this server, if that helps to debug things ...

I've heard that there was some work done on 5.x to clean up some of the vnode leaks ... not sure if that is fact or just rumor ... but, if so, would any of them be MFCable to 4.x?

Thanks ...

----
Marc G. Fournier    Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org    Yahoo!: yscrappy    ICQ: 7615664
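
The comparison between mercury and neptune comes down to vnodes in use, i.e. debug.numvnodes minus debug.freevnodes. A minimal sketch of that arithmetic follows, assuming sysctl prints "name: value" lines as quoted in the message. The function name vnodes_in_use is hypothetical and not part of the original post.

```shell
#!/bin/sh
# Sketch only: vnodes_in_use is a hypothetical helper name.
# Reads sysctl-style "name: value" lines on stdin and reports
# total, free, and in-use (total - free) vnode counts.
vnodes_in_use() {
    awk -F': *' '
        $1 == "debug.numvnodes"  { total = $2 }   # all allocated vnodes
        $1 == "debug.freevnodes" { free = $2 }    # vnodes on the free list
        END { printf "total=%d free=%d in_use=%d\n", total, free, total - free }
    '
}
```

On the machine itself this would presumably be fed with `sysctl debug.numvnodes debug.freevnodes | vnodes_in_use`, run periodically to watch the trend. With the figures quoted in the message it gives in_use=331185 for mercury (10 jails) and in_use=188268 for neptune (81 jails).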