From: Beat Gätzi <beat@chruetertee.ch>
Date: Sun, 02 Jan 2011 15:51:10 +0100
To: Julian Elischer
Cc: Kostik Belousov, Alexander Best, current@freebsd.org
Subject: Re: Suddenly slow lstat syscalls on CURRENT from Juli

On 02.01.2011 01:36, Julian Elischer wrote:
> On 1/1/11 9:26 AM, Kostik Belousov wrote:
>> On Sat, Jan 01, 2011 at 05:59:10PM +0100, Beat Gätzi wrote:
>>> On 01.01.2011 17:46, Kostik Belousov wrote:
>>>> On Sat, Jan 01, 2011 at 05:42:58PM +0100, Beat Gätzi wrote:
>>>>> On 01.01.2011 17:12, Kostik Belousov wrote:
>>>>>> On Sat, Jan 01, 2011 at 05:00:56PM +0100, Beat Gätzi wrote:
>>>>>>> On 01.01.2011 16:45, Kostik Belousov wrote:
>>>>>>>> Check the output of sysctl kern.maxvnodes and vfs.numvnodes. I
>>>>>>>> suspect they are quite close or equal. If yes, consider increasing
>>>>>>>> maxvnodes. Another workaround, if you have a huge nested directory
>>>>>>>> hierarchy, is to set vfs.vlru_allow_cache_src to 1.
>>>>>>> Thanks for the hint. kern.maxvnodes and vfs.numvnodes were equal:
>>>>>>> # sysctl kern.maxvnodes vfs.numvnodes
>>>>>>> kern.maxvnodes: 100000
>>>>>>> vfs.numvnodes: 100765
>>>>>>>
>>>>>>> I've increased kern.maxvnodes and the problem was gone until
>>>>>>> vfs.numvnodes reached the value of kern.maxvnodes again:
>>>>>>> # sysctl kern.maxvnodes vfs.numvnodes
>>>>>>> kern.maxvnodes: 150000
>>>>>>> vfs.numvnodes: 150109
>>>>>> The processes should be stuck in "vlruwk" state, that can be
>>>>>> checked with ps or '^T' on the terminal.
>>>>> Yes, there are various processes in "vlruwk" state.
>>>>>>> As the directory structure is quite huge on this server I've set
>>>>>>> vfs.vlru_allow_cache_src to one now.
>>>>>> Did it help?
>>>>> No, it doesn't look like setting vfs.vlru_allow_cache_src helped. The
>>>>> problem went away when I increased kern.maxvnodes, but only until
>>>>> vfs.numvnodes reached that level again. I've stopped all running
>>>>> daemons but numvnodes doesn't decrease.
>>>> Stopping the daemons would not decrease the count of cached vnodes.
>>>> What you can do is call unmount on the filesystems. Supposedly the
>>>> filesystems are busy and the unmount will fail, but it will force the
>>>> freeing of vnodes that are not in use by any process.
>>> That freed around 1500 vnodes. At the moment vfs.numvnodes doesn't
>>> increase rapidly and the server is usable. I will keep an eye on it to
>>> see if I run into the same problem again.
>> That is too small a number of vnodes to be freed for a typical system,
>> and it feels like a real vnode leak. It would be helpful if you tried
>> to identify the load that causes the situation to occur.
>>
>> You are on UFS, right?
> Try running sockstat to a file and looking to see what is open.
> It could just be a normal leak.

The sockstat output looks normal, at least to me:
http://tmp.chruetertee.ch/tinderbox-sockstat

Thanks,
Beat
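
P.S. For the archives, the checks discussed above boil down to roughly the
following; the new maxvnodes value is only an example and "vlruwk" is the
wait channel Kostik mentioned:

  # sysctl kern.maxvnodes vfs.numvnodes
    (compare the current vnode count against the limit)
  # sysctl kern.maxvnodes=200000
    (raise the limit if the two values are nearly equal)
  # sysctl vfs.vlru_allow_cache_src=1
    (allow vnodes that are namecache sources to be recycled)
  # ps -axl | grep vlruwk
    (list processes sleeping while waiting on the vnode recycler)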
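
And a rough sketch of the unmount trick and the sockstat snapshot; the
mount point and the output file name are only placeholders:

  # umount /path/to/filesystem
    (fails with "Device busy" while the filesystem is in use, but unused
    vnodes are flushed as a side effect)
  # sysctl vfs.numvnodes
    (check whether the count actually dropped)
  # sockstat > /tmp/sockstat.txt
    (snapshot of what is open, per Julian's suggestion)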