From: Paul MacKenzie
To: Ivan Voras
Cc: freebsd-stable@freebsd.org
Date: Thu, 18 Dec 2008 19:41:02 -0500
Subject: Re: 7.1-PRERELEASE: arcmsr write performance problem
Message-ID: <494AED9E.9090900@cogeco.ca>
References: <4949673B.2070701@elehost.com>

>> last pid: 46013;  load averages: 105.30, 67.67, 34.45   up 4+23:59:42  19:08:40
>> 629 processes: 89 running, 540 sleeping
>> CPU: 21.9% user,  0.0% nice, 74.5% system,  3.1% interrupt,  0.4% idle
>> Mem: 1538M Active, 11G Inact, 898M Wired, 303M Cache, 214M Buf, 1346M Free
>> Swap: 8192M Total, 1036K Used, 8191M Free
>>
>>   PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
>> 46000 www         1  65    0 86728K 15008K RUN    1   0:01 12.06% httpd
>> 45994 www         1  56    0 86728K 15008K CPU1   3   0:01 10.16% httpd
>> 46002 www         1  -4    0   150M 20648K RUN    3   0:00  6.98% httpd
>> 45195 www         1  68    0   121M 19748K RUN    1   0:29  6.88% httpd
>> 45991 www         1  53    0   150M 21060K select 3   0:01  6.59% httpd
>> 45997 www         1  -4    0   150M 20992K ufs    5   0:01  6.59% httpd
>> 45950 www         1  57    0   153M 23388K RUN    2   0:01  6.49% httpd
>> 45999 www         1  -4    0   150M 20640K ufs    6   0:00  5.96% httpd
>> 45189 www         1  66    0   161M 29660K RUN    6   0:26  5.76% httpd
>> 45974 www         1  -4    0   151M 21564K ufs    3   0:01  5.76% httpd
>>
>
> The number of httpd processes in "ufs" state is too high. Are you using
> PHP? And if you do, check how large is your PHP sessions' directory. Use
> a "sharded" layout if it's of any significant size (see php.ini for
> details).
>

Sorry for the messed up posts and thanks for your suggestion. I apologize
if any posts have been duplicated, as there seems to be an issue with
delivery from cogeco at times.

Yes, I was thinking the same thing, and it was actually one of the first
things I looked at. In the temporary folder I found only about 50-200
files at any one time, and most of the time only 5-15. PHP seems to be
only one of the ways to bring the problem on.

I increased the dirhash when I first started on this problem, but the
number of files in the folders is nowhere near what some of the other
people reporting a similar connection have, and I seem to be able to get
the system into this state a number of ways. Do you think I should still
do this even with the small number of files?

    vfs.ufs.dirhash_maxmem=10485760

I actually find that running Wusage 8.0 a few times, even with nice-19,
may be implicated in getting the system to spiral downwards. I hesitate
to mention this as it seems to be working fine on another 7.X server. I
believe that Wusage is tied to 6.X libraries and I wonder if somehow this
may initiate the problem. I also have another sio/com based program
running every few minutes which is also connected to the 6.X library
(scom thermal application for temperature monitoring), and turning both
of these off seems to help.
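On the dirhash question above, one way to judge whether the raised limit matters is to compare the memory dirhash currently uses against the configured maximum. The following is only a rough sketch: it assumes the `vfs.ufs.dirhash_mem` and `vfs.ufs.dirhash_maxmem` sysctls are readable on the box, and the helper names (`dirhash_usage`, `read_sysctl_int`, `report`) are made up for illustration.

```python
import subprocess


def dirhash_usage(mem_bytes, maxmem_bytes):
    """Return the fraction of the dirhash memory limit currently in use."""
    if maxmem_bytes <= 0:
        raise ValueError("maxmem_bytes must be positive")
    return mem_bytes / maxmem_bytes


def read_sysctl_int(name):
    """Read one integer value through sysctl(8). Only meaningful on FreeBSD."""
    result = subprocess.run(
        ["sysctl", "-n", name], capture_output=True, text=True, check=True
    )
    return int(result.stdout.strip())


def report():
    """Print current dirhash memory use against the limit (hypothetical helper)."""
    mem = read_sysctl_int("vfs.ufs.dirhash_mem")
    maxmem = read_sysctl_int("vfs.ufs.dirhash_maxmem")
    print(f"dirhash: {mem} of {maxmem} bytes ({dirhash_usage(mem, maxmem):.0%})")
```

Calling `report()` a few times while the system is degrading would show whether usage ever gets close to the limit. If it stays far below, dirhash is probably not the bottleneck, which would fit the small file counts mentioned above.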
I am going to try a 24 hour period without either of these two running
after a fresh reboot, and we will see if this is indeed one source of my
abominable problem.

Once the system spirals down into its locking, the I/O performance never
seems to recover unless I reboot it or somehow find the process that is
locked and kill it.

I wondered if my problem might be related to PR 104406:

    [ufs] Processes get stuck in "ufs" state under persistent CPU load
    http://www.freebsd.org/cgi/query-pr.cgi?pr=104406&cat=

As the processes do get stuck in the "ufs" state from what I can tell, I
thought this was an interesting connection. When the system is in this
state even a simple make depend is agonizingly slow.

If so, is there any way to quickly determine programmatically which
process is stuck, and to get it unstuck as a temporary workaround? The
problem report mentions using 'kill -STOP' followed by 'kill -CONT',
which allows other processes to access the filesystem (until another such
failure occurs). I have tried a number of things but so far no luck.

Thanks,

Paul
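The question about finding the stuck process programmatically could be approached roughly as follows. This is a sketch, not a tested fix: it assumes the wait channel printed by `ps -axo pid,wchan,comm` matches the STATE column that top shows, and the helper names (`find_waiting`, `nudge`) are invented here. A single sample cannot tell a process that is stuck from one that is only waiting briefly, so the list should be compared across several samples before sending any signals.

```python
import os
import signal
import time


def find_waiting(ps_output, wchan="ufs"):
    """Return the PIDs whose wait channel equals `wchan`.

    Expects the text printed by `ps -axo pid,wchan,comm`, header included.
    """
    pids = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 2)
        if len(fields) >= 2 and fields[1] == wchan:
            pids.append(int(fields[0]))
    return pids


def nudge(pid, pause=0.5):
    """Suspend one process and resume it, the STOP/CONT step from the PR."""
    os.kill(pid, signal.SIGSTOP)
    time.sleep(pause)
    os.kill(pid, signal.SIGCONT)
```

A possible use is to feed `find_waiting` the output of `ps` every few seconds, note which PIDs stay in the list, and call `nudge` only on those. Whether that actually releases the others depends on the STOP/CONT workaround in the PR applying to this system.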