From: John Baldwin
To: "Marc G. Fournier"
Cc: freebsd-stable@freebsd.org
Date: Wed, 13 May 2009 12:52:14 -0400
Subject: Re: More data on 7.2-RELEASE "hangs"
Message-Id: <200905131252.15171.jhb@freebsd.org>
In-Reply-To: <20090513133143.M17646@hub.org>
References: <20090513040719.D17646@hub.org> <200905131009.00403.jhb@freebsd.org> <20090513133143.M17646@hub.org>

On Wednesday 13 May 2009 12:34:39 pm Marc G. Fournier wrote:
> On Wed, 13 May 2009, John Baldwin wrote:
>
> > On Wednesday 13 May 2009 3:09:33 am Marc G. Fournier wrote:
> >>
> >> Don't know if this helps with anything, but it just hung after 2days again
> >> ... nothing on the console ... top process running at the time shows the
> >> following ... anything there look "concerning"?
> >
> > Is this a 2 CPU system?  If so, both CPUs are actually running something, so
> > it is not a deadlock per se.
> >
> >> 99402 www    1  96    0   163M 29892K CPU1   1   0:03  0.00% httpd
> >> 13635 88    34  96    0 92340K 25604K CPU0   0   0:00  0.05% mysqld
>
> Here is what vmstat shows ~10 minutes before (or as) it hung solid last
> time.  I didn't think to save the one that ran just before this one (the
> script runs every 5 minutes), but for the 'r b w' columns 'b' was around
> 10ish, while 'w' was 0 ... within a 5 minute period of time, 'w'
> literally skyrockets:
>
>  procs      memory         page                                  disks   faults              cpu
>   r   b   w      avm   fre    flt    re   pi    po    fr      sr da0 pa0    in     sy     cs us sy id
> 107 266 122 16155620 23084   3255    22    1     2  3358    1605   0   0   377  17835   5231 19  7 73
>   6 285 382 16446348 22532 111705 21155 1391 10049 51966 2187328 143   0 36344 499098 423971  3  2 95
>   0  73 386 16440468 23072   7052  1155   85    44  1292      73 372   0  1030  18631   8334 18 12 70
>   0  77 388 16440468 23088    126  1050    0     6    21      27 169   0   521   4186   4125  2  3 94
>   0  66 389 16440468 23104      4   713    0    13    44      58 227   0   352   2217   3504  0  5 95

Well, you had a whole lot of page faults and other VM activity, plus 500k
syscalls.  The 'w' is a count of swapped processes, so basically your box is
swapping a whole lot it seems.  I think your box is just overloaded.

-- 
John Baldwin
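
For anyone who wants to watch for the same pattern in their own periodic vmstat logs, here is a rough Python sketch. It is not part of the original exchange. It assumes the 19-column layout quoted above (two disk columns, da0 and pa0); the function names and the renaming of the second 'sy' column to 'cpu_sy' are made up for illustration. It pulls out the 'w' column and finds the sample with the most page-in plus page-out activity.

```python
# Column order taken from the vmstat header quoted in this message.
# The second "sy" (CPU system time) is renamed "cpu_sy" here (an
# illustrative choice) so it does not clash with the syscall column.
COLUMNS = ["r", "b", "w", "avm", "fre", "flt", "re", "pi", "po", "fr", "sr",
           "da0", "pa0", "in", "sy", "cs", "us", "cpu_sy", "id"]

# The five samples quoted in this message.
SAMPLES = """\
107 266 122 16155620 23084 3255 22 1 2 3358 1605 0 0 377 17835 5231 19 7 73
6 285 382 16446348 22532 111705 21155 1391 10049 51966 2187328 143 0 36344 499098 423971 3 2 95
0 73 386 16440468 23072 7052 1155 85 44 1292 73 372 0 1030 18631 8334 18 12 70
0 77 388 16440468 23088 126 1050 0 6 21 27 169 0 521 4186 4125 2 3 94
0 66 389 16440468 23104 4 713 0 13 44 58 227 0 352 2217 3504 0 5 95
"""


def parse_vmstat_row(line):
    """Turn one whitespace-separated vmstat data line into a dict."""
    values = [int(v) for v in line.split()]
    if len(values) != len(COLUMNS):
        raise ValueError("expected %d columns, got %d" % (len(COLUMNS), len(values)))
    return dict(zip(COLUMNS, values))


def swapped_counts(rows):
    """Return the 'w' column (swapped processes) for each sample."""
    return [row["w"] for row in rows]


def heaviest_paging(rows):
    """Return the sample with the largest pi + po (pages in plus pages out)."""
    return max(rows, key=lambda row: row["pi"] + row["po"])


rows = [parse_vmstat_row(line) for line in SAMPLES.strip().splitlines()]
worst = heaviest_paging(rows)
print("w per sample:", swapped_counts(rows))
print("heaviest paging sample: pi=%d po=%d sy=%d" % (worst["pi"], worst["po"], worst["sy"]))
```

A machine with a different number of disks prints a different number of columns, so COLUMNS would need adjusting before reusing this.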