From owner-freebsd-stable@FreeBSD.ORG Wed May 13 17:44:56 2009
Date: Wed, 13 May 2009 14:44:55 -0300 (ADT)
From: "Marc G. Fournier"
To: John Baldwin
Cc: freebsd-stable@freebsd.org
Subject: Re: More data on 7.2-RELEASE "hangs"
Message-ID: <20090513142806.V17646@hub.org>
In-Reply-To: <200905131252.15171.jhb@freebsd.org>
References: <20090513040719.D17646@hub.org> <200905131009.00403.jhb@freebsd.org> <20090513133143.M17646@hub.org> <200905131252.15171.jhb@freebsd.org>

On Wed, 13 May 2009, John Baldwin wrote:

> Well, you had a whole lot of page faults and other VM activity, plus 500k
> syscalls.  The 'w' is a count of swapped processes, so basically your box is
> swapping a whole lot it seems.  I think your box is just overloaded.

I knew I was going to regret posting that :(

What I posted was what vmstat 5 shows after the issue *starts*, not what it normally looks like ... right now, after 10 hours of uptime, and all the same processes running, it looks like:

io# vmstat 5   (10 hours uptime now)
procs   memory      page                    disks   faults          cpu
 r  b w    avm  fre  flt re pi po   fr   sr da0 pa0   in    sy   cs us sy id
 0  1 0 10477M 301M 3503 13  1  2 3620  286   0   0  331 45491 4566 26  8 66
 0  1 0 10430M 305M  278  7  0  0  550    0  18   0  186 19243 2917  4  3 93
 1  1 0 10474M 295M  511  0  0  0  359    0  91   0  253 11632 3516  7  3 90
 0  1 0 10447M 310M  819  3  0  0 1473    0  14   0  143 29575 2486  8  3 89
 0  1 0 10558M 295M 5008 18 13  5 4128    0 121   0  345 24212 4215 16  7 77

Right now, IO is running ~775 processes ... at the time of the vmstat I provided earlier, it was up to 1400 processes ... since there is only 5 minutes between script runs, something is causing it to go from zero swap -> high swap within a very short period of time, but since things get badly locked up when it happens, I can't isolate where ...

I've got the following two ps outputs at the time of the high paging:

/bin/ps -aucxHl -O jid > ps-long.out
/bin/ps -aux -O jid > ps-short.out

Is there anything in there that I could look at as far as what is putting things over the edge?
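
One way to narrow down when the box tips from zero swap to heavy swap would be to log vmstat at a shorter interval and flag the samples that show paging. The following is only a rough Python sketch under assumptions: it expects the 19-column layout shown in the vmstat output above, the function names parse_vmstat_row and paging_samples are made up for illustration, and the threshold of 10 is an arbitrary cutoff, not a tuned value.

```python
# Column order copied from the vmstat header in this message. The header
# uses "sy" twice (syscalls, then system CPU), so the second one is
# renamed "sys" here to keep the dictionary keys unique.
COLUMNS = ["r", "b", "w", "avm", "fre", "flt", "re", "pi", "po", "fr", "sr",
           "da0", "pa0", "in", "sy", "cs", "us", "sys", "id"]


def parse_vmstat_row(line):
    """Return a dict for one data row, or None for header or other lines."""
    fields = line.split()
    if len(fields) != len(COLUMNS) or not fields[0].isdigit():
        return None
    row = {}
    for name, value in zip(COLUMNS, fields):
        # avm and fre carry an "M" suffix in this output; keep them as text.
        row[name] = int(value) if value.isdigit() else value
    return row


def paging_samples(lines, page_threshold=10):
    """Yield rows with swapped processes or page-in/page-out at or above
    page_threshold (an illustrative cutoff only)."""
    for line in lines:
        row = parse_vmstat_row(line)
        if row is None:
            continue
        if (row["w"] > 0
                or row["pi"] >= page_threshold
                or row["po"] >= page_threshold):
            yield row
```

Fed the five io samples above, this flags only the last one (pi of 13). Run against a log captured at a shorter interval, the first flagged sample would give a rough timestamp to line up with the ps snapshots.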

====

As to the 'overloaded server', here is another server, with more running on it, but the exact same configuration:

neptune# vmstat 5   (3 days, 18 hours uptime now)
procs   memory      page                    disks   faults          cpu
 r  b w    avm  fre  flt re pi po   fr   sr da0 pa0   in    sy   cs us sy id
 0  0 0 12521M 303M 3969 15  5  3 2271 1603   0   0  444  6491 5165 37 19 44
 0  0 0 12464M 309M 3009  1  0 15 2833    0 104   0  296  9378 3689  7  5 88
23  0 0 12476M 297M 3845  3  0  0 2627    0  31   0  279 10545 2986 14  5 81
 0  1 0 12530M 266M 5259  0  1  0 2551    0 145   0  432 18070 4133 45  8 47
 1  0 0 12587M 237M 7049  0  1  0 4484    0 171   0  357 15953 4715 29  7 64

So, normally these servers purr ... and are highly responsive ...

In fact, here is an older 32bit server, with less RAM, running about 50% more processes than neptune:

mercury# vmstat 5
procs   memory      page                    disks   faults          cpu
 r  b w    avm  fre  flt re pi po   fr   sr da0 pa0   in    sy   cs us sy id
 3 14 1  6817M 114M  641  7  3  1 1036  386   0   0 1109   464  157  5  5 90
 0  8 0  6817M 224M  596 33  0  5 5667 3850  86   0 1303  5768 3885  6  7 87
 1 10 0  6824M 220M 4332 32  2  0 3228    0  17   0  755  9689 3057  8  7 85
 0  9 0  6798M 219M  430  0  0  0  712    0  12   0 1274  4276 3877  2  2 95
 0 11 0  6830M 205M 1026  4  1  3  481    0  84   0 1503  5586 4370  6  4 89

----
Marc G. Fournier        Hub.Org Networking Services (http://www.hub.org)
Email . scrappy@hub.org                              MSN . scrappy@hub.org
Yahoo . yscrappy               Skype: hub.org        ICQ . 7615664