From: John Baldwin
To: "Marc G. Fournier"
Cc: freebsd-stable@freebsd.org
Date: Wed, 13 May 2009 14:02:40 -0400
Subject: Re: More data on 7.2-RELEASE "hangs"
Message-Id: <200905131402.41104.jhb@freebsd.org>
In-Reply-To: <20090513142806.V17646@hub.org>
References: <20090513040719.D17646@hub.org> <200905131252.15171.jhb@freebsd.org> <20090513142806.V17646@hub.org>
List-Id: Production branch of FreeBSD source code

On Wednesday 13 May 2009 1:44:55 pm Marc G. Fournier wrote:
> On Wed, 13 May 2009, John Baldwin wrote:
>
> > Well, you had a whole lot of page faults and other VM activity, plus 500k
> > syscalls.  The 'w' is a count of swapped processes, so basically your box is
> > swapping a whole lot it seems.  I think your box is just overloaded.
>
> I knew I was going to regret posting that :(
>
> What I posted was what vmstat 5 shows after the issue *starts*, not what
> it normally looks like ... right now, after 10 hours of uptime, and all
> the same processes running, it looks like:
>
> io# vmstat 5  (10 hours uptime now)
>  procs        memory page                            disks   faults          cpu
>  r b w     avm   fre   flt  re  pi  po    fr   sr da0 pa0   in     sy    cs us sy id
>  0 1 0  10477M  301M  3503  13   1   2  3620  286   0   0  331  45491  4566 26  8 66
>  0 1 0  10430M  305M   278   7   0   0   550    0  18   0  186  19243  2917  4  3 93
>  1 1 0  10474M  295M   511   0   0   0   359    0  91   0  253  11632  3516  7  3 90
>  0 1 0  10447M  310M   819   3   0   0  1473    0  14   0  143  29575  2486  8  3 89
>  0 1 0  10558M  295M  5008  18  13   5  4128    0 121   0  345  24212  4215 16  7 77
>
> Right now, IO is running ~775 processes ... at the time of the vmstat I
> provided earlier, it was up to 1400 processes ... since there is only 5
> minutes between script runs, something is causing it to go from zero swap
> -> high swap within a very short period of time, but since things get
> badly locked up when it happens, I can't isolate where ...
>
> I've got the following two ps outputs at the time of the high paging:
>
> /bin/ps -aucxHl -O jid > ps-long.out
> /bin/ps -aux -O jid > ps-short.out

Perhaps do 'sort -n -k6 < ps-short.out' to find which processes have large
virtual memory sizes?  Something is using a lot of memory and causing your box
to thrash.

-- 
John Baldwin
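
The sort suggested above could be wrapped up roughly as follows. This is only a sketch: the function name top_vsz and the default of 20 lines are made up for illustration, and it assumes VSZ really is field 6 of ps-short.out (as the -k6 in the suggestion implies for 'ps -aux -O jid' output). It sorts in reverse so the largest processes come first; the header line has no number in field 6, so it sorts as 0 and falls to the bottom.

```shell
#!/bin/sh
# top_vsz FILE [N]
# Print the N lines of FILE (default 20) with the largest numeric value
# in field 6, largest first. Field 6 is ASSUMED to be VSZ; check the
# header line of your own ps output before trusting the result.
top_vsz() {
    # -k6,6 limits the sort key to field 6 only; -n compares numerically;
    # -r puts the biggest values at the top.
    sort -rn -k6,6 "$1" | head -n "${2:-20}"
}
```

For example, 'top_vsz ps-short.out 20' would list the 20 largest processes by that column; comparing the list from a normal period with the one captured during the heavy paging may show what grew.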