From owner-freebsd-stable@FreeBSD.ORG  Wed May 13 21:51:16 2009
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 50BD1106564A
	for <freebsd-stable@freebsd.org>; Wed, 13 May 2009 21:51:16 +0000 (UTC)
	(envelope-from jhb@freebsd.org)
Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42])
	by mx1.freebsd.org (Postfix) with ESMTP id 23DAC8FC0C
	for <freebsd-stable@freebsd.org>; Wed, 13 May 2009 21:51:16 +0000 (UTC)
	(envelope-from jhb@freebsd.org)
Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net
	[66.111.2.69])
	by cyrus.watson.org (Postfix) with ESMTPSA id CA8A746B0C;
	Wed, 13 May 2009 17:51:15 -0400 (EDT)
Received: from jhbbsd.hudson-trading.com (unknown [209.249.190.8])
	by bigwig.baldwin.cx (Postfix) with ESMTPA id B76E18A025;
	Wed, 13 May 2009 17:51:14 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: "Marc G. Fournier" <scrappy@hub.org>
Date: Wed, 13 May 2009 14:02:40 -0400
User-Agent: KMail/1.9.7
References: <20090513040719.D17646@hub.org>
	<200905131252.15171.jhb@freebsd.org>
	<20090513142806.V17646@hub.org>
In-Reply-To: <20090513142806.V17646@hub.org>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200905131402.41104.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.0.1
	(bigwig.baldwin.cx); Wed, 13 May 2009 17:51:14 -0400 (EDT)
X-Virus-Scanned: clamav-milter 0.95 at bigwig.baldwin.cx
X-Virus-Status: Clean
X-Spam-Status: No, score=-2.5 required=4.2 tests=AWL,BAYES_00,
	DATE_IN_PAST_03_06,RDNS_NONE autolearn=no version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on bigwig.baldwin.cx
Cc: freebsd-stable@freebsd.org
Subject: Re: More data on 7.2-RELEASE "hangs"
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 13 May 2009 21:51:16 -0000

On Wednesday 13 May 2009 1:44:55 pm Marc G. Fournier wrote:
> On Wed, 13 May 2009, John Baldwin wrote:
> 
> > Well, you had a whole lot of page faults and other VM activity, plus 500k
> > syscalls.  The 'w' is a count of swapped processes, so basically your box is
> > swapping a whole lot it seems.  I think your box is just overloaded.
> 
> I knew I was going to regret posting that :(
> 
> What I posted was what vmstat 5 shows after the issue *starts*, not what 
> it normally looks like ... right now, after 10 hours of uptime, and all 
> the same processes running, it looks like:
> 
> io# vmstat 5 (10 hours uptime now)
>   procs      memory      page                    disks     faults         cpu
>   r b w     avm    fre   flt  re  pi  po    fr  sr da0 pa0   in   sy   cs us sy id
>   0 1 0  10477M   301M  3503  13   1   2  3620 286   0   0  331 45491 4566 26  8 66
>   0 1 0  10430M   305M   278   7   0   0   550   0  18   0  186 19243 2917 4  3 93
>   1 1 0  10474M   295M   511   0   0   0   359   0  91   0  253 11632 3516 7  3 90
>   0 1 0  10447M   310M   819   3   0   0  1473   0  14   0  143 29575 2486 8  3 89
>   0 1 0  10558M   295M  5008  18  13   5  4128   0 121   0  345 24212 4215 16  7 77
> 
> Right now, IO is running ~775 processes ... at the time of the vmstat I 
> provided earlier, it was up to 1400 processes ... since there is only 5 
> minutes between script runs, something is causing it to go from zero swap 
> -> high swap within a very short period of time, but since things get 
> badly locked up when it happens, I can't isolate where ...
> 
> I've got the following two ps outputs at the time of the high paging:
> 
> /bin/ps -aucxHl -O jid > ps-long.out
> /bin/ps -aux -O jid > ps-short.out

Perhaps do 'sort -n -k6 < ps-short.out' to find which processes have large
virtual memory sizes?  Something is using a lot of memory and causing your
box to thrash.
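
As a rough sketch of the same idea (not something taken from this thread), the
helper below prints the largest processes by VSZ from a saved ps capture. The
function name top_vsz is hypothetical, and it assumes the first line of the
file is the ps header and that one header field is literally named VSZ. It
finds the column by name instead of hard-coding a field number, because the
column order depends on which ps flags were combined.

```shell
# Hypothetical helper: show the N largest processes by VSZ in a saved
# ps capture such as ps-short.out. Assumes line 1 is the ps header.
top_vsz() {
    file=$1
    n=${2:-10}
    awk '
        NR == 1 {
            # Locate the first header field named VSZ.
            for (i = 1; i <= NF; i++)
                if ($i == "VSZ") { col = i; break }
            if (!col) {
                print "no VSZ column in header" > "/dev/stderr"
                exit 1
            }
            next
        }
        # Prefix each line with its VSZ value so sort can key on field 1.
        { print $col "\t" $0 }
    ' "$file" | sort -rn -k1,1 | head -n "$n" | cut -f2-
}
```

For example, 'top_vsz ps-short.out 20' would list the 20 entries with the
largest virtual sizes, biggest first, with the original ps lines unchanged.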

-- 
John Baldwin