From owner-freebsd-performance@FreeBSD.ORG  Sat Mar  8 22:23:31 2008
Return-Path: <owner-freebsd-performance@FreeBSD.ORG>
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 1CA411065670
	for <freebsd-performance@freebsd.org>;
	Sat,  8 Mar 2008 22:23:31 +0000 (UTC)
	(envelope-from rwatson@FreeBSD.org)
Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42])
	by mx1.freebsd.org (Postfix) with ESMTP id B82428FC2A
	for <freebsd-performance@freebsd.org>;
	Sat,  8 Mar 2008 22:23:30 +0000 (UTC)
	(envelope-from rwatson@FreeBSD.org)
Received: from fledge.watson.org (fledge.watson.org [209.31.154.41])
	by cyrus.watson.org (Postfix) with ESMTP id 7D23546BAD;
	Sat,  8 Mar 2008 17:23:29 -0500 (EST)
Date: Sat, 8 Mar 2008 22:23:29 +0000 (GMT)
From: Robert Watson <rwatson@FreeBSD.org>
X-X-Sender: robert@fledge.watson.org
To: Steven Hartland <killing@multiplay.co.uk>
In-Reply-To: <056601c8814c$516c0370$b6db87d4@multiplay.co.uk>
Message-ID: <20080308221441.E11432@fledge.watson.org>
References: <056601c8814c$516c0370$b6db87d4@multiplay.co.uk>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-performance@freebsd.org
Subject: Re: rrdtool / mtr causing stalling on 7.0
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Performance/tuning <freebsd-performance.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-performance>
List-Post: <mailto:freebsd-performance@freebsd.org>
List-Help: <mailto:freebsd-performance-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 08 Mar 2008 22:23:31 -0000

On Sat, 8 Mar 2008, Steven Hartland wrote:

> We've been suffering on our stats box for some time now whereby the machine 
> will just stall for several seconds, preventing everything from tab 
> completion to vi newfile.txt.
>
> I was hoping an upgrade to 7.0 and ULE may help the situation but 
> unfortunately it hasn't.
>
> I've attached both dmesg and output from lock profiling during a 5 minute 
> period where I know the stall happened at least once.
>
> Any advice / pointers would be gratefully received.

It looks like the attachment got lost on the way through the mailing list.

I think the first question is: what sort of stall is this?  For example, is 
all network communication stalling, all disk I/O stalling, or the entire 
kernel and all processes?  The usual diagnostics are:

- Does the machine stop responding to pings while stalled, and/or possibly
   "catch up" all at once when it recovers?

- If you run the following csh loop on the machine, with no network or
   console I/O involved, do you see gaps in the time stamps it logs?

 	while (1)
 		sleep 1
 		date >> date.log
 	end

- If you write a short C program that does much the same as the loop above,
   but logs time stamps into an in-memory buffer and looks for gaps of more
   than 3 seconds between consecutive entries, does it catch the stall?

Robert N M Watson
Computer Laboratory
University of Cambridge