Date: Sat, 8 Mar 2008 22:23:29 +0000 (GMT)
From: Robert Watson
To: Steven Hartland
Cc: freebsd-performance@freebsd.org
Subject: Re: rrdtool / mtr causing stalling on 7.0
Message-ID: <20080308221441.E11432@fledge.watson.org>
In-Reply-To: <056601c8814c$516c0370$b6db87d4@multiplay.co.uk>

On Sat, 8 Mar 2008, Steven Hartland wrote:

> We've been suffering on our stats box for some time now where by the machine
> will just stall for several seconds preventing everything from tab
> completion to vi newfile.txt.
>
> I was hoping an upgrade to 7.0 and ULE may help the situation but
> unfortunately it hasn't.
>
> I've attached both dmesg and output from lock profiling during a 5 minute
> period where I know the stall happened at least once.
>
> Any advice / pointers would be gratefully received.

It looks like the attachment got lost on the way through the mailing list.

I think the starting point is: what sort of stall is this?  Is it, for
example, all network communication stalling, all disk I/O stalling, or the
entire kernel and all processes stalling?  The usual diagnostics are:

- Does the machine stop responding to pings while stalled, and/or possibly
  "catch up" all at once when it recovers?

- If you run the following loop on the machine without any network or
  console I/O, do you see gaps in the time stamps:

    while true; do
        sleep 1
        date >> date.log
    done

- If you write a short C program that looks a lot like the above loop, but
  logs time stamps into an in-memory buffer and looks for gaps of >3 seconds
  in the sequence, does it run across the stall?

Robert N M Watson
Computer Laboratory
University of Cambridge
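
The short C program mentioned in the last diagnostic might look roughly like
the sketch below.  It is only an illustration, not a tested tool: the
function names (elapsed_sec, report_gaps, run_stall_monitor) and the choice
of CLOCK_MONOTONIC are assumptions, not anything from the original report.
It samples the clock once a second into a preallocated buffer, so no disk or
console I/O happens while sampling, and only afterwards reports any pair of
consecutive samples more than 3 seconds apart.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Threshold from the text: report gaps of more than 3 seconds. */
#define GAP_THRESHOLD_SEC 3.0

/* Seconds elapsed from *a to *b. */
double elapsed_sec(const struct timespec *a, const struct timespec *b)
{
    return (double)(b->tv_sec - a->tv_sec) +
           (double)(b->tv_nsec - a->tv_nsec) / 1e9;
}

/*
 * Count consecutive samples spaced more than `threshold` seconds apart.
 * Each gap is printed to `out` unless `out` is NULL.
 */
size_t report_gaps(const struct timespec *stamps, size_t n,
                   double threshold, FILE *out)
{
    size_t gaps = 0;

    for (size_t i = 1; i < n; i++) {
        double d = elapsed_sec(&stamps[i - 1], &stamps[i]);
        if (d > threshold) {
            if (out != NULL)
                fprintf(out, "gap of %.3f s before sample %zu\n", d, i);
            gaps++;
        }
    }
    return gaps;
}

/*
 * Hypothetical driver: take `samples` time stamps one second apart,
 * keeping them in memory, then report gaps.  Returns the number of
 * gaps found, or -1 if the buffer could not be allocated.
 */
int run_stall_monitor(size_t samples)
{
    assert(samples > 0);

    struct timespec *stamps = calloc(samples, sizeof(*stamps));
    if (stamps == NULL)
        return -1;

    for (size_t i = 0; i < samples; i++) {
        /* Assumption: a monotonic clock, so clock steps are not
         * mistaken for stalls. */
        clock_gettime(CLOCK_MONOTONIC, &stamps[i]);
        if (i + 1 < samples)
            sleep(1);
    }

    size_t gaps = report_gaps(stamps, samples, GAP_THRESHOLD_SEC, stdout);
    free(stamps);
    return (int)gaps;
}
```

Calling run_stall_monitor(300) from main() would cover a five-minute window.
If this reports a gap while the date loop's log also shows one, the whole
process was not being scheduled; if only the date loop shows gaps, disk I/O
is the more likely suspect.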