Message-ID: <4C63198F.4040003@ssimicro.com>
Date: Wed, 11 Aug 2010 15:43:43 -0600
From: markham breitbach <markham_breitbach@ssimicro.com>
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US;
	rv:1.9.2.8) Gecko/20100802 Thunderbird/3.1.2
MIME-Version: 1.0
To: Julian Elischer <julian@elischer.org>
References: <4C62D827.2030409@ssimicro.com>	<949C0FF2-04AA-4440-82B0-F44A13B8F0C2@mac.com>
	<4C62F272.4030703@ssimicro.com> <4C630156.6060203@elischer.org>
In-Reply-To: <4C630156.6060203@elischer.org>
X-Enigmail-Version: 1.1.1
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: freebsd-performance@freebsd.org
Subject: Re: massive load average spikes


> load average is a time averaged thing and in the case of a
> 'thundering herd' problem you will see the LA spike up and
> come down again over time.
>
> Do you see any problem as a result of this? Or is it just curiosity?
>
> you might want to use KTR or ktrace with scheduling events if you
> really want to see the reason for this. It could just be a sampling
> error when some 'tick' coincides with the sampling..
>
>
I have not seen any noticeable performance degradation when the LA spikes like this; the
main nuisance has been Sendmail's behaviour.  I have since set the options
"RefuseLA=0" and "QueueLA=0" to avoid long stretches of SMTP being unavailable while the
load averaged itself out.
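
To illustrate why a brief spike can keep the load average high for a long stretch, here is
a rough sketch of an exponentially damped average.  The 5-second sample interval and the
60-second window are my assumptions about how the 1-minute figure is computed, and
simulate_load() is a made-up name for illustration, not a kernel interface.

```python
import math

SAMPLE_INTERVAL = 5.0  # seconds between samples (assumed)
WINDOW = 60.0          # averaging window for the 1-minute figure (assumed)


def simulate_load(run_queue, interval=SAMPLE_INTERVAL, window=WINDOW):
    """Return the damped average after each run-queue sample."""
    decay = math.exp(-interval / window)
    load = 0.0
    history = []
    for runnable in run_queue:
        # Old value fades by `decay`; the new sample fills the remainder.
        load = load * decay + runnable * (1.0 - decay)
        history.append(load)
    return history


# A single sample that catches 200 runnable threads, then five idle minutes.
samples = [200] + [0] * 60
history = simulate_load(samples)
for step in (0, 12, 24, 36, 60):
    print(f"t={step * 5:4d}s  load={history[step]:6.2f}")
```

Under those assumptions a single unlucky sample pushes the 1-minute figure to about 16,
and a full minute later it is still near 6, which would keep Sendmail above any modest
RefuseLA or QueueLA threshold well after the herd has gone.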

At this point it is really just a nagging feeling that something is misbehaving and it's
going to bite me when I least expect it (it always does!), so I would like to try to
track down the source of the problem, but I'm not even sure where to begin looking.
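
One low-risk place to begin might be a small watcher that records what is runnable at the
moment the LA jumps.  This is only a sketch: the threshold, the poll interval, the log
file name and the function names are all placeholders I made up, and it assumes
"ps -axo state,pid,comm" is available on the host.

```python
import os
import subprocess
import time

THRESHOLD = 10.0   # hypothetical 1-minute load that counts as a spike
POLL_SECONDS = 5   # hypothetical polling interval


def is_spike(load1, threshold=THRESHOLD):
    """True when the 1-minute load average is at or above the threshold."""
    return load1 >= threshold


def snapshot():
    """Return ps output listing state, pid and command of every process."""
    result = subprocess.run(
        ["ps", "-axo", "state,pid,comm"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout


def watch(log_path="la-spikes.log", polls=None):
    """Poll the load average; append a process snapshot on each spike.

    Runs forever when polls is None, otherwise stops after that many polls.
    """
    count = 0
    while polls is None or count < polls:
        load1, _, _ = os.getloadavg()
        if is_spike(load1):
            with open(log_path, "a") as log:
                log.write(f"{time.ctime()} load1={load1:.2f}\n")
                log.write(snapshot())
        time.sleep(POLL_SECONDS)
        count += 1
```

Calling watch() inside the production jail and later looking for processes in state R or D
in the log should at least narrow down which daemon is piling up.  Because the load
average lags the actual event, the snapshot may still miss a very short burst.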

I have run ktrace on sendmail and dovecot, but did not see anything that stood out,
although I don't really know if I would recognize the problem in a kdump anyway (too much
information!).  I'm not at all familiar with KTR, however.  Is this something that can be
run on a production host or should it be isolated to a dev box?  I have cloned the jail
into a dev environment on identical hardware, but only see the issue in production.
I'm not sure if this is a matter of insufficient load or just not enough random
strangeness outside of production.

Any suggestions for how KTR might help pin this down or what to look for?