From owner-freebsd-hackers@FreeBSD.ORG  Sun Nov 11 17:26:52 2007
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 4793416A417;
	Sun, 11 Nov 2007 17:26:52 +0000 (UTC)
	(envelope-from lol@chistydom.ru)
Received: from comtv.ru (comtv.ru [217.10.32.17])
	by mx1.freebsd.org (Postfix) with ESMTP id 551E513C4C5;
	Sun, 11 Nov 2007 17:26:50 +0000 (UTC)
	(envelope-from lol@chistydom.ru)
X-UCL: actv
Received: from yoda.org.ru ([83.167.98.162] verified)
	by comtv.ru (CommuniGate Pro SMTP 4.1.8)
	with ESMTP id 17241732; Sun, 11 Nov 2007 20:26:37 +0300
Received: from [192.168.102.10] (unknown [192.168.102.10])
	(Authenticated sender: llp@soekris.ru)
	by yoda.org.ru (Postfix) with ESMTP id 8216928CEB;
	Sun, 11 Nov 2007 20:26:48 +0300 (MSK)
Message-ID: <47373B43.9060406@chistydom.ru>
Date: Sun, 11 Nov 2007 20:26:27 +0300
From: Alexey Popov <lol@chistydom.ru>
User-Agent: Thunderbird 2.0.0.6 (X11/20070924)
MIME-Version: 1.0
To: Kris Kennaway <kris@FreeBSD.org>
References: <47137D36.1020305@chistydom.ru> <47140906.2020107@FreeBSD.org>
	<47146FB4.6040306@chistydom.ru> <47147E49.9020301@FreeBSD.org>
	<47149E6E.9000500@chistydom.ru> <4715035D.2090802@FreeBSD.org>
	<4715C297.1020905@chistydom.ru> <4715C5D7.7060806@FreeBSD.org>
	<471EE4D9.5080307@chistydom.ru> <4723BF87.20302@FreeBSD.org>
	<47344E47.9050908@chistydom.ru> <47349A17.3080806@FreeBSD.org>
In-Reply-To: <47349A17.3080806@FreeBSD.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Mailman-Approved-At: Sun, 11 Nov 2007 17:47:17 +0000
Cc: freebsd-hackers@freebsd.org, freebsd-stable@freebsd.org
Subject: Re: amrd disk performance drop after running under high load
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 11 Nov 2007 17:26:52 -0000

Hi.

Kris Kennaway wrote:
>>> In the "good" case you are getting a much higher interrupt rate but 
>>> with the data you provided I can't tell where from.  You need to run 
>>> vmstat -i at regular intervals (e.g. every 10 seconds for a minute) 
>>> during the "good" and "bad" times, since it only provides counters 
>>> and an average rate over the uptime of the system.
>>
>> Now I'm running lighttpd with 10 processes, and the problem has become
>> much less severe.
>>
>> I collected interrupt stats and they show no relation between
>> interrupts and slowdowns. Here they are:
>> http://83.167.98.162/gprof/intr-graph/
>>
>> I also have similar statistics from mutex profiling, and they show
>> no problem with mutexes.
>> http://83.167.98.162/gprof/mtx-graph/mtxgifnew/
>>
>> I have no idea what else to check.

> I don't know what this graph is showing me :)  When precisely is the 
> system behaving poorly?
Take a look at the "Disk Load %" picture at
http://83.167.98.162/gprof/intr-graph/

At ~17:00, 03:00-04:00, 13:00-14:00, 00:30-01:30 and 11:00-13:00 it shows
peaks of disk load that do not reflect any real increase in disk
activity. As I said at the beginning of the thread, at these "peak"
moments the disk becomes slow and vmstat shows 100% disk load while
performing fewer than 10 tps. The other graphs on this page show no
relation to the interrupt rate of the amr or em devices, which you
advised me to check.
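Because vmstat -i reports only cumulative counters and an average over the whole uptime, per-interval rates have to be computed by differencing successive snapshots. A minimal Python sketch of that, assuming the FreeBSD `vmstat -i` layout where each line ends with "total" and "rate" integer columns (the helper names are hypothetical, not part of any FreeBSD tool):

```python
import subprocess
import time


def parse_vmstat_i(text):
    """Map each interrupt source to its cumulative count from one `vmstat -i` run.

    Assumes each data line ends with two integer columns, "total" and "rate";
    the header line and the summary "Total" line are skipped.
    """
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or not (fields[-2].isdigit() and fields[-1].isdigit()):
            continue  # header or unrecognised line
        name = " ".join(fields[:-2])
        if name != "Total":
            counts[name] = int(fields[-2])
    return counts


def interval_rates(before, after, seconds):
    """Interrupts per second for each source between two snapshots."""
    return {
        name: (after[name] - before[name]) / seconds
        for name in after
        if name in before
    }


def snapshot():
    """Run `vmstat -i` once (FreeBSD) and parse its output."""
    out = subprocess.run(["vmstat", "-i"], capture_output=True, text=True, check=True)
    return parse_vmstat_i(out.stdout)


def sample_rates(interval=10, count=6):
    """Yield per-interval rates, e.g. every 10 s for a minute."""
    before, t0 = snapshot(), time.monotonic()
    for _ in range(count):
        time.sleep(interval)
        after, t1 = snapshot(), time.monotonic()
        # Use the measured elapsed time, not the nominal interval.
        yield interval_rates(before, after, t1 - t0)
        before, t0 = after, t1
```

Each dict yielded by sample_rates() holds one interval's rates, so the amr and em entries can be lined up against the times of the disk load peaks.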

When I was using single-process lighttpd the problem was much worse, as
you can see at http://83.167.98.162/gprof/graph/ . The first picture on
this page shows disk load peaks at 18:00 and 15:00, which led to a drop
in network output because the disk was too slow.
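How slow "too slow" is can be estimated from the earlier figures (100% load at fewer than 10 tps) with the utilization law, U = X * S: busy fraction equals throughput times mean service time per transfer. A small Python sketch, assuming the reported disk load is the fraction of wall-clock time the device was busy (the function name is hypothetical):

```python
def mean_service_time_ms(busy_fraction: float, tps: float) -> float:
    """Utilization law U = X * S solved for S, returned in milliseconds.

    busy_fraction: share of wall-clock time the disk was busy (1.0 == 100%).
    tps: completed transfers per second over the same interval.
    """
    if tps <= 0:
        raise ValueError("tps must be positive")
    return busy_fraction * 1000.0 / tps


# 100% load at 10 tps; fewer tps means an even longer time per transfer.
print(mean_service_time_ms(1.0, 10))
```

At 100% load and fewer than 10 tps this gives more than 100 ms per transfer, roughly an order of magnitude above typical rotating-disk access times. If the controller keeps several requests outstanding at once, S is better read as the device's time per completed transfer than as the latency of any single request.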

Earlier in this thread we suspected the UMA mutexes. To check this, I
collected mutex profiling stats and graphed them over time, but these
graphs didn't show anything interesting either: all of them stayed smooth
during the disk load peaks. http://83.167.98.162/gprof/mtx-graph/mtxgifnew/

With best regards,
Alexey Popov