From: Alexey Popov <lol@chistydom.ru>
Date: Wed, 31 Oct 2007 14:54:26 +0300
To: Kris Kennaway
Cc: freebsd-hackers@freebsd.org, freebsd-stable@freebsd.org
Subject: Re: amrd disk performance drop after running under high load

Hi

Kris Kennaway wrote:
>>>>>> So I can conclude that FreeBSD has a long-standing bug in VM that
>>>>>> could be triggered when serving a large amount of static data (much
>>>>>> bigger than memory size) at high rates. Possibly this only applies
>>>>>> to large files like mp3 or video.
>>>>> It is possible, we have further work to do to conclude this though.
>>>> I forgot to mention I have pmc and kgmon profiling for the good and
>>>> bad times. But I don't have enough knowledge to interpret it correctly
>>>> and am not sure whether it can help.
>>> pmc would be useful.
>> pmc profiling attached.
> OK, the pmc traces do seem to show that it's not a lock contention
> issue. That being the case I don't think the fact that different
> servers perform better is directly related.
But there was evidence of mbuf lock contention in the mutex profiling,
wasn't there? As far as I understand, mutex problems can exist without
showing up as increased CPU load in the pmc stats, right?

> There is also no evidence of a VM problem. What your vmstat and pmc
> traces show is that your system really isn't doing much work at all,
> relatively speaking.
> There is also still no evidence of a disk problem. In fact your disk
> seems to be almost idle in both cases you provided, only doing between
> 1 and 10 operations per second, which is trivial.
The vmstat and network traffic graphs show that the problem exists. If
it is not a disk, network, or VM problem, what else could be wrong?

> In the "good" case you are getting a much higher interrupt rate but
> with the data you provided I can't tell where from. You need to run
> vmstat -i at regular intervals (e.g. every 10 seconds for a minute)
> during the "good" and "bad" times, since it only provides counters and
> an average rate over the uptime of the system.
I'll try this, but AFAIR there was nothing strange about the interrupts.
I believe the reason for the high interrupt rate in the "good" case is
simply that the server is sending much more traffic.
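Something like this should capture it (a minimal sketch, assuming plain
/bin/sh; the log file name is just an example):

    # snapshot interrupt counters every 10 seconds for one minute
    for i in 1 2 3 4 5 6; do
        date
        vmstat -i
        sleep 10
    done > /tmp/vmstat-i.log

I'll collect this during both the "good" and "bad" periods.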
> What there is evidence of is an interrupt aliasing problem between em
> and USB:
> irq16: uhci0  1464547796  1870
> irq64: em0    1463513610  1869
I tried disabling USB in the kernel and this issue went away, but the
main problem remained. Also, I see this interrupt aliasing on many
servers that have no problems at all.

With best regards,
Alexey Popov
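P.S. For reference, "disabling USB in the kernel" here means rebuilding
with the USB devices removed from the kernel configuration, roughly like
this (a sketch; the device list is the usual one from GENERIC):

    # comment out (or delete) the USB devices in the kernel config file:
    # device  uhci  # UHCI PCI->USB interface
    # device  ohci  # OHCI PCI->USB interface
    # device  ehci  # EHCI PCI->USB interface (USB 2.0)
    # device  usb   # USB Bus (required)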