From: Alexey Popov <lol@chistydom.ru>
Date: Wed, 31 Oct 2007 14:54:26 +0300
To: Kris Kennaway
Cc: freebsd-hackers@freebsd.org, freebsd-stable@freebsd.org
Subject: Re: amrd disk performance drop after running under high load

Hi

Kris Kennaway wrote:
>>>>>> So I can conclude that FreeBSD has a long-standing bug in VM that
>>>>>> could be triggered when serving a large amount of static data (much
>>>>>> bigger than memory size) at high rates. Possibly this only applies
>>>>>> to large files like mp3 or video.
>>>>> It is possible, we have further work to do to conclude this though.
>>>> I forgot to mention I have pmc and kgmon profiling for the good and
>>>> bad times. But I don't have enough knowledge to interpret it correctly
>>>> and am not sure whether it can help.
>>> pmc would be useful.
>> pmc profiling attached.
> OK, the pmc traces do seem to show that it's not a lock contention
> issue. That being the case I don't think the fact that different
> servers perform better is directly related.
But there was evidence of mbuf lock contention in the mutex profiling,
wasn't there? As far as I understand, mutex problems can exist without
showing up as increased CPU load in the pmc stats, right?

> There is also no evidence of a VM problem. What your vmstat and pmc
> traces show is that your system really isn't doing much work at all,
> relatively speaking.
> There is also still no evidence of a disk problem. In fact your disk
> seems to be almost idle in both cases you provided, only doing between
> 1 and 10 operations per second, which is trivial.
The vmstat and network traffic graphs show that the problem exists. If
it is not a disk, network, or VM problem, what else could be wrong?

> In the "good" case you are getting a much higher interrupt rate but
> with the data you provided I can't tell where from. You need to run
> vmstat -i at regular intervals (e.g. every 10 seconds for a minute)
> during the "good" and "bad" times, since it only provides counters and
> an average rate over the uptime of the system.
I'll try this, but AFAIR there was nothing strange about the interrupts.
I believe the reason for the high interrupt rate in the "good" case is
simply that the server is sending much more traffic.
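Something like this should capture it (a minimal sketch, assuming plain
/bin/sh; the log file name is just an example):

    # snapshot interrupt counters every 10 seconds for one minute
    for i in 1 2 3 4 5 6; do
        date
        vmstat -i
        sleep 10
    done > /tmp/vmstat-i.log

I'll collect this during both the "good" and "bad" periods.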
> What there is evidence of is an interrupt aliasing problem between em
> and USB:
> irq16: uhci0  1464547796  1870
> irq64: em0    1463513610  1869
I tried disabling USB in the kernel and this issue went away, but the
main problem remained. Also, I see this interrupt aliasing on many
servers that have no problems at all.

With best regards,
Alexey Popov
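P.S. For reference, "disabling USB in the kernel" here means rebuilding
with the USB devices removed from the kernel configuration, roughly like
this (a sketch; the device list is the usual one from GENERIC):

    # comment out (or delete) the USB devices in the kernel config file:
    # device  uhci  # UHCI PCI->USB interface
    # device  ohci  # OHCI PCI->USB interface
    # device  ehci  # EHCI PCI->USB interface (USB 2.0)
    # device  usb   # USB Bus (required)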