From: Alexey Popov
Date: Tue, 16 Oct 2007 15:20:14 +0400
To: Kris Kennaway
Cc: freebsd-hackers@freebsd.org, freebsd-stable@freebsd.org
Subject: Re: amrd disk performance drop after running under high load

Hi.

Kris Kennaway wrote:
>>>> After some time of running under high load disk performance becomes
>>>> extremely poor. During those periods 'systat -vm 1' shows something
>>>> like this:
>>> What does "high load" mean? You need to explain the system workload
>>> more.
>> This web service is similar to YouTube. This server is a video store. I
>> have around 200G of *.flv (flash video) files on the server.
>>
>> I run lighttpd as a web server.
>> Disk load is usually around 50%, network output 100Mbit/s, 100
>> simultaneous connections. CPU is mostly idle.
>>
>> As you can see it is a trivial service - sending files to network via
>> HTTP.
> Does lighttpd actually use HTTP accept filters?

I don't know how to verify this for certain, but it seems to make the
appropriate setsockopt call (truss output):

setsockopt(0x4,0xffff,0x1000,0x7fffffffe620,0x100) = 0 (0x0)

On FreeBSD 0xffff is SOL_SOCKET and 0x1000 is SO_ACCEPTFILTER, so an
accept filter is being set on that socket.

> Are you using ipfilter and ipfw? You are paying a performance penalty
> for having them.

I'm using ipfw, and one of the first rules passes all established TCP.
ipfilter is not used on this server, but it is present in the kernel
because it may be used on other servers. The CPU is 95% idle, so I don't
think the packet filters add significant load on this server.

> You might try increasing BUCKET_MAX in sys/vm/uma_core.c. I don't
> really understand the code here, but you seem to be hitting a threshold
> behaviour where you are constantly running out of space in the per CPU
> caches.

Thanks, I'll try this.

> This can happen if your workload is unbalanced between the CPUs and you
> are always allocating on one but freeing on another, but I wouldn't
> expect it should happen on your workload. Maybe it can also happen if
> your turnover is high enough.

This is very unlikely, because I have five other video storage servers
with the same hardware and software configuration and they work fine. On
the other hand, all the other servers were put into production before or
after the problematic ones and were filled with content in different
ways, so they could have a slightly different load pattern.

In total, I have hit this bug three times:

1. The first time it was, AFAIR, 5.4-RELEASE on a Dell 2850 with the same
   configuration as now. It was an mp3 store and I used thttpd as the
   HTTP server to serve the mp3s. That time the problems were not so
   frequent, but it took too long to get back to normal operation, so we
   had to reboot the servers once a week or so.
   The problems began when we moved to new hardware - the Dell 2850. At
   that time we suspected the amrd driver and had no time to dig in,
   because all the servers of the project were problematic. Installing
   Linux helped.

2. The second time it was a server for the static files of a very popular
   blog. The HTTP server was nginx and the disk contained pictures, mp3s
   and videos. It was a Dell 1850 with a 2x146 SCSI mirror. Linux also
   solved the problem.

3. The problem we see now.

At first glance one could say that the problem is in Dell's x850 series
or in amr(4), but we run this hardware on many other projects and they
work well. Linux also works on these machines.

And a few hours ago I received feedback from Andrzej Tobola; he has the
same problem on FreeBSD 7 with Promise ATA software RAID:

===
Subject: Re: amrd disk performance drop after running under high load
Date: Tue, 16 Oct 2007 10:59:34 +0200
From: Andrzej Tobola
To: Alexey Popov

Exactly the same here but on big ata RAID0 with big traffic (~10GB/24h):

amper% df -h /ftp/priv
Filesystem    Size    Used   Avail Capacity  Mounted on
/dev/ar0a     744G    679G    4.7G    99%    /ftp/priv

amper% grep ^ar /var/run/dmesg.boot
ar0: 763108MB status: READY
ar0: disk0 READY using ad6 at ata3-master
ar0: disk1 READY using ad4 at ata2-master

amper% uname -a
FreeBSD xxx 7.0-CURRENT-200709 FreeBSD 7.0-CURRENT-200709 #0: Tue Sep 11 04:44:48 UTC 2007 root@almeida.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC i386

I am rebooting if I reach this state (approx. a week).
It is old bug - a few months ;)

cheers,
-a
===

So I can conclude that FreeBSD has a long-standing bug in the VM that can
be triggered when serving a large amount of static data (much bigger than
memory size) at high rates. Possibly this only applies to large files
like mp3 or video.

> What does vmstat -z show during the good and bad times?

I'll send this data the next time the bad times happen.

With best regards,
Alexey Popov
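
P.S. To compare the 'vmstat -z' output from the good and bad times, a
small script along these lines might help. It is only a rough sketch: it
assumes every data row looks like "NAME: n, n, n, ..." and that the
header row names the columns (ITEM SIZE LIMIT USED FREE ...). The
function names parse_vmstat_z and diff_zones are made up for this
example, and the format should be checked against the real output first.

```python
# Fallback column names, used only if no "ITEM ..." header row is found.
# This is an assumption about the layout, not taken from vmstat itself.
DEFAULT_COLUMNS = ["SIZE", "LIMIT", "USED", "FREE", "REQUESTS", "FAILURES"]


def parse_vmstat_z(text):
    """Parse `vmstat -z` text into {zone_name: {column: int}}."""
    columns = list(DEFAULT_COLUMNS)
    zones = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("ITEM"):
            # Header row: take the column names from it, minus "ITEM".
            columns = line.split()[1:]
            continue
        # Split on the last colon so zone names may contain spaces.
        name, sep, rest = line.rpartition(":")
        if not sep:
            continue
        try:
            values = [int(v) for v in rest.split(",")]
        except ValueError:
            continue  # not a data row in the assumed format
        zones[name.strip()] = dict(zip(columns, values))
    return zones


def diff_zones(good, bad, fields=("USED", "FREE")):
    """Return bad-minus-good deltas per zone, skipping unchanged zones."""
    changes = {}
    for name in sorted(set(good) & set(bad)):
        delta = {f: bad[name].get(f, 0) - good[name].get(f, 0) for f in fields}
        if any(delta.values()):
            changes[name] = delta
    return changes
```

Save one snapshot in each state (for example 'vmstat -z > good.txt' and
later 'vmstat -z > bad.txt'), read both files, and pass the parsed
results to diff_zones(). Zones whose USED or FREE counts moved between
the two snapshots are the ones worth looking at first.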