From: Alexey Popov <lol@chistydom.ru>
Date: Mon, 15 Oct 2007 18:46:14 +0400
To: freebsd-stable@freebsd.org, freebsd-hackers@freebsd.org
Subject: amrd disk performance drop after running under high load

Hi.

I have 3 Dell 2850 servers with DELL PERC4 SCSI RAID5 (6x300GB), running lighttpd and serving flash video at around 200 Mbit/s.

%grep amr /var/run/dmesg.boot
amr0: mem 0xf80f0000-0xf80fffff,0xfe9c0000-0xfe9fffff irq 46 at device 14.0 on pci2
amr0: Using 64-bit DMA
amr0: delete logical drives supported by controller
amr0: Firmware 521X, BIOS H430, 256MB RAM
amr0: delete logical drives supported by controller
amrd0: on amr0
amrd0: 1430400MB (2929459200 sectors) RAID 5 (optimal)
Trying to mount root from ufs:/dev/amrd0s1a

%uname -a
FreeBSD ???.ru 6.2-STABLE FreeBSD 6.2-STABLE #2: Mon Oct 8 16:25:20 MSD 2007 llp@???.ru:/usr/obj/usr/src/sys/SMP-amd64-HWPMC amd64
%

After running under high load for some time, disk performance becomes extremely poor. During those periods 'systat -vm 1' shows something like this:

Disks   amrd0
KB/t    85.39
tps         5
MB/s     0.38
% busy     99

That is, the disk is 100% busy at just 2-10 tps. There is nothing suspicious in /var/log/messages, 'netstat -m', 'vmstat -z' or anywhere else. This lasts for 15-30 minutes or so, then everything becomes fine again; after another 10-12 hours it repeats.
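To catch the next bad period without sitting at the console, something along these lines can be left running (a sketch only, plain base-system tools; the iostat options are quoted from memory and the log file name is just an example):

# log a timestamp plus one-second disk stats for amrd0 every 10 seconds;
# the second iostat sample covers just the last second, the first one is
# the since-boot average and can be ignored
while :; do
    date
    iostat -w 1 -c 2 amrd0
    sleep 10
done >> /var/log/amrd0-io.log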
On top of that, I tried mutex profiling; here are the results, sorted by the total number of acquisitions (the count column):

Bad case:

 max    total   count  avg  cnt_hold  cnt_lock  name
 102   223514  273977    0     14689   1651568  /usr/src/sys/vm/uma_core.c:2349 (512)
 950   263099  273968    0     15004     14427  /usr/src/sys/vm/uma_core.c:2450 (512)
 108   150422  175840    0     10978  22988519  /usr/src/sys/vm/uma_core.c:1888 (mbuf)
 352   160635  173663    0     10896      9678  /usr/src/sys/vm/uma_core.c:2209 (mbuf)
 110   134910  173575    0     10838      9464  /usr/src/sys/vm/uma_core.c:2104 (mbuf)
1104  1335319  106888   12        27      1259  /usr/src/sys/netinet/tcp_output.c:253 (so_snd)
 171    77754   97685    0       176       207  /usr/src/sys/net/pfil.c:71 (pfil_head_mtx)
 140    77104   97685    0       151       128  /usr/src/sys/netinet/ip_fw2.c:164 (IPFW static rules)
 100    76543   97685    0       146     45450  /usr/src/sys/netinet/ip_fw2.c:156 (IPFW static rules)
  82    77149   97685    0       243    141221  /usr/src/sys/net/pfil.c:63 (pfil_head_mtx)
1644   914481   97679    9       739    949977  /usr/src/sys/contrib/ipfilter/netinet/fil.c:2320 (ipf filter load/unload mutex)
1642   556643   97679    5         0         0  /usr/src/sys/contrib/ipfilter/netinet/fil.c:2455 (ipf filter rwlock)
 107    89413   97679    0         0         0  /usr/src/sys/contrib/ipfilter/netinet/fil.c:2142 (ipf cache rwlock)
 907   148940   81439    1         3      7447  /usr/src/sys/kern/kern_lock.c:168 (lockbuilder mtxpool)
1764   152282   63435    2       438    336763  /usr/src/sys/net/route.c:197 (rtentry)

And in the good case:

 max    total   count  avg  cnt_hold  cnt_lock  name
1738   821795  553033    1        41       284  /usr/src/sys/netinet/tcp_output.c:253 (so_snd)
2770   983643  490815    2         6        54  /usr/src/sys/kern/kern_lock.c:168 (lockbuilder mtxpool)
 106   430941  477500    0      5555      4507  /usr/src/sys/netinet/ip_fw2.c:164 (IPFW static rules)
  95   423926  477500    0      4412      5645  /usr/src/sys/netinet/ip_fw2.c:156 (IPFW static rules)
  94   427239  477500    0      6323      7453  /usr/src/sys/net/pfil.c:63 (pfil_head_mtx)
  82   432359  477500    0      5244      5768  /usr/src/sys/net/pfil.c:71 (pfil_head_mtx)
 296  4751550  477498    9     20837     23019  /usr/src/sys/contrib/ipfilter/netinet/fil.c:2320 (ipf filter load/unload mutex)
  85  2913118  477498    6         0         0  /usr/src/sys/contrib/ipfilter/netinet/fil.c:2455 (ipf filter rwlock)
  55   473891  477498    0         0         0  /usr/src/sys/contrib/ipfilter/netinet/fil.c:2142 (ipf cache rwlock)
  59   291035  309222    0         0         0  /usr/src/sys/contrib/ipfilter/netinet/fil.c:2169 (ipf cache rwlock)
1627   697811  305094    2      2161      2535  /usr/src/sys/net/route.c:147 (radix node head)
 232   804172  305094    2     12193      6500  /usr/src/sys/net/route.c:197 (rtentry)
 148   892580  303518    2       594       649  /usr/src/sys/net/route.c:1281 (rtentry)
 145   584970  303518    1     13479     11148  /usr/src/sys/net/route.c:1265 (rtentry)
 121   282669  303518    0      3529       886  /usr/src/sys/net/if_ethersubr.c:409 (em0)

Here you can see that heavy UMA activity coincides with the periods of low disk performance. But I am not sure whether this is the root of the problem or just a consequence.

I have similar servers around doing the same thing, and they work fine. I also had the same problem a year ago with another project; at that time nothing helped and I had to install Linux.

I can provide additional information about this server if needed. What else can I try to solve the problem?

With best regards,
Alexey Popov
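P.S. In case someone wants to compare numbers on their own box: the stats above were gathered roughly as sketched below, with MUTEX_PROFILING(9) compiled into the kernel. The sysctl names are quoted from memory, so please check them against the man page on your system.

# collect mutex profiling over a "good" or a "bad" period and dump the
# per-lock counters; count (3rd column) is the number of acquisitions
sysctl debug.mutex.prof.reset=1     # clear counters left from the previous run
sysctl debug.mutex.prof.enable=1    # start collecting
sleep 600                           # cover ~10 minutes of the period of interest
sysctl debug.mutex.prof.enable=0    # stop collecting
sysctl -n debug.mutex.prof.stats | sort -rn -k 3 | head -20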