Date:      Mon, 15 Oct 2007 18:46:14 +0400
From:      Alexey Popov <lol@chistydom.ru>
To:        freebsd-stable@freebsd.org,  freebsd-hackers@freebsd.org
Subject:   amrd disk performance drop after running under high load
Message-ID:  <47137D36.1020305@chistydom.ru>

Hi.

I have 3 Dell 2850 servers with Dell PERC4 SCSI RAID5 (6x300GB), running
lighttpd and serving flash video at around 200Mbit/s.

%grep amr /var/run/dmesg.boot
amr0: <LSILogic MegaRAID 1.53> mem 0xf80f0000-0xf80fffff,0xfe9c0000-0xfe9fffff irq 46 at device 14.0 on pci2
amr0: Using 64-bit DMA
amr0: delete logical drives supported by controller
amr0: <LSILogic PERC 4e/Di> Firmware 521X, BIOS H430, 256MB RAM
amr0: delete logical drives supported by controller
amrd0: <LSILogic MegaRAID logical drive> on amr0
amrd0: 1430400MB (2929459200 sectors) RAID 5 (optimal)
Trying to mount root from ufs:/dev/amrd0s1a

%uname -a
FreeBSD ???.ru 6.2-STABLE FreeBSD 6.2-STABLE #2: Mon Oct  8 16:25:20 MSD 2007     llp@???.ru:/usr/obj/usr/src/sys/SMP-amd64-HWPMC  amd64
%

After running under high load for some time, disk performance becomes
extremely poor. During those periods 'systat -vm 1' shows something like this:

Disks amrd0
KB/t  85.39
tps       5
MB/s   0.38
% busy   99

It shows nearly 100% busy at just 2-10 tps. There is nothing suspicious in
/var/log/messages, 'netstat -m', 'vmstat -z' or anywhere else. This
lasts 15-30 minutes or so, and then everything is fine again. After
some time, 10-12 hours, it repeats.
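
As a rough sanity check on the systat figures above, the average transfer size and the throughput together imply a transaction rate. A minimal sketch, assuming systat's MB/s means 1024 KB per second; the function name is made up for illustration:

```python
def implied_tps(kb_per_transfer, mb_per_sec):
    """Transactions per second implied by transfer size and throughput.

    Assumes 1 MB = 1024 KB, which may not match systat's exact rounding.
    """
    return mb_per_sec * 1024.0 / kb_per_transfer


# Figures copied from the systat sample above: KB/t 85.39, MB/s 0.38.
print(round(implied_tps(85.39, 0.38), 1))
```

The result is in line with the reported tps of 5: the array is almost fully busy while completing only a handful of fairly large transfers per second.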

In addition, I ran mutex profiling; here are the results
(sorted by the total number of acquisitions):

Bad case:

  102 223514 273977 0 14689 1651568 /usr/src/sys/vm/uma_core.c:2349 (512)
  950 263099 273968 0 15004 14427 /usr/src/sys/vm/uma_core.c:2450 (512)
  108 150422 175840 0 10978 22988519 /usr/src/sys/vm/uma_core.c:1888 (mbuf)
  352 160635 173663 0 10896 9678 /usr/src/sys/vm/uma_core.c:2209 (mbuf)
  110 134910 173575 0 10838 9464 /usr/src/sys/vm/uma_core.c:2104 (mbuf)
  1104 1335319 106888 12 27 1259 /usr/src/sys/netinet/tcp_output.c:253 (so_snd)
  171 77754 97685 0 176 207 /usr/src/sys/net/pfil.c:71 (pfil_head_mtx)
  140 77104 97685 0 151 128 /usr/src/sys/netinet/ip_fw2.c:164 (IPFW static rules)
  100 76543 97685 0 146 45450 /usr/src/sys/netinet/ip_fw2.c:156 (IPFW static rules)
  82 77149 97685 0 243 141221 /usr/src/sys/net/pfil.c:63 (pfil_head_mtx)
  1644 914481 97679 9 739 949977 /usr/src/sys/contrib/ipfilter/netinet/fil.c:2320 (ipf filter load/unload mutex)
  1642 556643 97679 5 0 0 /usr/src/sys/contrib/ipfilter/netinet/fil.c:2455 (ipf filter rwlock)
  107 89413 97679 0 0 0 /usr/src/sys/contrib/ipfilter/netinet/fil.c:2142 (ipf cache rwlock)
  907 148940 81439 1 3 7447 /usr/src/sys/kern/kern_lock.c:168 (lockbuilder mtxpool)
  1764 152282 63435 2 438 336763 /usr/src/sys/net/route.c:197 (rtentry)

And in the good case:

  1738 821795 553033 1 41 284 /usr/src/sys/netinet/tcp_output.c:253 (so_snd)
  2770 983643 490815 2 6 54 /usr/src/sys/kern/kern_lock.c:168 (lockbuilder mtxpool)
  106 430941 477500 0 5555 4507 /usr/src/sys/netinet/ip_fw2.c:164 (IPFW static rules)
  95 423926 477500 0 4412 5645 /usr/src/sys/netinet/ip_fw2.c:156 (IPFW static rules)
  94 427239 477500 0 6323 7453 /usr/src/sys/net/pfil.c:63 (pfil_head_mtx)
  82 432359 477500 0 5244 5768 /usr/src/sys/net/pfil.c:71 (pfil_head_mtx)
  296 4751550 477498 9 20837 23019 /usr/src/sys/contrib/ipfilter/netinet/fil.c:2320 (ipf filter load/unload mutex)
  85 2913118 477498 6 0 0 /usr/src/sys/contrib/ipfilter/netinet/fil.c:2455 (ipf filter rwlock)
  55 473891 477498 0 0 0 /usr/src/sys/contrib/ipfilter/netinet/fil.c:2142 (ipf cache rwlock)
  59 291035 309222 0 0 0 /usr/src/sys/contrib/ipfilter/netinet/fil.c:2169 (ipf cache rwlock)
  1627 697811 305094 2 2161 2535 /usr/src/sys/net/route.c:147 (radix node head)
  232 804172 305094 2 12193 6500 /usr/src/sys/net/route.c:197 (rtentry)
  148 892580 303518 2 594 649 /usr/src/sys/net/route.c:1281 (rtentry)
  145 584970 303518 1 13479 11148 /usr/src/sys/net/route.c:1265 (rtentry)
  121 282669 303518 0 3529 886 /usr/src/sys/net/if_ethersubr.c:409 (em0)

Here you can see that high UMA activity occurs during the periods of low
disk performance. But I'm not sure whether this is the root of the problem
or a consequence.
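
To make the comparison between the two profiles easier to read, the rows can be parsed and ranked by how often a lock was found busy relative to how often it was acquired. The following is only a sketch: it assumes the six numeric columns are max, total, count, avg, cnt_hold and cnt_lock in that order, and that the last two are contention counters. That column layout, the ranking metric and all function names are assumptions made for illustration. Only a few rows from the listings above are included as sample data.

```python
# Assumed column order: max total count avg cnt_hold cnt_lock name
BAD = """\
102 223514 273977 0 14689 1651568 /usr/src/sys/vm/uma_core.c:2349 (512)
950 263099 273968 0 15004 14427 /usr/src/sys/vm/uma_core.c:2450 (512)
108 150422 175840 0 10978 22988519 /usr/src/sys/vm/uma_core.c:1888 (mbuf)
1104 1335319 106888 12 27 1259 /usr/src/sys/netinet/tcp_output.c:253 (so_snd)
"""

GOOD = """\
1738 821795 553033 1 41 284 /usr/src/sys/netinet/tcp_output.c:253 (so_snd)
232 804172 305094 2 12193 6500 /usr/src/sys/net/route.c:197 (rtentry)
"""

FIELDS = ("max", "total", "count", "avg", "cnt_hold", "cnt_lock")


def parse(text):
    """Turn profile lines into dicts; lines that do not fit are skipped."""
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 6)  # six numbers, then the lock name
        if len(parts) < 7:
            continue
        try:
            numbers = [int(p) for p in parts[:6]]
        except ValueError:
            continue
        row = dict(zip(FIELDS, numbers))
        row["name"] = parts[6]
        rows.append(row)
    return rows


def lock_ratio(row):
    """cnt_lock per acquisition; a crude indicator of contention."""
    return row["cnt_lock"] / row["count"] if row["count"] else 0.0


def report(label, text):
    """Print rows from one profile, most contended first."""
    print(label)
    for row in sorted(parse(text), key=lock_ratio, reverse=True):
        print("  %10.4f  %s" % (lock_ratio(row), row["name"]))


report("bad case:", BAD)
report("good case:", GOOD)
```

Under these assumptions the uma_core.c entries rank at the top of the bad-case sample, which matches the observation about UMA activity but does not by itself say whether it is cause or effect.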

I have similar servers doing the same things, and they work
fine. I also had the same problem a year ago with another project;
nothing helped that time and I had to install Linux.

I can provide additional information regarding this server if needed.
What else can I try to solve the problem?

With best regards,
Alexey Popov


