Date: Tue, 21 Dec 2004 12:58:28 +0000
From: Ollie Cook
To: freebsd-performance@freebsd.org
Subject: Re: Odd news transit performance problem
Message-ID: <20041221125828.GE42562@mutare.noc.clara.net>
User-Agent: Mutt/1.4.1i
X-Operating-System: FreeBSD 4.10-STABLE i386

On Mon, Dec 20, 2004 at 06:24:49PM +0000, Dave Williams wrote:
> Various reporting tools (systat, vmstat, top, etc) report that the
> box is idle - there's no significant contention for memory, disk,
> network, etc. that we can see and actually bouncing the box seems
> to bring performance back up to speed again for a period - restarting
> innd doesn't have the same effect.

I'd like to flesh out this thread with some more detail. I've put my
interpretation of the detail in too, but if I'm not correct,
clarification would be appreciated!

The host is built as follows:

 - Intel Xeon 2.8GHz
 - 3GB RAM
 - 1x fxp NIC
 - 2x em NICs
 - 2x sym SCSI controllers <1010-66>
 - 14x 18GB U160 SCSI disks (not quite split equally across the two
   SCSI controllers)
 - vinum is used to stripe volumes across multiple spindles; in
   particular, the history database is striped over 10 devices.

The host is fed news from only two sources, which on average equates to
25 streams. The total volume this host is handling per day is of the
order of 1.4TB inbound. At present it is handling ~50 articles per
second (the average article size is 350KB). News is then fed out to a
number of other hosts, which equates to a further 40 streams.

We are seeing this host fail to keep up with all the news being offered
to it, and consequently the hosts feeding it are keeping a backlog of
articles to offer it later. The host that is backlogging is behind the
first em interface.

'top' shows the CPU to be on average 25% idle, and that little swap is
being used:

CPU states: 11.3% user, 1.6% nice, 47.1% system, 15.2% interrupt, 24.9% idle
Mem: 438M Active, 1815M Inact, 658M Wired, 102M Cache, 199M Buf, 4348K Free
Swap: 4096M Total, 156K Used, 4096M Free

'vmstat 1' shows a number of pages being paged out per second, but few
ever being paged in. How does this tally with the fact that less than
1MB of swap is in use? I think my understanding of paging could be at
fault here. :)
 procs      memory      page                    disks     faults      cpu
 r b w     avm    fre   flt  re  pi   po    fr    sr da0 da1   in    sy    cs us sy id
 3 0 0  576472  88672  4854   0   0  139  9465     0   0   1 1474 10821  3655 18 48 33
 2 0 0  577492 148356  4796   1   0   94  9566 19789   0   0 1675 12267  4471 17 53 29
 1 6 0  576088 129444  5655   1   0  139 10133     0   0   1 1511 11115  4033 20 51 29
 1 6 0  576172 106752  5110   0   0  146 10618     0   0   0 1462 12521  4411 24 50 26
 2 0 0  577168 167000  4753   0   1  142  8541 19797   0   4 1664 11800  4307 20 52 28
 3 0 0  580476 143228  6905   2   1   93 11867     0   0   1 1426 10999  3798 19 51 30
 2 6 0  579984 126212  4678   0   0  209  8677     0   0   0 1492 14254  5125 15 44 41
 4 0 0  575016 112616  6526   0   0   94 11638     0   0   1 1736 11860  4002 21 44 35
 2 5 0  576932  93844  5016   0   0   93  8178     0   0   0 1477  8863  2898 16 45 39
 0 7 0  579076 164308  2546   1   7 3553  4244 19793  13   1 3921 10493  2388  9 43 47

Both 'systat -vmstat' and 'iostat' show the disks are not busy. They are
transferring approximately 2.0MB/s on average, and 'systat -vmstat'
indicates the devices are <20% busy. Indeed, writing from /dev/zero to a
vinum volume striped over ten disks, I can achieve a further 5MB/s per
disk over and above what the system is usually generating, and even that
only pushes the disks to 50% busy.

Network-wise, the fxp interface does ~45mbit/s (majority outbound), em0
does ~20mbit/s (majority inbound) and em1 does ~150mbit/s (majority
outbound). Duplex settings are all correct, and 'netstat' shows very few
errors on the host's interfaces:

Name  Mtu Network Address             Ipkts Ierrs     Opkts Oerrs Coll
fxp0 1500         00:e0:18:a4:d4:4d  96710493     0 103018264     2    0
em0  1500         00:e0:18:a4:d4:4c 393932814     0 429947240     0    0
em1  1500         00:07:e9:0f:9a:34 465099760     0 559720416     0    0

There appears not to be a shortage of mbufs:

# netstat -m
1307/5936/262144 mbufs in use (current/peak/max):
        1307 mbufs allocated to data
1301/4970/65536 mbuf clusters in use (current/peak/max)
11424 Kbytes allocated to network (5% of mb_map in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines

There doesn't appear to be another bottleneck in the network subsystem,
because an scp from one host to the other over em0 generates a further
30mbit/s of traffic.

Receive queues are about 20k on each inbound news stream. Does this
figure indicate data that has not yet been transferred from the kernel
to user space?

The following sysctls have been set over the years:

net.inet.tcp.inflight_enable=1
vfs.vmiodirenable=1
vfs.lorunningspace=2097152
vfs.hirunningspace=4194304
kern.maxfiles=262144
kern.maxfilesperproc=32768
net.inet.tcp.rfc1323=1
net.inet.tcp.delayed_ack=0
net.inet.tcp.sendspace=131070
net.inet.tcp.recvspace=131070
net.inet.udp.recvspace=65535
net.inet.udp.maxdgram=57344
net.local.stream.recvspace=65535
net.local.stream.sendspace=65535
kern.polling.enable=1

as well as the following kernel configuration options:

options         NMBCLUSTERS=65536
options         MAXDSIZ="(1024*1024*1024)"
options         MAXSSIZ="(256*1024*1024)"
options         DFLDSIZ="(256*1024*1024)"
options         DEVICE_POLLING
options         HZ=1000

Given that the CPU is not wedged at 100%, that there is free memory, and
that the disks and network interfaces both have plenty of bandwidth left,
I'm convinced that this host ought to be keeping up and not causing other
hosts to keep a backlog for it.

Does anyone have any suggestions for where else we might look to see why
this host doesn't appear to be performing as well as one might expect it
to?

Yours,

Ollie

-- 
Ollie Cook                                    Systems Architect, Claranet UK
ollie@uk.clara.net                                          +44 20 7685 8065
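
As a rough illustration of the receive-queue question above: netstat's
Recv-Q column counts bytes the kernel has already accepted into a socket's
receive buffer but which the application has not yet read, so a steady
~20k per incoming stream sits well inside the 131070-byte default set by
net.inet.tcp.recvspace. The minimal C sketch below (the file name is
invented for illustration; it assumes a FreeBSD host with the standard
sysctlbyname(3) and getsockopt(2) interfaces) prints that sysctl and the
SO_RCVBUF a newly created TCP socket is given:

/*
 * sockbuf_check.c -- illustrative sketch, not from the original thread.
 *
 * Prints net.inet.tcp.recvspace and the SO_RCVBUF of a freshly created
 * TCP socket.  The Recv-Q values reported by netstat are bytes held in
 * this per-socket buffer awaiting a read by the application.
 *
 * Build on FreeBSD:  cc -o sockbuf_check sockbuf_check.c
 */
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/sysctl.h>

#include <stdio.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
	unsigned char raw[sizeof(unsigned long)];
	size_t len = sizeof(raw);
	unsigned long recvspace;
	socklen_t optlen;
	int s, rcvbuf;

	/*
	 * net.inet.tcp.recvspace is an integer-type sysctl, but its width
	 * (int vs. long) has differed between releases, so read it into a
	 * raw buffer and decode it by the size the kernel reports.
	 */
	if (sysctlbyname("net.inet.tcp.recvspace", raw, &len, NULL, 0) == -1) {
		perror("sysctlbyname");
		return (1);
	}
	if (len == sizeof(unsigned int)) {
		unsigned int v;
		memcpy(&v, raw, sizeof(v));
		recvspace = v;
	} else {
		unsigned long v;
		memcpy(&v, raw, sizeof(v));
		recvspace = v;
	}
	printf("net.inet.tcp.recvspace: %lu bytes\n", recvspace);

	/* A new TCP socket's SO_RCVBUF should normally match that default. */
	s = socket(AF_INET, SOCK_STREAM, 0);
	if (s == -1) {
		perror("socket");
		return (1);
	}
	optlen = sizeof(rcvbuf);
	if (getsockopt(s, SOL_SOCKET, SO_RCVBUF, &rcvbuf, &optlen) == -1) {
		perror("getsockopt");
		close(s);
		return (1);
	}
	printf("SO_RCVBUF on a new TCP socket: %d bytes\n", rcvbuf);

	close(s);
	return (0);
}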