Date: Tue, 8 Jul 2008 01:35:19 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Andre Oppermann
Cc: FreeBSD Net <freebsd-net@freebsd.org>, Bart Van Kerckhove,
    Ingo Flaschberger, Paul
Subject: Re: Freebsd IP Forwarding performance (question, and some info)
    [7-stable, current, em, smp]

On Mon, 7 Jul 2008, Andre Oppermann wrote:

> Paul,
>
> to get a systematic analysis of the performance please do the following
> tests and put them into a table for easy comparison:
>
> 1. inbound pps w/o loss with interface in monitor mode (ifconfig em0
> monitor)
>...

I won't be running many of these tests, but found this one interesting --
I didn't know about monitor mode.  It gives the following behaviour:

-monitor ttcp receiving on bge0 at 397 kpps: 35% idle (8.0-CURRENT) 13.6 cm/p
 monitor ttcp receiving on bge0 at 397 kpps: 83% idle (8.0-CURRENT)  5.8 cm/p
-monitor ttcp receiving on em0  at 580 kpps:  5% idle (~5.2)        12.5 cm/p
 monitor ttcp receiving on em0  at 580 kpps: 65% idle (~5.2)         4.8 cm/p

cm/p = cache misses per packet (k8-dc-misses on the bge0 system,
       k7-dc-misses on the em0 system)

So it seems that the major overheads are not near the driver (as I already
knew), and upper layers are responsible for most of the cache misses.  The
packet header is accessed even in monitor mode, so I think most of the
cache misses in upper layers are not related to the packet header.  Maybe
they are due mainly to perfect non-locality for mbufs.
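Monitor mode makes a convenient measurement point because received packets
are discarded immediately after the link-layer input and the bpf(4) tap, so
none of the IP or socket layers run for them.  Roughly, as a simplified
sketch only (not the actual sys/net/if_ethersubr.c code):

/*
 * Simplified sketch of the IFF_MONITOR short-circuit in the ethernet
 * receive path (kernel context; the real ether_input() does much more).
 */
static void
ether_input_sketch(struct ifnet *ifp, struct mbuf *m)
{
        ETHER_BPF_MTAP(ifp, m);         /* bpf(4) still sees the packet */
        if (ifp->if_flags & IFF_MONITOR) {
                m_freem(m);             /* monitor mode: drop it here */
                return;
        }
        ether_demux(ifp, m);            /* normal path: IP and above */
}

So the difference between the -monitor and monitor lines above is roughly
the cost of everything from ether_demux() upwards, which is consistent with
the upper layers owning most of the cache misses.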
Other cm/p numbers:

ttcp sending on bge0 at 640 kpps: (~5.2)        11 cm/p
ttcp sending on bge0 at 580 kpps: (8.0-CURRENT)  9 cm/p
    (-current is 10% slower despite having lower cm/p.  This seems to be
    due to extra instructions executed.)
ping -fq -c1000000 localhost at 171 kpps: (8.0-CURRENT) 12-33 cm/p
    (This is certainly CPU-bound.  lo0 is much slower than bge0.  Latency
    (rtt) is 2 us.  It is 3 us in ~5.2 and was 4 in -current until very
    recently.)
ping -fq -c1000000 etherhost at 40 kpps: (8.0-CURRENT) 55 cm/p
    (The rate is quite low because flood ping doesn't actually flood.  It
    tries to limit the rate to max(100, 1/latency), but it tends to go at
    a rate of ql(t)/latency, where ql(t) is the average hardware queue
    length at the current time t.  ql(t) starts at 1 and builds up after a
    minute or two to a maximum of about 10 on my hardware.  Latency is
    always ~100 us, so the average ql(t) must have been ~4.)
ping -fq -c1000000 etherhost at 20 kpps: (8.0-CURRENT) 45 cm/p
    (Another run, made to record the average latency (it was 121 us),
    showed high variance.)
netblast sending on bge0 at 582 kpps: (8.0-CURRENT) 9.8 cm/p
    (Packet blasting benchmarks actually flood, unlike flood ping.  This
    is hard to implement, since select() for output-ready doesn't work.
    netblast has to busy-wait, while ttcp guesses how long to sleep but
    cannot sleep for a short enough interval unless queues are too large
    or hz is too small.  My systems are configured with HZ = 100 and
    snd.ifq too large, so that sleeping for 1/HZ works for ttcp.  netblast
    still busy-waits.  This gives an interesting difference for netblast:
    it tries to send 800 k packets in 1 second but only successfully sends
    582 k.  The 9.8 cm/p is #misses / 582k.  The 300 k unsuccessful sends
    apparently don't cause many cache misses.  But variance is high...)
ttcp sending on bge0 at 577 kpps: (8.0-CURRENT) 15.5 cm/p
    (Another run shows high variance.)

ttcp rates have low variance for a given kernel but high variance across
different kernels (an extra unrelated byte in the text section can cause a
30% change).  High variance would also be explained by non-locality of
mbufs: cycling through lots of mbufs would maximize cache misses, while
random reuse of mbufs would give variance.  Or the cycling and variance
might be more in general allocation.  There is silliness in getsockaddr():
sendit() calls getsockaddr(), and getsockaddr() always uses malloc(), but
allocation on the stack works for the call from sendit().  This malloc()
seemed to be responsible for a cache miss or two, but when I changed it to
use the stack the results were inconclusive.

Bruce
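To make the busy-wait point above concrete, here is a minimal sketch of a
netblast-style sender.  It is not the actual netblast source; the target
address, port, payload size and duration below are made-up illustrative
defaults.  The point is the send loop: it never select()s or sleeps, it
just keeps calling sendto() and counts the sends that fail (normally with
ENOBUFS when the interface send queue fills), which is where the gap
between the ~800 k attempted and 582 k successful sends per second comes
from.

/*
 * Minimal sketch of a netblast-style UDP blaster (illustrative only, not
 * the real netblast): busy-wait in the send loop, count failed sends.
 */
#include <sys/socket.h>

#include <netinet/in.h>
#include <arpa/inet.h>

#include <err.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

int
main(int argc, char *argv[])
{
        struct sockaddr_in sin;
        const char *host;
        char payload[18];       /* 18-byte payload -> 64-byte frames */
        unsigned long sent, failed;
        time_t stop;
        int port, s;

        /* Illustrative defaults; a real tool would take arguments. */
        host = argc > 1 ? argv[1] : "10.0.0.2";
        port = argc > 2 ? atoi(argv[2]) : 5001;

        memset(payload, 0, sizeof(payload));
        memset(&sin, 0, sizeof(sin));
        sin.sin_family = AF_INET;
        sin.sin_port = htons(port);
        if (inet_pton(AF_INET, host, &sin.sin_addr) != 1)
                errx(1, "bad address: %s", host);
        if ((s = socket(AF_INET, SOCK_DGRAM, 0)) == -1)
                err(1, "socket");

        sent = failed = 0;
        stop = time(NULL) + 1;  /* run for one to two wall-clock seconds */
        while (time(NULL) <= stop) {
                /*
                 * Busy-wait: select() never reports "output ready" for
                 * this case and a 1/HZ sleep is far too long at these
                 * packet rates, so just retry.  Failed sends are normally
                 * ENOBUFS from a full interface send queue.
                 */
                if (sendto(s, payload, sizeof(payload), 0,
                    (struct sockaddr *)&sin, sizeof(sin)) == -1)
                        failed++;
                else
                        sent++;
        }
        printf("sent %lu packets, %lu sends failed\n", sent, failed);
        close(s);
        return (0);
}

A cm/p figure like the 9.8 above is then a data-cache-miss count for the
run divided by the 582 k successful sends, not by the ~800 k attempts.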