Date: Thu, 11 Nov 2004 22:36:52 +0000 (GMT)
From: Robert Watson <rwatson@freebsd.org>
To: TM4526@aol.com
Cc: questions@freebsd.org
Subject: Re: FreeBSD 5.3 Network performance tests
Message-ID: <Pine.NEB.3.96L.1041111222349.6545F-100000@fledge.watson.org>
In-Reply-To: <82.1aea0101.2ec5090f@aol.com>
On Thu, 11 Nov 2004 TM4526@aol.com wrote:

> Given these results, I would conclude that the raw routing stack in 5.3
> is 35-40% slower than its 4.x counterpart.
>
> The tests are easy enough to duplicate, so there is no reason to
> question the numbers. Feel free to try it yourself. Obviously different
> Mobos and CPUs will yield different numbers, but my experience with this
> test is that the "differences" between the OS versions are linearly
> similar on different systems.

(I was just pointed at this thread, so sorry if I missed other posts.)

FreeBSD 5.3 has observably higher per-packet processing costs than the 4.x branch due to in-progress changes to the synchronization and queueing models. Specifically, the SMPng work has changed the interrupt and synchronization models throughout the kernel in order to increase concurrency and preemptibility (i.e., lower latency in interrupt-based processing). However, this has increased the overall synchronization overhead in the stack. The network stack forwarding path is particularly sensitive to this, so while other parts of the system see immediate concurrency benefits (e.g., socket-centric web servers, which now see less contention on SMP and more preemption on UP), this path still runs slower for many workloads. We're actively working to remedy this, and you will see changes merged to the 6.x and 5.x branches over the next couple of months that will cut into the numbers you see above by quite a bit. Off the top of my head, I would have expected more like a 15% overhead on UP for the workload you're running, but as you point out, results can and do vary.

There are a number of tunables present in 5.3 that can improve performance, which you may want to explore (rough command sketches for several of these follow the list):

- net.isr.enable, which enables direct dispatch of the network stack from the ithread rather than context switching to the netisr, which adds overhead. This is an experimental feature, but it works quite well in a number of environments, lowering both latency (time to process) and overhead (cost to process). There is a known bug in inbound UDP processing with multiple packet sources on 5.3 when net.isr.enable is set (hence its experimental status), but I will be backporting the fix shortly. For your workload, though, the bug won't manifest: it's in address processing for locally delivered packets, not for forwarded packets.

- Make sure that if this is a UP box, you're running a non-SMP kernel, as that substantially lowers the synchronization overhead.

- Device polling, which eliminates the overhead of high-rate interrupts; those can cause substantial context switching.

- If your ethernet device supports interrupt coalescing but the thresholds are tuned wrong, you may be able to improve their tuning; the knobs are driver-specific, so check your driver's man page.

- Disable entropy harvesting for ethernet devices and interrupts. There are optimizations in 6.x, not yet backported, that reduce the overhead of entropy harvesting, but you can get the same benefit by simply disabling it; in your environment it's likely not needed. I hope to backport those changes to 5-STABLE in a couple of weeks.

- If other devices share an IRQ with your ethernet devices, you may want to look at compiling out support for those devices. For example, I have a number of Dell boxes where the USB hardware uses the same interrupt as the ethernet device on the motherboard. The additional overhead of processing the other devices is non-trivial, especially if the order of processing has changed in 5.x due to hardware probe order changes, ACPI, etc.
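To experiment with direct dispatch, it's just a sysctl on 5.3:

    # Dispatch the stack directly from the interrupt thread rather
    # than queueing to the netisr (experimental; see the UDP caveat
    # above):
    sysctl net.isr.enable=1

    # Or persistently, via /etc/sysctl.conf:
    net.isr.enable=1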
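To go non-SMP, comment out "options SMP" in your kernel config and rebuild; something like the following, where ROUTER is just a placeholder name for your config:

    # Check whether the running kernel is SMP (from memory, the 5.x
    # sysctl is kern.smp.active; 1 means SMP is up):
    sysctl kern.smp.active

    # Build and install a UP kernel from a config with "options SMP"
    # removed or commented out:
    cd /usr/src
    make buildkernel KERNCONF=ROUTER
    make installkernel KERNCONF=ROUTER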
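Polling takes kernel options plus a sysctl; per polling(4) on 5.x, roughly:

    # In the kernel config:
    options DEVICE_POLLING
    options HZ=1000         # polling works much better with a higher HZ

    # Then at runtime (only drivers with polling support participate):
    sysctl kern.polling.enable=1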
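Disabling entropy harvesting is two sysctls on 5.x (names as I recall them from random(4); verify with "sysctl kern.random"):

    # Stop harvesting entropy from ethernet frames and interrupts:
    sysctl kern.random.sys.harvest.ethernet=0
    sysctl kern.random.sys.harvest.interrupt=0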
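To see whether anything shares an IRQ with your NIC, vmstat -i lists interrupt handlers and their rates; if USB turns out to be the culprit, removing it from the kernel config looks like:

    # Show per-handler interrupt counts and IRQ assignments:
    vmstat -i

    # In the kernel config, comment out the USB controllers if you
    # don't need them:
    #device         uhci    # UHCI PCI->USB interface
    #device         ohci    # OHCI PCI->USB interface
    #device         usb     # USB Bus (required for uhci/ohci)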
Something I'd be interested in seeing you measure, since you have a specific test environment configured, is the incremental cost of adding a thousand firewall rules (a rough ipfw sketch appears at the end of this message). The synchronization cost of firewall processing is paid on entry to the firewall code, not per rule. So you may find that while the cost of evaluating the first rule is higher in 5.x, the cost of processing each additional rule is the same or lower, due to other optimizations, compiler improvements, etc.

You can find information on the ongoing network performance work at:

  http://www.watson.org/~robert/freebsd/netperf/

I've also just put a new web page online at:

  http://www.freebsd.org/projects/netperf/

However, that page has probably not been rebuilt on most of the web server mirrors yet, so it might take a day or two to become reachable.

There's quite an active team working on netperf, so as I mentioned above, while some paths currently carry additional overhead, you should see improvements in the near future. Packet bridging and packet forwarding are both considered critical optimization targets for 5.4 (and 5-STABLE before then). One of the things we would find most helpful is people with interesting and useful workloads who are able to measure the impact of proposed performance changes. So if you're able to use this test environment to help us test changes in the pipeline, it would be much appreciated.
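For the rule-count measurement above, a rough sketch with ipfw(8): the count action just increments a counter, so a thousand of them in front of your pass rule isolates the per-rule traversal cost (jot(1) is in the base system):

    # Install 1000 no-op counting rules that every packet traverses:
    for i in $(jot 1000); do
        ipfw add $i count ip from any to any
    done

    # ...run the forwarding test, then clear them:
    ipfw -f flush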
Thanks,

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Principal Research Scientist, McAfee Research