Date:      Thu, 11 Nov 2004 22:36:52 +0000 (GMT)
From:      Robert Watson <rwatson@freebsd.org>
To:        TM4526@aol.com
Cc:        questions@freebsd.org
Subject:   Re: FreeBSD 5.3 Network performance tests
Message-ID:  <Pine.NEB.3.96L.1041111222349.6545F-100000@fledge.watson.org>
In-Reply-To: <82.1aea0101.2ec5090f@aol.com>

On Thu, 11 Nov 2004 TM4526@aol.com wrote:

> Given these results, I would conclude that the raw routing stack in 5.3
> is 35-40% slower than its 4.x counterpart. 
> 
> The tests are easy enough to duplicate, so there is no reason to
> question the numbers. Feel free to try it yourself. Obviously different
> Mobos and CPUs will yield different numbers, but my experience with this
> test is that the "differences" between the OS versions are linearly
> similar on different systems. 

(was just pointed at this thread, sorry if I missed other posts)

FreeBSD 5.3 sees observably higher per-packet processing costs than the
4.x branch due to in-progress changes to the synchronization and queueing
models.  Specifically, the SMPng work has changed the interrupt and
synchronization models throughout the kernel in order to increase
concurrency and preemptibility (i.e., lower latency in interrupt-based
processing).  However, this has increased the overall overhead of
synchronization on the stack.  The network stack forwarding path is
particularly sensitive to this, so while other parts of the system see
immediate concurrency benefits (i.e., socket-centric web servers that now
see less contention on SMP, and more preemption on UP), this path still
runs slower for many workloads.  We're actively working to remedy this,
and you will see changes merged to the 6.x and 5.x branches over the next
couple of months that will cut into the numbers you see above by quite a
bit.  Off the top of my head, I would have expected something more like a
15% overhead on UP for the workload you're running, but as you point out,
results can and do vary.

There are a number of tunables present in 5.3 that can improve
performance, which you may want to explore (rough kernel config and
sysctl.conf sketches follow the list):

- net.isr.enable, which enables direct dispatch of the network stack from
  the ithread, rather than context switching to the netisr, which adds
  overhead.  This is an experimental feature, but works quite well in a
  number of environments to lower both latency (time to process) and
  overhead (cost to process).  There is a known bug in inbound UDP
  processing with multiple packet sources on 5.3 with net.isr.enable
  enabled (hence it being experimental), but I will be backporting the fix
  shortly.  However, for your workload, this bug won't manifest, as it's
  in address processing for locally delivered packets, not for forwarded
  packets.

- Make sure that if this is a UP box, you're compiling with a non-SMP
  kernel, as that substantially lowers the synchronization overhead.

- Device polling, which eliminates the overhead of high interrupt rates and
  the substantial context switching they can cause.

- If your ethernet device supports interrupt coalescing but its thresholds
  are poorly tuned, adjusting them may help; the relevant knobs are
  driver-specific.

- Disable entropy harvesting for ethernet devices and interrupts.  There
  are optimizations present in 6.x, not yet backported, that reduce the
  overhead of entropy harvesting, but you can get the same benefit by
  disabling it, and in your environment it's likely not needed.  I hope to
  backport these changes to 5-STABLE in a couple of weeks.

- If other devices share the same IRQ with your ethernet devices, you may
  want to look at compiling out support for those devices.  For example, I
  have a number of Dell boxes where the USB hardware uses the same
  interrupt as the ethernet device on the motherboard.  The additional
  overhead associated with processing other devices is non-trivial,
  especially if the order of processing has changed in 5.x due to hardware
  probe order changes, ACPI, etc.
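
To make the kernel-side items above concrete, here is a rough sketch of
the sort of kernel config changes I have in mind; treat it as a starting
point rather than a drop-in config, and use vmstat -i to check whether
your ethernet interrupt line is shared with other devices:

  # UP forwarding box: start from GENERIC and make sure "options SMP"
  # is NOT present, then:
  options         DEVICE_POLLING        # support for device polling
  options         HZ=1000               # a higher HZ is usual with polling
  # If USB shares the ethernet IRQ and you don't need it, compile it out
  # by removing/commenting the USB devices, e.g.:
  # device        uhci
  # device        ohci
  # device        usb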
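
The runtime sysctls discussed above, as an /etc/sysctl.conf sketch (again
just a starting point; kern.polling.enable only matters if the kernel was
built with DEVICE_POLLING, and remember the caveat above about
net.isr.enable being experimental):

  # dispatch the stack directly from the ithread (experimental, see above)
  net.isr.enable=1
  # turn on device polling (requires DEVICE_POLLING in the kernel)
  kern.polling.enable=1
  # stop harvesting entropy from ethernet devices and interrupts
  kern.random.sys.harvest.ethernet=0
  kern.random.sys.harvest.interrupt=0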

Something I'd be interested in seeing you measure, since you have a
specific test environment configured, is the incremental cost of adding a
thousand firewall rules.  The synchronization costs for firewall processing
are incurred on entry to the firewall code, and don't apply to each rule.  So
you may find that while the cost of entering the first rule is higher in
5.x, the cost to process additional rules is the same or lower, due to
other optimizations, compiler improvements, etc. 
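
For example, assuming ipfw, something along these lines would load a
thousand count rules that every packet traverses, so comparing forwarding
rates with and without them gives the incremental per-rule cost (the rule
numbers and the loop are purely illustrative):

  # add 1,000 no-op counting rules, numbered 1000-1999
  for i in `jot 1000 1000`; do
      ipfw add $i count ip from any to any
  done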

You can find information on the ongoing network performance work at the
following location:

  http://www.watson.org/~robert/freebsd/netperf/

I've just put a new web page online at:

  http://www.freebsd.org/projects/netperf/

However, that page has probably not been rebuilt on most of the web server
mirrors yet, so it might take a day or two to become reachable.

There's quite an active team working on the netperf work, so as I
mentioned above, while there is additional overhead for some paths
currently, you should see improvements arriving in the near future.  Packet
bridging and packet forwarding are both considered critical optimization
targets for 5.4 (and 5-STABLE before then).  One of the things we would
find most helpful is people with interesting and useful workloads who are
able to measure the impact of change proposals to improve performance.  So
if you're able to use this test environment to help us test changes in the
pipeline, it would be much appreciated.

Thanks,

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Principal Research Scientist, McAfee Research


