Date: Wed, 03 Apr 2002 12:34:18 -0800
From: Terry Lambert
To: Robert Watson
Cc: Stefan Saroiu, freebsd-smp@FreeBSD.org
Subject: Re: Interrupt Handlers and Multiple CPUs

Robert Watson wrote:
> On Tue, 2 Apr 2002, Stefan Saroiu wrote:
> > The application still gets 20% of the CPU, which is quite good actually.
> > Although I'm not familiar with Druschel's work, I'm not sure whether
> > better scheduling will help me here.
> >
> > I've been toying with the idea of changing the driver to raise
> > interrupts only once every 100 packets or something like that.
> > Currently it is 1 interrupt per 1 packet.
>
> Ouch. No wonder you're having problems. You definitely want to implement
> either coalesced interrupt delivery or polled device access. In theory,
> we have both in 4.x and 5.x, but support for coalesced delivery is on a
> per-card basis. 5.x will allow you to do the kinds of things you want
> (eventually) once the network stack is fine-grained enough, but it sounds
> like the big problem is the driver model. I believe the fxp and em
> drivers support this, and they might be a good model to look at.
> You might want to consider posting to freebsd-net for pointers in this
> space.

I missed this part. Your best bet is coalesced interrupt delivery in hardware, which should be handled in all gigabit drivers already. I personally provided patches for soft interrupt coalescing for some of the more popular 10/100 drivers. Basically, it takes advantage of Bill Paul's separation of interrupt processing into rx_eof and tx_eof routines, and then adds a return code to indicate whether they did any work. If they did, it recalls them from the interrupt routine until there is no more work to do. You can put a high watermark on this by counting the number of times the handler loops, and bailing out once the count is reached, if that's all it's doing.

In addition, you might want to look at Luigi Rizzo's patches for polling, and Jonathan Lemon's patches for diminishing the NETISR requirements by processing some operations to completion at interrupt time (a "poor man's LRP"). Both of these are in -current (5.x).

The problem with both Jonathan's and Luigi's patches, when used with a user-space program, is that they do not provide weighted fair-share scheduling to ensure that a user-space program requiring an arbitrary number of cycles gets the CPU time it needs. In effect, you must manually tune the CPU-time ratio so that the user-space program doesn't starve, and likewise so that you don't stall the kernel's polling of packets "because it's time to do user-space processing" when you have no user-space processing to do.

The Druschel and Aron references I made deal with this issue exactly the way the Mogul reference I made suggests: measure the depth of the queue to user space, disable interrupts when it hits a high watermark, and re-enable them when it hits a low watermark (indicating that the user-space application has worked off the backlog). Using LRP itself has the additional benefit of removing latency from the processing path.
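To make the soft-coalescing idea above concrete, here is a minimal user-space sketch of the pattern: rx_eof/tx_eof routines that report whether they did work, and an interrupt handler that recalls them until the work runs out or a high watermark of passes is hit. The device structure, burst size, and the SOFT_COALESCE_MAX value are all hypothetical illustrations, not the actual patched driver code.

```c
#include <assert.h>

/*
 * Sketch of soft interrupt coalescing, assuming a driver split (in
 * Bill Paul's style) into rx_eof()/tx_eof() completion routines.
 * All names and constants here are illustrative.
 */
#define SOFT_COALESCE_MAX 6   /* high watermark: bail after this many passes */
#define RX_BURST          4   /* packets drained per rx_eof() call (made up) */

struct fakedev {
    int rx_pending;  /* packets waiting in the receive ring */
    int tx_pending;  /* transmit descriptors waiting to be reclaimed */
};

/* Process up to RX_BURST received packets; return 1 if any work was done. */
static int
rx_eof(struct fakedev *sc)
{
    int n = sc->rx_pending < RX_BURST ? sc->rx_pending : RX_BURST;

    sc->rx_pending -= n;
    return (n > 0);
}

/* Reclaim completed transmit descriptors; return 1 if any work was done. */
static int
tx_eof(struct fakedev *sc)
{
    int n = sc->tx_pending;

    sc->tx_pending = 0;
    return (n > 0);
}

/*
 * Interrupt handler: keep recalling rx_eof()/tx_eof() while either
 * reports work, but bail at the high watermark so a busy device
 * cannot monopolize the CPU.  Returns the number of passes made.
 * Note the bitwise `|`: both routines must run on every pass.
 */
static int
dev_intr(struct fakedev *sc)
{
    int passes = 0;

    while (rx_eof(sc) | tx_eof(sc)) {
        if (++passes >= SOFT_COALESCE_MAX)
            break;      /* leave the rest for the next interrupt */
    }
    return (passes);
}
```

Under light load the loop drains everything in a few passes and exits; under heavy load it bails at the watermark and lets the next interrupt (or a poll) pick up the remainder, which is the manual-tuning trade-off discussed above.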
The performance improvement is about a factor of four: measured, it went from 7,000 connections a second to 32,000 connections a second, processing through the TCP stack to completion rather than at NETISR, and that's without the SYN cache stuff.

If you simply don't have time to do this work, because your work is only tangentially related (i.e., you need the performance in order to investigate a network on which other research is taking place), then FreeBSD may not be the answer. You could apply the LRP patches to FreeBSD rather trivially; here are the patches for FreeBSD 4.3 (a port of the Rice University code, made at Duke University):

	http://www.cs.duke.edu/~anderson/freebsd/rescon/

I dislike resource containers, because they are primarily an accounting nicety that has no bearing on an embedded-system application, other than to slow it down.

If you want to ignore this problem entirely, then you might want to use QLinux instead. QLinux uses a number of the ideas I've noted so far in this thread (plus a couple of others, which I personally consider bad ideas) to improve overall performance. Here is a pointer to QLinux:

	http://lass.cs.umass.edu/~lass/software/qlinux/

In general, if you hit performance problems with QLinux, throwing more CPUs at the problem isn't going to help you out any more there than it does in FreeBSD. The problem you have is not amenable to being forcibly scaled by being spammed with more hardware.

-- Terry