Date: Thu, 20 Jun 2002 19:47:10 -0700
From: Terry Lambert
To: Gary Thorpe
Cc: freebsd-arch@freebsd.org
Subject: Re: multiple threads for interrupts

Gary Thorpe wrote:
> >Seigo Tanimura wrote:
> > > One solution is to run multiple threads for each of the interrupt
> > > types. Since I noticed this issue first during my work of network
> > > locking, I have been tweaking the swi subsystem so that it runs
> > > multiple threads for an swi type. For those who are interested, the
> > > patch can be found at:
> > >
> > > http://people.FreeBSD.org/~tanimura/patches/swipool.diff.gz
> >
> >Benchmarks before and after, demonstrating an improvement?
> >
> >-- Terry
>
> I am not a kernel programmer, but I have read a paper which concludes
> that making threads have an "affinity" or "stickiness" to the last CPU
> they were run on is beneficial because it leads to less cache
> flushing/refilling. Maybe this will be a factor in having multiple
> threads for interrupt handling?

That's a general scheduling problem. The solution is well known, and has
been implemented in Dynix, then IRIX, and now Linux. Alfred Perlstein has
some patches that take it most of the way there, but they still leave an
interlock, because they maintain a global queue.

The solution I'm talking about is per-CPU scheduling queues, where threads
are only migrated between scheduling queues under extraordinary
conditions, so most of the scheduling never requires the locking that
FreeBSD-current has today. This solves the affinity problem.

I'm not sure the affinity fix solves the NETISR problem, because I think
the issue there is that the affinity you want in that case is mbuf<->CPU
affinity. Basically, if you take the network interrupt on CPU 3, then you
want to run the NETISR code associated with the protocol processing on
CPU 3 as well, to avoid cache busting.

The way I would suggest doing this is to run the protocol processing up
to user space at interrupt time (LRP). This gets rid of NETISR.

A lot of people complain that this won't allow you to receive as many
packets in a given period of time. They are missing the fact that this
only affects the burst rate until poll saturation occurs, at which point
the number of packets you receive is in fact clocked by buffer
availability; buffer availability is clocked by the ability of NETISR to
process the packets up to the user space boundary; and that, in turn, is
clocked by the ability to process the packets out to the user space
programs on the other end of the sockets.
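To make the LRP point concrete, here is a toy sketch of the two dispatch
styles. This is not kernel code, and none of the names in it are real
FreeBSD entry points; they are placeholders made up purely to show where
the protocol work happens relative to the interrupt:

    /*
     * Toy model of the two dispatch styles; all names are invented
     * for illustration, not taken from the kernel.
     */
    #include <stdio.h>

    struct mbuf { int len; };

    /* stand-in for the protocol stack running a packet up to the socket */
    static void protocol_input(struct mbuf *m)
    {
        printf("processed %d bytes up to the socket\n", m->len);
    }

    /* NETISR style: the interrupt only queues, the real work happens later */
    #define QLEN 32
    static struct mbuf *isr_queue[QLEN];
    static int isr_count;

    static void rx_intr_netisr(struct mbuf *m)
    {
        if (isr_count < QLEN)
            isr_queue[isr_count++] = m;   /* else: queue full, packet dropped */
    }

    static void run_netisr(void)          /* runs later, when splx() lets it */
    {
        for (int i = 0; i < isr_count; i++)
            protocol_input(isr_queue[i]);
        isr_count = 0;
    }

    /* LRP style: the protocol work is done in the interrupt itself */
    static void rx_intr_lrp(struct mbuf *m)
    {
        protocol_input(m);
    }

    int main(void)
    {
        struct mbuf a = { 64 }, b = { 1500 };

        rx_intr_netisr(&a);   /* nothing processed yet */
        run_netisr();         /* deferred protocol work happens here */
        rx_intr_lrp(&b);      /* processed at once, on the interrupted CPU */
        return 0;
    }

In the second style the packet is processed on the CPU that took the
interrupt, while its data is still hot in that CPU's cache, and there is
no separate soft interrupt pass to schedule at all.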
What this all boils down to is that you should only permit receive data
interrupts to occur at the rate at which you can move the data from the
wire, all the way through the system, to completion.

The feedback process in the absence of available mbufs is to take the
interrupt, and then replace the contents of the mbuf receive buffer ring
with the new contents. The mbufs only ever get pushed up the stack if
there is a replacement mbuf allocable from the system to put on the ring
in place of the received mbuf. Effectively, we are talking about receive
ring overflow here.

If you trace the dependency graph on mbuf availability all the way to
user space, you will see that if you are receiving packets faster than
you can process them, then you end up spending all your time servicing
interrupts, and that takes away from your time to actually push data
through. Jeff Mogul of DEC's Western Research Laboratory described this
as "receiver livelock" back in the early 1990s.

Luigi's and Jon Lemon's work only partially mitigates the problem.
Turning off interrupts doesn't deal with the NETISR triggering, which
only occurs when you splx() down from a hardware interrupt level so that
the SWI list is run. Running the packets partway up the stack doesn't
resolve the problems up to the user/kernel boundary. So both are only
partial solutions.

I'm convinced that CPU affinity needs to happen. I'm also convinced
that, for the most part, running NETISR in kernel threads, rather than
to completion at interrupt time, is the wrong way to go.

I'm currently agnostic on whether interrupt threads will help in areas
outside of networking. My instinct is that the added contention will
mean that they will not. I'm reserving judgement pending real
benchmarks.

To me, it looks like a lot of people believe something is better because
they are being told that it is better, not because they have personally
measured it, gotten better numbers, and proven to themselves that those
better numbers were a result of what they thought they were measuring,
rather than an artifact that could have been exploited in the old code
as well.

-- Terry
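P.S.: In case the ring-refill feedback above reads as hand-waving, here
is a toy model of the rule I mean. Again, this is not driver code; the
names are invented for illustration only, and the real thing lives in a
driver's receive interrupt path:

    /*
     * Toy model of the receive-ring refill rule: a received buffer is
     * only handed up the stack if a fresh buffer can be allocated to
     * take its place on the ring.
     */
    #include <stdio.h>
    #include <stdlib.h>

    struct mbuf { char data[2048]; };

    /* stand-in for the kernel mbuf allocator; it may fail under load */
    static struct mbuf *mbuf_alloc(void)
    {
        return malloc(sizeof(struct mbuf));
    }

    #define RING_SLOTS 4
    static struct mbuf *rx_ring[RING_SLOTS];

    /*
     * Receive completion for one ring slot.  If no replacement mbuf is
     * available, the same buffer stays on the ring and the packet is
     * dropped -- that drop is the feedback that throttles the receiver
     * instead of letting it livelock further up the stack.
     */
    static void rx_slot_complete(int slot, void (*deliver)(struct mbuf *))
    {
        struct mbuf *fresh = mbuf_alloc();

        if (fresh == NULL)
            return;                       /* recycle in place, drop packet */
        deliver(rx_ring[slot]);           /* push received data up the stack */
        rx_ring[slot] = fresh;            /* ring stays fully populated */
    }

    static void deliver_stub(struct mbuf *m)
    {
        (void)m;
        puts("delivered one packet");
    }

    int main(void)
    {
        for (int i = 0; i < RING_SLOTS; i++)
            rx_ring[i] = mbuf_alloc();
        rx_slot_complete(0, deliver_stub);
        return 0;
    }

When the allocator can't supply a replacement, the packet never leaves
the ring, so the cost of being overloaded is a cheap drop at the driver
rather than a pile of half-processed work queued further up the stack.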