Date: Sat, 13 Oct 2001 23:30:41 -0700
From: Terry Lambert
Reply-To: tlambert2@mindspring.com
To: Mike Silbersack
Cc: freebsd-current@freebsd.org
Subject: Re: Some interrupt coalescing tests

Mike Silbersack wrote:

> Well, I've been watching everyone argue about the value of interrupt
> coalescing in the net drivers, so I decided to port Terry's patch to
> 4.4 & -current to see what the results are.

Thanks!

> The network is 100 Mbps, switched. To simulate load, I used a SYN
> flooder aimed at an unused port. ICMP/RST response limiting was
> enabled.
>
> With the -current box attacking the -stable box, I was able to
> notice a slight drop in interrupts/second with the patch applied.
> The number of packets was ~57000/second.
>
> Before: ~46000 ints/sec, 57-63% processor usage due to interrupts.
> After:  ~38000 ints/sec, 50-60% processor usage due to interrupts.
>
> In both cases, the box felt responsive.

One issue to be careful of here is that the removal of the tcptmpl
actually causes a performance hit that wasn't there in the 4.3 code.
My original complaint about tcptmpl taking up 256 bytes instead of 60
stands, but I'm more than half convinced that making it take up 60
bytes is OK... or at least more OK than allocating and deallocating
it each time, and I don't yet have a better answer to the problem.
4.3 doesn't have this change, but 4.4 does.

> With the -stable box attacking the -current box, the patch made no
> difference. The box bogged down at only ~25000 ints/sec, and
> response limiting reported the number of packets to be
> ~44000/second.
>
> I'm not sure if the number was lower because the Celeron couldn't
> run the flooder as quickly, or if the -current box was dropping
> packets. I suspect the latter, as the -current box was NOTICEABLY
> slowed down; I could watch systat refresh the screen.

This is unfortunate; it's an effect I expected with the -current
code, because of the change to the interrupt processing path. To
clarify here: the slowdown occurred both with and without the patch,
right?

The problem is that once you hit livelock (full CPU utilization), you
are pretty much unable to do anything at all, unless the code path
goes all the way to the top of the stack.

> The conclusion? I think that the dc driver does a good enough job
> of grabbing multiple packets at once, and won't be helped by
> Terry's patch except in a very few cases.

10% is a good improvement; my gut feeling was that it would have been
less than that.
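For anyone who hasn't read the driver: the reason dc already does a
decent job here is that its ISR loops on the chip's status register,
so a single interrupt can drain several packets' worth of work.
Schematically (simplified from the 4.x if_dc code, with the error and
mode handling omitted; treat this as a sketch, not the driver
verbatim):

	static void
	dc_intr(void *arg)
	{
		struct dc_softc *sc = arg;
		u_int32_t status;

		/* Mask further interrupts while servicing this one. */
		CSR_WRITE_4(sc, DC_IMR, 0x00000000);

		/*
		 * Loop as long as the chip reports pending events;
		 * this is what coalesces several packets' worth of
		 * work into a single interrupt.
		 */
		while ((status = CSR_READ_4(sc, DC_ISR)) & DC_INTRS) {
			/* Ack the events we are about to service. */
			CSR_WRITE_4(sc, DC_ISR, status);

			if (status & DC_ISR_RX_OK)
				dc_rxeof(sc);	/* drain the rx ring */
			if (status & (DC_ISR_TX_OK | DC_ISR_TX_NOBUF))
				dc_txeof(sc);	/* reap tx descriptors */
		}

		/* Unmask interrupts on the way out. */
		CSR_WRITE_4(sc, DC_IMR, DC_INTRS);
	}

A driver whose ISR is already shaped like this is already coalescing,
which is consistent with the patch only buying ~10% on dc.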
This is actually good news for me, since it means that my 30% number
is bounded by the user space program not being run (in other words, I
should be able to get considerably better performance using a
weighted fair share scheduler). As long as it doesn't damage
performance, I think the patch has proven itself.

> In fact, I have a sneaky suspicion that Terry's patch may
> increase bus traffic slightly. I'm not sure how much of
> an issue this is; perhaps Bill or Luigi could comment.

This would be interesting to me as well. I gave Luigi an early copy
of the patch to play with a while ago, and also copied Bill.

I'm interested in how you think it could increase traffic; the only
credible reason I've been able to come up with is the ability to push
more packets through when they would otherwise end up being dropped
because of a queue-full condition -- and in that case, the bus
traffic is real work, not additional overhead.

If you weren't getting any packets, or had a very slow packet rate,
it might increase bus traffic, in that the extra check might always
come back negative. (In the test case in question, that's not true,
since the patched driver does no more work than it would under the
same load using interrupts to trigger the same bus traffic.) Note
that this is only a consideration if polling an empty ring -- to see
whether DMA has completed to a particular mbuf or cluster -- itself
generates bus traffic, so it takes an odd card for it to be a
problem.

> In short, if we're going to try to tackle high interrupt load,
> it should be done by disabling interrupts and going to polling
> under high load;

I would agree with this, except that it's only really a useful
observation if FreeBSD is being used purely as a network processor.
Without interrupts, the polling will take a significant portion of
the available CPU, and you can't burn that CPU if, for example, you
have an SSL card that does your handshakes but you need to run the
SSL sessions themselves up in user space.

For example, the current ClickArray "Array 1000" product does around
700 1024-bit SSL connection setups a second, and, since it uses a
Broadcom card, the card is only doing the handshaking, not the rest
of the crypto processing. The crypto stream processing has to be done
in user space, in the SSL proxy code living there, and as such would
suffer under pure polling.

> the patch proposed here isn't worth the extra complexity.

I'd argue that the complexity is coming, no matter what. If you
separate out the tx_eof and rx_eof entry points, and externalize them
into the Ethernet driver interface in order to enable polling, you
are going to need a return value on them as well. To implement
scheduling, that return value is going to need to give a packet
count, so that you can forego polling once N packets have been
processed (N > fair share threshold), or else you will not be able to
do any additional processing.

The if_dc driver is problematic because of its organization; if you
look at the if_ti driver, or try to apply the same idea to the if_tg
driver, it becomes 16 lines of code. To externalize the interfaces
and make the necessary changes, without adding the while loop into
the ISR, you are talking 60 lines of code (including structure
changes to support the new entry points, and excluding code
reorganization for the other cards).
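To make the shape of that interface concrete, here is roughly what I
mean; the names (ifpoll_ops, ifpoll_run, the budget argument) are
invented for illustration, and this is not code from the patch:

	/*
	 * Hypothetical externalized entry points: each returns the
	 * number of packets it processed, so the caller can account
	 * for the work done.
	 */
	struct ifpoll_ops {
		int	(*ip_rxeof)(void *sc);	/* packets received */
		int	(*ip_txeof)(void *sc);	/* tx descriptors reaped */
	};

	/*
	 * Poll one interface until it either runs dry or exhausts
	 * its fair share budget; the return value lets the scheduler
	 * charge the interface for the packets it consumed.
	 */
	static int
	ifpoll_run(struct ifpoll_ops *ops, void *sc, int budget)
	{
		int done = 0, n;

		while (done < budget) {
			n = ops->ip_rxeof(sc);
			n += ops->ip_txeof(sc);
			if (n == 0)
				break;	/* nothing pending; stop polling */
			done += n;
		}
		return (done);	/* done >= budget: fair share cap hit */
	}

With a packet count coming back, a weighted fair share scheduler can
cut an interface off once its budget is consumed and give the CPU
back to user space work, instead of polling its way into livelock.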
Also, there are a number of cards that will not transfer additional
data until the interrupt is acknowledged. On those cards you would
need to go to pure polling, or not do polling at all.

NB: If you are interested in pure connection rate, and you want to
protect against SYN floods, then your best bet is actually to put a
SYN-cookie implementation into the firmware on the card, and deal
with connection setup that way. With that approach, you should be
able to easily support a quarter million connections a second.

> I suppose this would all change if we were using LRP and doing lots
> of processing in the interrupt handler... but we aren't.

This is probably a back-handed poke at me for not making the code
available. I have to either clear it with my employer, or reimplement
it from scratch (not hard; I could probably improve it significantly
were I to do this, knowing what I now know). I'm in the process of
getting an approval list together.

Even so, 10% is nothing to sneeze at...

-- Terry