From: Terry Lambert <tlambert2@mindspring.com>
Date: Thu, 18 Oct 2001 07:05:28 -0700
To: Mike Silbersack
Cc: freebsd-current@freebsd.org
Subject: Re: Some interrupt coalescing tests

Mike Silbersack wrote:
> What probably should be done, if you have time, is to add a bit of
> profiling to your patch to find out how it helps most.  I'm curious
> how many times it ends up looping, and also why it is looping
> (whether this is due to receive or transmit).  I think knowing this
> information would help optimize the drivers further, and perhaps
> suggest a tack we haven't thought of.

At 960 megabits per second on a Tigon III (full wire speed,
non-jumbogram), the looping is almost entirely (~85%) on the receive
side.  It loops for 75% of the hardware interrupts in the LRP case
(a reduction in interrupts from 12,000 to 8,000 -- 33%).

This is expected, since in the LRP case the receive processing load
is significantly higher, and even in that case we are not driving the
CPU to the wall in interrupt processing.

In the non-LRP case, the percentage drop in interrupt overhead is
~10% (as has been observed by others).  This makes sense, too, if you
consider that NETISR driving of receives means less time spent in
interrupt processing.

If we multiply the 15% in transmit (100% - 85% = 15%) by 3
(12,000/(12,000 - 8,000) = 100%/33% = 3), we get 45% in transmit in
the non-LRP case.  It would be nice if someone could confirm that
slightly less than half of the looping is on the transmit side for a
non-LRP kernel, but that's about what we should expect.  (A rough
sketch of where the profiling counters could hang is below.)

> > I don't know if anyone has tested what happens to apache in
> > a denial of service attack consisting of a huge number of
> > partial "GET" requests that are incomplete, and so leave state
> > hanging around in the HTTP server...
>
> I'm sure it would keel over and die, since it needs a process
> per socket.  If you're talking about sockets in TIME_WAIT or
> such, see netkill.pl.

I was thinking in terms of connections not getting dropped.  The most
correct way to handle this is probably an accept filter for
<CRLF><CRLF>, indicating a complete GET request (that still leaves
POST, though, which has a body), with dropping of long-duration
incomplete requests.

Unfortunately, without going into "Content-Length:" parsing, we are
pretty much screwed on POST, and very big POSTs still screw you badly
(imagine a "Content-Length: 1000000000").  You can mitigate that by
limiting request size, but then you are talking about putting HTTP
parsing in the kernel, above and beyond simple accept filters.
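Going back to the profiling question above: here is roughly where I
would hang the counters.  This is a sketch only -- all of the xx_*
names are invented for illustration, not taken from the real if_ti
code; xx_rxeof()/xx_txeof() stand in for the driver's "drain the rx
ring"/"reap the tx ring" routines, each returning nonzero if the
hardware posted more work while we were in there.

struct xx_softc;			/* hypothetical per-device state */

static int xx_rxeof(struct xx_softc *);	/* nonzero: rx ring refilled */
static int xx_txeof(struct xx_softc *);	/* nonzero: tx ring advanced */

static unsigned long xx_intr_total;	/* hardware interrupts taken */
static unsigned long xx_loop_rx;	/* extra passes forced by rx work */
static unsigned long xx_loop_tx;	/* extra passes forced by tx work */

static void
xx_intr(void *arg)
{
	struct xx_softc *sc = arg;
	int rx_more, tx_more;

	xx_intr_total++;
	do {
		rx_more = xx_rxeof(sc);
		tx_more = xx_txeof(sc);
		if (rx_more)
			xx_loop_rx++;	/* rx work forced another pass */
		if (tx_more)
			xx_loop_tx++;	/* tx work forced another pass */
	} while (rx_more || tx_more);
}

Dumping the three counters (e.g. via a sysctl) gives you exactly the
receive/transmit split being asked about.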
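As for the accept filter piece, the socket-level plumbing already
exists as SO_ACCEPTFILTER.  A minimal sketch, assuming the accf_http
module ("httpready") is loaded, with error handling abbreviated:

#include <sys/types.h>
#include <sys/socket.h>
#include <string.h>

int
set_http_filter(int lsock)
{
	struct accept_filter_arg afa;

	/* lsock must already be a listening socket. */
	memset(&afa, 0, sizeof(afa));
	strcpy(afa.af_name, "httpready");	/* needs accf_http loaded */
	return (setsockopt(lsock, SOL_SOCKET, SO_ACCEPTFILTER,
	    &afa, sizeof(afa)));
}

Note that this only defers accept() until a full request has been
buffered; it does nothing about the POST/"Content-Length:" problem
above.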
I'm really surprised abuse of the HTTP protocol itself in denial of
service attacks isn't more common.

> > Yes.  Floyd and Druschel recommend using high and low
> > watermarks on the amount of data pending processing in
> > user space.  The most common approach is to use a fair
> > share scheduling algorithm, which reserves a certain
> > amount of CPU for user space processing, but this is
> > somewhat wasteful, if there is no work, since it denies
> > quantum to the interrupt processing, potentially wrongly.
>
> I'm not sure such an algorithm would be wasteful - there must be data
> coming in to trigger such a huge amount of interrupts.  I guess this
> would depend on how efficient your application is, how you set the
> limits, etc.

Yes.  The "waste" comment is aimed at the idea that you will most
likely have a heterogeneous load, so you cannot accurately predict
ahead of time that you will spend 80% of your time in the kernel and
20% processing in user space, or whatever ratio you come up with.
This becomes much more of an issue under an attack, which will, by
definition, end up being asymmetric.

In practice, however, no one outside of a lab has a pipe size in
excess of 400 Mbits, so most people never really need 1 Gbit of
throughput anyway.  If you can make your system handle full wire
speed at 1 Gbit, you are pretty much safe from any attack someone
might want to throw at you, at least until the pipes get larger.
Even ignoring this, there's a pretty clear off-the-shelf hardware
path to a full 10 gigabits with PCI-X (8 gigabits times 2 busses
gets you there, which is 25 times the largest UUNet hosting center
pipe size today).

Fair share is more of a problem for slower interfaces without
hardware coalescing, and software is an OK band-aid for them (IMO).
I suspect that you will want to spend most of your CPU time doing
processing rather than interrupt handling, in any case (a rough
sketch of the watermark idea is in the P.S. below).

--
Terry
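P.S.: for concreteness, a rough sketch of the high/low watermark idea
mentioned above.  All the names here are made up (this is not from
any real driver); the idea is just to mask receive interrupts when
user space falls too far behind, and unmask them once it catches up.

#define XX_HIWAT	(512 * 1024)	/* bytes queued: stop rx interrupts */
#define XX_LOWAT	(128 * 1024)	/* bytes queued: resume rx interrupts */

static unsigned long xx_backlog;	/* bytes awaiting user-space work */
static int xx_rx_intr_enabled = 1;

static void
xx_backlog_grow(unsigned long nbytes)	/* called as rx data is queued */
{
	xx_backlog += nbytes;
	if (xx_rx_intr_enabled && xx_backlog > XX_HIWAT) {
		xx_rx_intr_enabled = 0;
		/* xx_hw_rx_intr(0): mask rx interrupts in hardware */
	}
}

static void
xx_backlog_drain(unsigned long nbytes)	/* called as user space consumes */
{
	xx_backlog -= nbytes;
	if (!xx_rx_intr_enabled && xx_backlog < XX_LOWAT) {
		xx_rx_intr_enabled = 1;
		/* xx_hw_rx_intr(1): unmask rx interrupts */
	}
}

The gap between the two watermarks is what keeps the thing from
flapping; no CPU reservation is needed when there is no backlog.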