From: Andre Oppermann <andre@freebsd.org>
To: khatfield@socllc.net
Cc: Barney Cordoba, Adrian Chadd, Alfred Perlstein, Jim Thompson,
    "freebsd-net@freebsd.org"
Subject: Re: FreeBSD boxes as a 'router'...
Date: Wed, 21 Nov 2012 08:32:35 +0100
Message-ID: <50AC8393.3060001@freebsd.org>
In-Reply-To: <250266404.35502.1353464214924@238ae4dab3b4454b88aea4d9f7c372c1.nuevasync.com>
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net@freebsd.org>

On 21.11.2012 03:16, khatfield@socllc.net wrote:
> I may be misstating.
> Specifically, under high-burst floods, either routed or being dropped
> by pf, we would see the system go unresponsive to user-level
> applications, SSH for example.
>
> The system would still function, but it was inaccessible. To clarify,
> this was any number of floods or attacks to any ports; the behavior
> remained. (These were not SSH ports being hit.)

I'm working on a hybrid interrupt/polling scheme with live-lock
prevention in my svn branch. It works by disabling interrupts in
interrupt context and then having an ithread loop over the RX DMA queue
until it catches up with the hardware and is done. Only then are
interrupts re-enabled. On a busy system it may never go back to
interrupts. To prevent live-lock, the ithread gives up the CPU after a
normal quantum to let other threads/processes run as well. After that it
immediately gets re-scheduled with a sufficiently high priority not to
get starved out by userspace. With multiple RX queues and MSI-X
interrupts, as many ithreads as there are cores can run, and none of
them will live-lock.

I'm also looking at using the CoDel algorithm on totally maxed-out
systems to prevent long FIFO packet-drop chains in the NIC. Think of it
as RED queue management, but for the input queue. That way we can use
distributed single-packet loss as a signalling mechanism for the senders
to slow down. For a misbehaving sender blasting away this obviously
doesn't help much, but it improves the chance of good packets making it
through.

While live-lock prevention is good, you still won't be able to log in
via ssh through an overloaded interface. Any other interface will work
without packet loss, though.

So far I've fully converted fxp(4) to this new scheme because it is one
of the simpler drivers with sufficient documentation, and 100Mbit is
easy to saturate. The bge(4) driver is mostly converted but not tested
for lack of hardware, which should arrive later this week.
The em(4) family, and with it the similar igb(4) and ixgbe(4), is in the
works as well; again, hardware for testing is on the way. When this work
has stabilized I'll be looking for testers to put it through its paces.
If you're interested and have a suitable test bed, drop me an email to
get notified.

-- 
Andre

> Now, we did a lot of sysctl resource tuning to correct this with some
> floods, but a high rate would still cause the behavior. Other times
> the system would simply drop all traffic (as if a buffer had filled or
> max connections had been reached), but it was neither case.
>
> The attacks were also well within the bandwidth capabilities of the
> pipe and the network gear.
>
> All of these issues stopped upon adding polling, or the overall
> threshold was increased tremendously with polling.
>
> Yet polling has some downsides, not necessarily due to FreeBSD but to
> application issues. Haproxy is one example where we had handshakes /
> premature connections terminated with polling. Those issues were not
> present with polling disabled.
>
> So that is my reasoning for saying that it was perfect for some things
> and not for others.
>
> In the end, we spent years tinkering and it was always satisfactory
> but never perfect. Finally we grew to the point of replacing the edge
> with MX80's and left BSD to load balancing and the like. This finally
> resolved all issues for us.
>
> Albeit, we were a DDoS mitigation company running high PPS and lots of
> bursting. BSD was beautiful until we ended up needing 10Gbps+ on the
> edge and it was time to go Juniper.
>
> I still say BSD took us from nothing to a $30M company. So despite
> some things requiring tinkering, I think it is still worth the effort
> to put in the testing to find what is best for your gear and
> environment.
>
> I got off-track, but we did find one other thing.
> We found ipfw did seem to reduce load on the interrupts (likely
> because we couldn't do nearly the scrubbing with it that we could with
> pf); at any rate, less filtering may also fix the issue for the OP.
>
> Your forwarding - we found that doing forwarding via a simple pf rule
> and a GRE tunnel to an app server, or by using a tool like haproxy on
> the router itself, seemed to remove a large majority of our original
> stability issues (versus pure fw-based packet forwarding).
>
> *I also agree, because as I mentioned in a previous email... (to me)
> our overall PPS seemed to decrease from FreeBSD 7 to 9. No idea why,
> but we seemed to get less benefit from polling than we got with
> polling on 7.4.
>
> Not to say that this wasn't due to error on our part or some issue
> with the Juniper switches, but we seemed to just run into more issues
> with newer releases when it came to performance with Intel 1Gbps NICs.
> This later caused us to move more app servers to Linux, because we
> never could get to the bottom of some of those things. We do intend to
> revisit BSD with our new CDN company to see if we can restandardize it
> for high-volume traffic servers.
>
> Best,
> Kevin
>
> On Nov 20, 2012, at 7:19 PM, "Adrian Chadd" wrote:
>
>> Ok, so since people are talking about it, and I've been knee-deep in
>> at least the older Intel gige interrupt moderation - at maximum pps,
>> how exactly is the interrupt moderation giving you a livelock
>> scenario?
>>
>> The biggest benefit I found when doing some forwarding work a few
>> years ago was to write a little daemon that actually sat there and
>> watched the interrupt rates and packet drop rates per interface - and
>> then tuned the interrupt moderation parameters to suit. So at the
>> highest pps rates I wasn't swamped with interrupts.
>>
>> I think polling here is hiding some poor choices in driver design and
>> network stack design..
>>
>> adrian

_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"