From: Deepak Jain <deepak@ai.net>
Date: Sat, 28 Feb 2004 23:56:22 -0500
To: Don Bowman
Cc: freebsd-hackers@freebsd.org
Subject: Re: em0, polling performance, P4 2.8ghz FSB 800mhz

Don Bowman wrote:

>>It was kindly pointed out that I didn't include the symptoms of the
>>problem:
>>
>>Without polling on, I get 70+% interrupt load, and I get livelock.
>>
>>With polling on, I start getting huge amounts of input errors, packet
>>loss, and general unresponsiveness to the network. The web server on it
>>doesn't respond, though it occasionally will open the connection, just
>>not respond. accept_filter on/off makes no difference. I have read
>>other posts that say em systems can move >200kpps without serious
>>incident.
>>
>>Thanks in advance,
>>
>>DJ
>
> You may need to increase the MAX_RXD inside your em driver to e.g. 512.

I didn't know if my card had a buffer bigger than the default 256. I can
increase it, but I didn't know how to determine how big a MAX_RXD my
card would support. When the system was under load, it was generating
2xHZ clock ticks (2000 when HZ was 1000); is that normal?

> With a similar system, em can handle ~800Kpps of bridging.

What settings did you use?

> Your earlier email showed a very large number of RST messages,
> which makes me suspect the blackhole actually wasn't enabled.
>
> Not exactly sure what you're trying to do here. It sounds like
> you are trying to generate a SYN flood on port 80, and your listen
> queue is backing up. You've increased kern.ipc.somaxconn? Does your
> application specify a fixed listen queue depth? Could it be increased?
> Are you using apache as the server? Could you use a kqueue-enabled
> one like thttpd?

Using apache; might go to squid or thttpd. Didn't think it would make a
big difference. Increased somaxconn. Basically the system is getting
hammered (after all filtering at the router) with valid GET requests on
port 80.

> Have you checked net.inet.ip.intr_queue_drops?
> If it's showing >0, check net.inet.ip.intr_queue_maxlen, perhaps
> increase it.

net.inet.ip.intr_queue_maxlen: 500
net.inet.ip.intr_queue_drops: 0
p1003_1b.sigqueue_max: 0

No intr drops.
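[For reference, a minimal shell sketch of how the knobs discussed above
can be inspected and adjusted on a FreeBSD box of this era. The specific
values are illustrative placeholders, not settings confirmed in this
thread; MAX_RXD itself is a compile-time constant in the em driver
source, so changing it means rebuilding the driver rather than setting a
sysctl.]

    # Check the listen-queue, blackhole and interrupt-queue knobs
    sysctl kern.ipc.somaxconn
    sysctl net.inet.tcp.blackhole
    sysctl net.inet.ip.intr_queue_maxlen net.inet.ip.intr_queue_drops

    # If intr_queue_drops ever climbs above 0, raise the queue length
    # (1000 is an arbitrary example value)
    sysctl net.inet.ip.intr_queue_maxlen=1000

    # With DEVICE_POLLING compiled into the kernel, enable polling and
    # give the kernel a larger share of each tick (user_frac=10 is the
    # value suggested further down in this thread)
    sysctl kern.polling.enable=1
    sysctl kern.polling.user_frac=10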
> Have you sufficient mbufs and clusters? netstat -m.

1026/5504/262144 mbufs in use (current/peak/max):
        1026 mbufs allocated to data
1024/5460/65536 mbuf clusters in use (current/peak/max)
12296 Kbytes allocated to network (6% of mb_map in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines

mbufs look fine.

> If you want to spend more time in the kernel, perhaps change
> kern.polling.user_frac to 10?

I'll do that.

> I might have HZ @ 2500 as well.
>
> You could use ipfw to limit the damage of a SYN flood, e.g.
> a keep-state rule with a limit of ~2-5 per source IP, lower the
> timeouts, increase the hash buckets in ipfw, etc. This would
> use a mask on src-ip of all bits. Something like:
>
> allow tcp from any to any setup limit src-addr 2

This is a great idea. We were trapping those who crossed our connection
thresholds and blackholing them upstream (automatically, with a script).

> This would only allow 2 concurrent TCP sessions per unique
> source address. Depends on the SYN flood you are expecting
> to experience. You could also use dummynet to shape SYN
> traffic to a fixed level, I suppose.
>
> Now... this will switch the DoS condition to elsewhere in
> the kernel, and it might not win you anything.
>
> net.inet.ip.fw.dyn_buckets=16384
> net.inet.ip.fw.dyn_syn_lifetime=5
> net.inet.ip.fw.dyn_max=32000
>
> might be called for if you try that approach.

I see where that should get us. We'll see.

Thanks!

DJ
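[A consolidated sketch of the ipfw approach described above, assuming a
kernel built with IPFIREWALL. The rule numbers and the port-80
restriction are illustrative additions, not settings confirmed in this
thread.]

    # Size the dynamic-rule table before leaning on keep-state/limit
    # (values taken from the suggestion above)
    sysctl net.inet.ip.fw.dyn_buckets=16384
    sysctl net.inet.ip.fw.dyn_syn_lifetime=5
    sysctl net.inet.ip.fw.dyn_max=32000

    # Match packets against existing dynamic rules first
    ipfw add 1000 check-state

    # Cap each source IP at 2 concurrent TCP sessions; "to me 80" narrows
    # this to the web server, versus the broader "from any to any" form
    # quoted above
    ipfw add 1100 allow tcp from any to me 80 setup limit src-addr 2

    # Let packets belonging to already-established connections through
    ipfw add 1200 allow tcp from any to any established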