From owner-freebsd-net@FreeBSD.ORG  Sun Nov  9 15:15:48 2003
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id C3D0516A4CE; Sun,  9 Nov 2003 15:15:48 -0800 (PST)
Received: from westhost42.westhost.net (westhost42.westhost.net
	[216.71.84.238])	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 7591243FB1; Sun,  9 Nov 2003 15:15:47 -0800 (PST)
	(envelope-from mini@freebsd.org)
Received: from [10.0.1.20] (12-228-13-123.client.attbi.com [12.228.13.123])
	by westhost42.westhost.net (8.11.6/8.11.6) with ESMTP id hA9NFjU20868;
	Sun, 9 Nov 2003 17:15:46 -0600
In-Reply-To: <3FAEC407.F10E7BA@pipeline.ch>
References: <3FAE68FB.64D262FF@pipeline.ch>
	<ACD9C291-12F7-11D8-87D8-000A95CD3CF8@freebsd.org>
	<3FAEC407.F10E7BA@pipeline.ch>
Mime-Version: 1.0 (Apple Message framework v606)
Content-Type: text/plain; charset=US-ASCII; format=flowed
Message-Id: <A740BB86-130A-11D8-87D8-000A95CD3CF8@freebsd.org>
Content-Transfer-Encoding: 7bit
From: Jonathan Mini <mini@freebsd.org>
Date: Sun, 9 Nov 2003 15:15:48 -0800
To: Andre Oppermann <oppermann@pipeline.ch>
X-Mailer: Apple Mail (2.606)
cc: mb@imp.ch
cc: freebsd-current@freebsd.org
cc: ume@freebsd.org
cc: sam@errno.com
cc: freebsd-net@freebsd.org
Subject: Re: tcp hostcache and ip fastforward for review
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 09 Nov 2003 23:15:49 -0000


On Nov 9, 2003, at 2:47 PM, Andre Oppermann wrote:

> Jonathan Mini wrote:
>>
>> On Nov 9, 2003, at 8:19 AM, Andre Oppermann wrote:
>>
>>>   - DoS attack 2: make MSS very low on local side of connection
>>>     and send maaaany small packets to remote host. For every packet
>>>     (eg. 2 bytes payload) a sowakeup is done to the listening
>>>     process. Consumes a lot of CPU there.
>>>
>>
>> This sounds as if it might be worthwhile to add a delay to
>> the TF_NODELAY case for receive processing as well.
>
> Unfortunately it is not that easy. We can't just do that unconditionally
> to all connections. It would probably break or delay many things. You
> never know how much data is outstanding and whether it's just this
> packet with 2 bytes outstanding...

This would be disastrous to the performance of interactive
sockets, although in theory those connections have
NODELAY set. My comment above was confusing: I meant the
"non TF_NODELAY" case, that is, when Nagling is enabled.

In this situation, you would delay the sowakeup until either a
timeout expired or the SO_RCVLOWAT value was reached.  The normal
SO_RCVLOWAT case delays until SO_RCVTIMEO is reached.  I suppose
the application could simulate this with a large SO_RCVLOWAT and a
small SO_RCVTIMEO, but I was wondering about the effects of such a
change as part of !TF_NODELAY.
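
A minimal sketch of that application-side simulation follows.  The
helper name set_rcv_batching and its parameters are made up for
illustration; only the SO_RCVLOWAT and SO_RCVTIMEO socket options
themselves are real, and how precisely the timeout is honored varies
between systems.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>

/*
 * Hypothetical helper (not an existing API): ask the kernel not to wake
 * a blocking reader until at least `lowat` bytes are queued, and bound
 * the wait with a receive timeout of `timeout_ms` milliseconds.
 * Returns 0 on success, -1 on failure (errno is set by setsockopt).
 */
static int
set_rcv_batching(int fd, int lowat, long timeout_ms)
{
	struct timeval tv;

	tv.tv_sec = timeout_ms / 1000;
	tv.tv_usec = (timeout_ms % 1000) * 1000;

	/* Large low-water mark: batch small segments before waking up. */
	if (setsockopt(fd, SOL_SOCKET, SO_RCVLOWAT, &lowat,
	    sizeof(lowat)) == -1)
		return (-1);
	/* Small timeout: do not stall forever on a short final message. */
	if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) == -1)
		return (-1);
	return (0);
}
```

This only affects a socket whose application opts in, which is the
difference from doing the same thing in the stack for every
!TF_NODELAY connection.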

Sadly, the PSH bit in the TCP header, which could otherwise be used
for scenarios like this, is completely unreliable.

> As an application aware of this problem you currently have two
> options: use accept filters (FreeBSD only) or set SO_RCVLOWAT to some
> higher value than the default 1 byte. Only the first one is workable
> if you don't know what and how much the clients send to you. Relying
> on the application to activate any such option to prevent this kind
> of DoS is unfortunately wishful thinking.

I was not suggesting that we use this to counter an attack, only asking
if it might be a worthwhile performance optimization to consider.
This is an unlikely case (many small packets sent to a non-interactive
application), so I can't see the improvement as being globally useful.

> The code I've put in here simply caps off the extreme cases. It
> counts all packets and bytes in any given second and computes the
> average payload size per packet. If that is less than we have defined
> for minmss it will reset and drop the connection. However it will only
> start to compute the average if there are more than 1'000 packets per
> second on the same tcp connection. I've chosen this quite high value
> to never disconnect any legitimate connection which just happens to
> send many small packets. In my tests I've seen telnet/ssh sending
> close to 100 small packets per second (some large copy-pasting and
> cat'ing of many small files). Probably 500 packets per second is a
> better cut-off value but I just want to be sure to never hit a false
> positive.

1000 packets per second is actually a low threshold for TCP
connections which are being used to forward messages, especially on
gigabit links.  Busy web applications that use small HTTP requests
(pipelined inside a persistent connection) to make small updates to
state are a good example of this.  I wouldn't be surprised to see
chatter between SQL servers follow similar patterns.  Applications
which use XML-based messaging often send several small packets per
message, which is unfortunate.

On the other hand, I'm used to looking at proxies, which are not
the general case.  This is why the limits are tunable, after all. =)
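
As a rough sketch of the check described in the quoted text, assuming
a per-connection counter that is evaluated once per second: the struct,
the function names and the parameters below are invented for
illustration and are not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative per-connection counters; names are not from the patch. */
struct seg_stats {
	uint32_t packets;	/* segments seen in the current window */
	uint64_t bytes;		/* payload bytes seen in the same window */
};

/* Account for one received segment carrying `payload_len` bytes. */
static void
seg_account(struct seg_stats *st, uint32_t payload_len)
{
	st->packets++;
	st->bytes += payload_len;
}

/*
 * Evaluate and reset the window, assumed to be called once per second.
 * Returns true if the connection should be dropped: more than
 * `pps_limit` packets arrived and their average payload was below
 * `minmss` bytes.  Below the packet limit no average is computed.
 */
static bool
seg_window_expired(struct seg_stats *st, uint32_t pps_limit,
    uint32_t minmss)
{
	bool drop = false;

	if (st->packets > pps_limit)
		drop = (st->bytes / st->packets) < minmss;
	st->packets = 0;
	st->bytes = 0;
	return (drop);
}
```

With pps_limit at 1000, a connection forwarding 5000 messages of 100
bytes each per second would be dropped whenever minmss is set above
100, which is the kind of false positive I am worried about.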

-- 
Jonathan Mini
mini@freebsd.org
http://www.freebsd.org