From: Andre Oppermann
Date: Mon, 10 Nov 2003 10:39:37 +0100
To: Jonathan Mini
Cc: mb@imp.ch, freebsd-current@freebsd.org, ume@freebsd.org, sam@errno.com, freebsd-net@freebsd.org
Subject: Re: tcp hostcache and ip fastforward for review
Message-ID: <3FAF5CD9.ADA58CAF@pipeline.ch>
References: <3FAE68FB.64D262FF@pipeline.ch> <3FAEC407.F10E7BA@pipeline.ch>
List-Id: Networking and TCP/IP with FreeBSD

Jonathan Mini wrote:
>
> On Nov 9, 2003, at 2:47 PM, Andre Oppermann wrote:
>
> > Jonathan Mini wrote:
> >>
> >> On Nov 9, 2003, at 8:19 AM, Andre Oppermann wrote:
> >>
> >>> - DoS attack 2: make MSS very low on local side of connection
> >>>   and send maaaany small packet to remote host. For every packet
> >>>   (eg. 2 bytes payload) a sowakeup is done to the listening
> >>>   process. Consumes a lot of CPU there.
> >>>
> >>
> >> This sounds as if it might be worthwhile to add a delay to
> >> the TF_NODELAY case for receive processing as well.
> >
> > Unfortunately it is not that easy. We can't just do that unconditionally
> > to all connections. It would probably break or delay many things. You
> > never know how much data is outstanding and whether it's just this
> > packet with 2 bytes outstanding...
>
> This would be disastrous to the performance of interactive
> sockets, however theoretically those connections have
> NODELAY set. My above comment is a bit confusing: I meant the
> "non TF_NODELAY" case, that is when Nagling is enabled.
>
> In this situation, you would delay an sowakeup until
> either a timeout or SO_RCVLOWAT-set value was reached. The normal
> SO_RCVLOWAT case delays until SO_RCVTIMEO is reached. I suppose
> the application could simulate this with a large SO_RCVLOWAT and a
> small SO_RCVTIMEO, but I was wondering about the effects of such a
> change as part of !TF_NODELAY.

To do this we need another callout to do the eventual wakeup if no
further packets arrive within whatever/HZ. In addition it probably
would make FreeBSD look bad in network benchmarks since this causes
the connection RTT to go up. All in all I don't think it is worth
adding this complexity.

> Sadly, there's this PSH bit in the TCP header that's completely
> unreliable and could be used for scenarios like this.
>
> > As an application aware of this problem you currently have two
> > options: use accept filters (FreeBSD only) or set SO_RCVLOWAT to some
> > higher value than the default 1 byte. Only the first one is workable
> > if you don't know what and how much the clients send to you. Relying
> > on the application to activate any such option to prevent this kind
> > of DoS is unfortunately wishful thinking.
>
> I was not suggesting that we use this to counter an attack, only asking
> if it might be a worthwhile performance optimization to consider.
> This is an unlikely case (many small packets sent to a non-interactive
> application), so I can't see the improvement as being globally useful.

No, I don't think it is a worthwhile optimization. If the application
wants to do it, it can do so already via socket options. Normally an
application needs such a delay feature to be specific to its message
types. Like with the http accept filter.

> > The code I've put in here simply caps off the extreme cases. It
> > counts all packets and bytes in any given second and computes the
> > average payload size per packet. If that is less than we have defined
> > for minmss it will reset and drop the connection. However it will only
> > start to compute the average if there are more than 1'000 packets per
> > second on the same tcp connection. I've chosen this quite high value
> > to never disconnect any legitimate connection which just happens to
> > send many small packets. In my tests I've seen telnet/ssh sending
> > close to 100 small packets per second (some large copy-pasting and
> > cat'ing of many small files). Probably 500 packets per second is a
> > better cut-off value but I just want to be sure to never hit a false
> > positive.
>
> This is actually a small value for TCP connections which are being
> used to forward messages, especially on gigabit links. Heavily-intensive
> web applications that are using small HTTP requests (pipelined inside a
> persistent connection) to update small manipulations of state are
> a good example of this. I wouldn't be surprised to see chatter
> between SQL servers follow similar patterns. Applications which
> use XML-based messaging often send several small packets per message,
> which is unfortunate.

Do you think such applications manage to send 1000 packets per second
with less than 256 bytes payload per packet? I think the network code
would collect some data to form a larger packet (unless TCP_NODELAY is
set)?
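
To make the check quoted above a bit more concrete, here is a rough
sketch in C of the per-second accounting. It is not the code from the
patch: the struct, the function name seg_account() and the two constants
are made up for illustration, and the values are simply the figures
mentioned in this thread (1000 packets per second, 256 bytes), assuming
that 256 is what minmss is set to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tunables; values are the figures quoted in this thread. */
#define PPS_THRESHOLD 1000u /* only judge connections above this packet rate */
#define MIN_MSS       256u  /* assumed minimum average payload, in bytes */

/* Hypothetical per-connection counters for the current one-second window. */
struct seg_stats {
	int64_t  window_start; /* second in which the current window began */
	uint32_t packets;      /* data segments seen in this window */
	uint64_t bytes;        /* payload bytes seen in this window */
};

/*
 * Account for one received data segment.  Returns true when the
 * connection has exceeded PPS_THRESHOLD packets within the current
 * second and the average payload per packet is below MIN_MSS.
 */
bool
seg_account(struct seg_stats *st, int64_t now_sec, uint32_t payload_len)
{
	assert(st != NULL);

	/* A new second has started: begin a fresh window. */
	if (now_sec != st->window_start) {
		st->window_start = now_sec;
		st->packets = 0;
		st->bytes = 0;
	}

	st->packets++;
	st->bytes += payload_len;

	/* Below the rate threshold the average is not evaluated at all. */
	if (st->packets <= PPS_THRESHOLD)
		return false;

	return (st->bytes / st->packets) < MIN_MSS;
}
```

The caller would run this once per received data segment and reset the
connection when it returns true. A report-only variant would log at that
point instead of dropping.
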
> On the other hand, I'm used to looking at proxies, which are not
> the general case. This is why the limits are tunable, after all. =)

Is there a way you could monitor such connections and compile some
statistics on how many small packets per second are sent? I could adjust
the patch to just report the fact instead of dropping the connection.
Could do it for 4.9-R too, it's fairly easy.

--
Andre
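
For reference, the application-side approach mentioned earlier in the
thread (a larger SO_RCVLOWAT combined with a small SO_RCVTIMEO) could
look roughly like the sketch below. The helper name set_recv_batching()
and its parameters are made up for illustration; only the two socket
options themselves are standard. What a timed-out receive returns when
fewer bytes than the low-water mark are queued differs between systems,
so treat this as a starting point rather than a recipe.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/time.h>

/*
 * Hypothetical helper: ask the kernel not to wake a blocking receiver
 * until at least lowat_bytes are queued, and bound the wait with a
 * receive timeout of timeout_ms milliseconds.
 * Returns 0 on success, -1 on failure (errno is set by setsockopt).
 */
int
set_recv_batching(int fd, int lowat_bytes, long timeout_ms)
{
	struct timeval tv;

	assert(lowat_bytes > 0 && timeout_ms >= 0);

	/* Raise the receive low-water mark from its default of 1 byte. */
	if (setsockopt(fd, SOL_SOCKET, SO_RCVLOWAT,
	    &lowat_bytes, sizeof(lowat_bytes)) == -1)
		return -1;

	/* Cap how long a blocking receive may wait for the data. */
	tv.tv_sec = timeout_ms / 1000;
	tv.tv_usec = (timeout_ms % 1000) * 1000;
	return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}
```

An application would call this once on the connected socket, for
example set_recv_batching(fd, 256, 50), before entering its read loop.
As said above, this only helps when the application knows roughly how
much data its clients send.
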