From owner-freebsd-net Fri Apr 5 16:31:34 2002
Message-ID: <3CAE41BE.8AD65DC6@mindspring.com>
Date: Fri, 05 Apr 2002 16:30:54 -0800
From: Terry Lambert
To: Bill Fenner
Cc: bmah@FreeBSD.ORG, gallatin@cs.duke.edu, net@FreeBSD.ORG
Subject: Re: IP fragmentation (was Re: Fatal trap 12: page fault while in kernel mode)
References: <20020403181854.I42720-100000@angui.sh>
    <15532.29114.310072.957330@grasshopper.cs.duke.edu>
    <200204050504.g355493C001200@intruder.bmah.org>
    <15533.46222.49598.958821@grasshopper.cs.duke.edu>
    <3CADE0E7.ED472650@mindspring.com>
    <15533.57961.725030.692387@grasshopper.cs.duke.edu>
    <200204052120.g35LKW00034174@intruder.bmah.org>
    <200204052310.PAA21604@windsor.research.att.com>

Bill Fenner wrote:
> >Just for the heck of it, I started reading through ip_input.c to see how
> >hard this would be to do. Haven't got there yet, I saw something odd:
> >the variables ip_nfragpackets and nipq look *awfully* similar.
>
> So do the commit logs for the revisions in which each was introduced.
>
> Revision 1.65 - Mon Sep 15 23:07:01 1997 UTC (4 years, 6 months ago) by ache
>
> Prevent overflow with fragmented packets
>
> vs.
>
> Revision 1.169 - Sun Jun 3 23:33:23 2001 UTC (10 months ago) by jesper
>
> Prevent denial of service using bogus fragmented IPv4 packets.
>
> so I think you're right, that they're both meant to do the same thing
> but neither is doing what they intended.

I thought about this for a while, after Bruce said he was looking
into it. There are some implicit problems that I'm not sure it's
really possible to resolve satisfactorily.

If you drop fragments, for whatever reason, in order to prevent
overflow, then just dropping at random leads to "almost full"
reassembly queues... and you don't want that.

If you do it preferentially, based on what's already in there, then
you end up assembling things in such a way that an attacker can
intentionally stick N-1 out of N frags into the queue for the full
timeout period... and you don't want that.

It seems to me that you need to disallow packets whose aggregate size
would be "too big" -- over some configurable limit. UDP datagrams
larger than the max MTU strike me as "too big", but I'm sure someone
will tell me why DNS switches to TCP instead of using really big UDP
datagrams.

Preferential dropping gives you a similar deadlock, since the
algorithm must permit you to not drop the Kth frag in a set of N, as
K -> N. So you can't simply red-queue.

If you LRU the frag sets, then you will be penalizing the high
latency links. In my experience, it is the humans on the other end of
high latency links that have the patience of Job: they will try
forever, until they get through. So dropping these means that you will
increase your overall pool retention time for packets from these
people.

If you don't LRU the frag sets, then you open yourself to attack by
intentional pacing, where frags are sent just often enough to keep
resetting the drop timer (the TTL of a frag in the reassembly queue)
before it expires.

This looks like a lively area for real research.
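
To make the pacing case concrete, here is a minimal Python sketch. It is
not FreeBSD code: the function name, the 30 second idle timer, and the
arrival times are all assumptions chosen for illustration. It models a
reassembly queue whose only timer is an idle timer that every arriving
frag resets, and shows how long an incomplete frag set can hold its slot.

```python
def expiry_time(arrivals, idle_ttl):
    """Return the time at which a frag set is dropped when the only
    timer is an idle timer that every arriving fragment resets.

    arrivals: sorted fragment arrival times in seconds (hypothetical).
    idle_ttl: seconds of silence after which the set is dropped.
    """
    deadline = arrivals[0] + idle_ttl
    for t in arrivals[1:]:
        if t >= deadline:        # the set timed out before this frag arrived
            break
        deadline = t + idle_ttl  # each new frag resets the idle timer
    return deadline


# 9 of 10 frags, one every 29 s, against an assumed 30 s idle timer.
paced = [29 * i for i in range(9)]
print(expiry_time(paced, idle_ttl=30))  # 262

# The same 9 frags sent back to back hold the slot for only 30 s.
burst = [0] * 9
print(expiry_time(burst, idle_ttl=30))  # 30
```

In this model the time a never-completing set stays queued grows with the
number of frags the sender is willing to pace out, which is the exposure
described above.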

My gut tells me that there should be two tiers: after you hit the
second tier, then you need to drop any fragments for new frag sets,
and at the top, you need to LRU out the total frag set.

This implies two timers: an overall reassembly lifetime, which cannot
be exceeded (a frag set TTL), and an idle time during which no more
frags were received (a "no new frags" TTL). The second already exists.

Really, you have to treat the fragment reassembly buffers as if they
were external. You allocate the resources to them, and then forget the
resources. What you really want is to act, logically, as if you have a
traffic normalizer between you and the source of the traffic.

I don't know how you would account for differential paths, should
they result in further fragmentation. 8-(.

But an approximation would be to precommit based on the number of
frags expected in a frag chain. So getting frag 1 of 10 means that you
have a committed quota usage of 10, even if you only have 1 frag in
there right now. This is where I think the current algorithm is
falling down.

As I said, a nice area for research. Anyone looking for a Masters
Thesis topic?

-- Terry
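
A rough Python sketch of the precommit accounting and the two timers
described above. Everything in it is an assumption made for
illustration: the class and parameter names are hypothetical, the limits
are arbitrary, and it takes the expected frag count as an input, even
though an IPv4 fragment does not announce how many frags its chain will
have. It does not model the tiered LRU step.

```python
class ReassemblyPool:
    """Toy model of a reassembly pool with precommitted quota.

    All names and limits are hypothetical, not FreeBSD's.
    """

    def __init__(self, quota, idle_ttl, max_lifetime):
        self.quota = quota                # max committed frags across all sets
        self.idle_ttl = idle_ttl          # "no new frags" TTL
        self.max_lifetime = max_lifetime  # overall frag set TTL
        self.sets = {}                    # key -> [expected, first_seen, last_seen]

    def committed(self):
        # Quota is charged for the whole expected chain, not for what has arrived.
        return sum(s[0] for s in self.sets.values())

    def expire(self, now):
        # Drop a set if it has gone idle or has outlived its total lifetime.
        for key, (_, first, last) in list(self.sets.items()):
            if now - last >= self.idle_ttl or now - first >= self.max_lifetime:
                del self.sets[key]

    def accept(self, key, expected, now):
        """Return True if the fragment is queued, False if it is dropped."""
        self.expire(now)
        if key in self.sets:
            self.sets[key][2] = now       # resets only the idle timer
            return True
        if self.committed() + expected > self.quota:
            return False                  # refuse new frag sets, keep existing ones
        self.sets[key] = [expected, now, now]
        return True


pool = ReassemblyPool(quota=16, idle_ttl=30, max_lifetime=60)
print(pool.accept("a", expected=10, now=0))   # True: 10 of 16 committed
print(pool.accept("b", expected=10, now=1))   # False: would commit 20
print(pool.accept("a", expected=10, now=29))  # True: pacing resets idle timer
print(pool.accept("a", expected=10, now=58))  # True: still inside lifetime
print(pool.accept("b", expected=10, now=61))  # True: "a" hit the lifetime cap
```

With only one frag of set "a" held, the second set is still refused
because the quota counts the whole expected chain, and pacing no longer
keeps "a" alive past the overall lifetime.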