Date: Sun, 22 Sep 1996 01:54:12 +0930 (CST)
From: newton@communica.com.au (Mark Newton)
To: tweten@frihet.com
Cc: newton@communica.com.au, imp@village.org, spfarrel@midway.uchicago.edu, security@freebsd.org
Subject: Re: comments on the SYN attack
Message-ID: <9609211624.AA11272@communica.com.au>
In-Reply-To: <199609211220.FAA06633@ns.frihet.com> from "David E. Tweten" at Sep 21, 96 05:20:38 am
David E. Tweten wrote:

> >This is so bogus, Warner.
>
> Before you put down someone else's idea it is best to *do your homework*.

"Putting down someone else's good idea" was not what I was doing, sunshine. The sentence following the one you quoted *should* have made it quite clear that what I was putting down in that instance was the way he was discussing the merits of the two proposals: to wit, pooh-poohing the behaviour of the deterministic approach in several instances by comparing it against the behaviour of the random approach *under different operating conditions*.

> >The difference between the two is that the random method makes no
> >effort to bias itself in favour of newly arrived packets which
> >haven't had a chance to be SYN-ACK'ed yet.
>
> This is exactly why making room on a stochastic basis is superior to
> doing it on a deterministic basis. Newly arrived packets are likely
> to be flood packets.

Sheer weight of numbers would suggest that yes, this is a correct assumption. When you look at the other end of the queue, however, you will find that under the deterministic approach it is highly likely that the oldest packets are illegitimate: for sufficiently large queue sizes, the chance that a packet which has drifted to the end of the queue is still legitimate is quite small, because a legitimate SYN will usually have been ACKed and removed from the queue before it gets there.

The number of SYNs which can arrive at a system from any source is bounded by the bandwidth of the link between the system and the source; in the worst case, by the speed of the pipe connecting the site under attack to that site's ISP. That places a nice upper bound on the queue size needed to guarantee a probability of 1.0 of accepting incoming legitimate SYNs. Granted, the present algorithm provides such a bound too, but the present algorithm has a timeout which is far too long -- excessively long, given the speed of today's backbone connections.

> Besides, the problem isn't to identify and eliminate flood packets,
> because identification seems to be impossible. The problem is to
> give legitimate SYN packets a fighting chance to survive long enough
> to be paired with an ACK.

Bingo. Yet you favour an algorithm which has a built-in 1/n probability of nuking a SYN as soon as it arrives in the queue?

> >The two approaches are so similar that if you were to implement the
> >random approach, I could transform it into the deterministic
> >approach by merely *changing your random number generator routine*
> >[to one that isn't random].
>
> This is hogwash. A "stochastic" process which is driven
> deterministically isn't stochastic.

It is not hogwash at all. If Warner were to implement the random algorithm using a function which selected a packet for termination at random by returning its index into the SYN table, it could be transformed into the deterministic approach by changing that function to one which simply returned the index of the oldest packet (the highest index, assuming the oldest packet sits at the top of the table; return 0 in the converse case). The two approaches are fundamentally identical but for this detail: Warner uses a random number generator, I use "return n;" (n-1, really :-), where n is the queue size. Either way, both algorithms select a single packet for termination when the queue is full; they differ only in their selection method. (I've sketched exactly this transformation in code below.)

> Now, let's run some numbers.

No, let's not: you've left out some details in your derivations.
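Before I get to them, here's the promised sketch of that transformation. The function names and the queue layout are mine, invented purely for illustration -- I'm not pretending this is Warner's actual code, and a real implementation would be evicting entries from the kernel's SYN table rather than just returning indices:

    /*
     * Sketch only.  The SYN queue is full and a victim must be
     * chosen; each routine returns the index of the entry to evict.
     * Assume index 0 is the newest entry and index n-1 the oldest.
     */
    #include <stdlib.h>

    /* Warner's method: evict any of the n entries, chosen at random. */
    int
    pick_victim_random(int n)
    {
            return (rand() % n);
    }

    /* My method: always evict the oldest entry. */
    int
    pick_victim_oldest(int n)
    {
            return (n - 1);
    }

Swap one routine for the other and nothing else changes; that really is the entire difference between the two proposals. Now, those derivations.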
You've calculated p, and defined it as "the probability of a legitimate SYN lasting long enough to be paired with an ACK." But according to my reading of your derivation, what you've *actually* calculated is the probability of ANY single SYN in the queue lasting long enough to be paired with an ACK (assuming the time interval between a legitimate SYN and its ACK is constant), with no allowance for whether that SYN is legitimate. This is an important distinction, because in a flooding situation the vast majority of the packets in the queue will be illegitimate, and the random approach makes no effort to swing the proportion of good SYNs to bad in any positive direction (indeed, by definition that proportion will remain constant if the random number generator delivers a flat probability distribution).

>      f      det     stat    t for c >= 0.5
>   -------  ------  -------  --------------
>     < 68     1.0     > 0.5         1
>       68     1.0       0.5         1
>       99     1.0      0.37         2
>      100     0.0      0.37         2
>      200     0.0      0.13         5
>      400     0.0      0.02        39
>      800     0.0    0.0003      2151

You've defined f as the number of flood SYNs which arrive between a SYN and its ACK. That would suggest that the probability that a given packet is legitimate is close to 1/f, or 1% for f=100. (I say "close to" because you should have defined f as "the number of SYNs which arrive between a SYN and its ACK," rather than the number of flood SYNs. With the definition duly corrected, the probability that a given packet in the input stream is legitimate is precisely 1/f. Even that doesn't accurately reflect the same probability within the kernel's SYN queue, because SYNs are removed from the queue as they are ACKed, which reduces the probability that any given packet in the queue is legitimate -- but only marginally. Thus I conclude that the probability is *approximately* 1/f.)

What this says is that for n=100 and f=100, the probability that any individual SYN survives in the queue long enough to be ACKed is 0.37. In flooding conditions the number of legitimate SYNs is vanishingly small compared to the number of illegitimate ones -- about 1% at f=100. Under those conditions only about 1% of that p=0.37 represents the probability that a packet which has hung around in the queue long enough to be ACKed is legitimate, and hence the probability of a successful connection is similarly vanishingly small. Your odds don't sound that great anymore.

I'd go so far as to say that at f=100 and n=100 *both* algorithms fall in a screaming heap, so perhaps n needs to be increased, yes? And if you increase n, you provide a larger range of f for which the deterministic value of p is 1, yes? Either way, n doesn't need to be increased by anywhere near as much as it would under the present implementation.

[ this is not surprising: remember that both approaches are special cases of the same algorithm, i.e. "my queue is full, so I'll have to pick something to toss out to make room." Given that the only difference between the two is the selection method, it makes sense that they would both fail catastrophically at about the same point. ]

Furthermore, with n=100, for values of f smaller than 100 the deterministic approach gives a probability of 1 for a successful connection, whereas the non-deterministic approach goes ahead and nukes a small proportion of legitimate SYNs anyway.
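For what it's worth, your table drops straight out of the obvious derivation, which I assume (you didn't show it) goes like this: under random eviction each queued entry survives each arriving flood SYN with probability 1 - 1/n, so p = (1 - 1/n)^f; deterministically, p is 1 for f < n and 0 otherwise; and t is the smallest number of tries giving a cumulative success chance of at least 0.5, i.e. ceil(log(0.5)/log(1-p)). A throwaway program to check my guess:

    /*
     * Reproduce the table above under my assumed derivation:
     *   deterministic: p = 1 if f < n, else 0
     *   random:        p = (1 - 1/n)^f
     *   t = ceil(log(0.5) / log(1 - p))
     * Compile with:  cc -o syntab syntab.c -lm
     */
    #include <math.h>
    #include <stdio.h>

    int
    main(void)
    {
            int     f[] = { 68, 99, 100, 200, 400, 800 };
            int     n = 100, i;

            printf("%6s %6s %8s %6s\n", "f", "det", "stat", "t");
            for (i = 0; i < 6; i++) {
                    double det  = (f[i] < n) ? 1.0 : 0.0;
                    double stat = pow(1.0 - 1.0 / n, (double)f[i]);
                    double t    = ceil(log(0.5) / log(1.0 - stat));

                    printf("%6d %6.1f %8.4f %6.0f\n", f[i], det, stat, t);
            }
            return (0);
    }

It agrees with your figures to within rounding, so I'll take that as the derivation you used -- and note that it computes survival for *any* SYN, legitimate or otherwise, exactly as I said.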
Indeed, if we redefine p as the probability that any SYN will have survived for the time needed to ACK a legitimate SYN (which seems to be what you've actually calculated), then at f=99 the non-deterministic approach gives a 63% chance that a SYN *won't* survive that long, whether it's legitimate or not. So while the deterministic approach says p=1.0, the non-deterministic approach gives a (63% x 1/f) chance of connection failure on the very first try. Yuk, eh? But that's OK: I understand that you Americans have become comfortable with the concept of friendly fire :-)

[ as you can see, the chance of that friendly fire is quite small; once again, this is to be expected on the grounds that the two approaches are so similar. If it weren't for the (albeit small) chance of friendly fire, the similarities we're dealing with here would leave me wondering why we were having this discussion at all ]

So, what have we found? Well, you've provided me with a statistical analysis of the algorithms which says exactly what I told Warner when we started this discussion: namely, that the deterministic approach will work *perfectly* until the rate of incoming SYNs is sufficient to overflow the queue in the time between the reception of a SYN and the reception of its ACK; that the non-deterministic approach contains a built-in chance of destroying an otherwise perfectly good SYN; and that the two approaches are more or less identical, allowing for small statistical variations. No further conclusions can be drawn from your presentation -- I have to wonder why you bothered.

> Obviously, the deterministic algorithm fails to be able to establish
> a legitimate connection when the flood rate is high enough to flush
> legitimate SYNs just before their corresponding ACKs arrive.

Like I said to Warner when we started this: if you consider the survival chances of any individual packet, then yes, the random approach seems better. But you can't consider a packet in isolation (no packet is an island? :-). Your 37% survival chance for a good SYN also gives bad SYNs a 37% fighting chance, at the expense of the good SYNs, which are, sadly, vastly outnumbered. Or, to put it another way: every time a SYN arrives from the net, the random method has a built-in 1/n probability of nuking a good SYN to make room. Oops.

The random method will let you push f a few percentage points past the ceiling imposed by the deterministic method and still get the odd connection through. But it does this at a cost: it carries a small but finite chance of letting a legitimate connection attempt time out unanswered. Conversely, the deterministic method guarantees perfect behaviour for all values of f up to n, and no further. By sacrificing a very small margin at the high end of f, you buy yourself reliability wins for all values of f less than or equal to n. That, IMHO, makes the deterministic method superior.

> I leave checking of my simple derivations and running the numbers
> for other values of n as "an exercise to the reader". The moral
> seems to be, "Do your homework before ridiculing someone else's
> idea."

I repeat: ridiculing Warner's idea is not something I set out to do, and upon rereading I can only conclude that it isn't something I actually did, either. If Warner felt ridiculed by what I had to say then I can only apologise; if you feel that I ridiculed Warner and he believes otherwise, then I can only suggest that you pull your head in.
The world has enough thin-skinned, brittle people already without some people making themselves thin-skinned and brittle on someone else's behalf.

> Another good way to avoid embarrassment is to check the spelling of
> people's names before sending e-mail.

Oh, a spelling flame. How trite.

   - mark

[ driving vi about 30 keystrokes ahead over a link with an rtt of ~4.5 seconds, and suffering from an appalling lack of any urge to make any corrections to anything. ]

---
Mark Newton                            Email: newton@communica.com.au
Systems Engineer                       Phone: +61-8-8373-2523
Communica Systems                      WWW:   http://www.communica.com.au