Date:      Sun, 22 Sep 1996 01:54:12 +0930 (CST)
From:      newton@communica.com.au (Mark Newton)
To:        tweten@frihet.com
Cc:        newton@communica.com.au, imp@village.org, spfarrel@midway.uchicago.edu, security@freebsd.org
Subject:   Re: comments on the SYN attack
Message-ID:  <9609211624.AA11272@communica.com.au>
In-Reply-To: <199609211220.FAA06633@ns.frihet.com> from "David E. Tweten" at Sep 21, 96 05:20:38 am

David E. Tweten wrote:

 > >This is so bogus, Warner.
 > 
 > Before you put down someone else's idea it is best to *do your homework*.  

"Putting down someone else's good idea" was not what I was doing, sunshine.
The sentence following the one you quoted *should* have provided fairly
conclusive evidence that what I was putting down in that instance was
the way he was discussing the merits of the two proposals.  To wit,
pooh-poohing the behaviour of the deterministic approach in several 
instances by comparing it against the behaviour of the random approach 
*under different operating conditions*.

 > >The  difference between the two is that the random method makes no 
 > >effort to  bias itself in favour of newly arrived packets which 
 > >haven't had a  chance to be SYN-ACK'ed yet.
 > 
 > This is exactly why making room on a stochastic basis is superior to doing 
 > it on a deterministic basis.  Newly arrived packets are likely to be flood 
 > packets. 

Sheer weight of numbers would suggest that yes, this is a correct assumption.

When you look at the other end of the queue, however, you will find that
under the deterministic approach it's highly likely that the oldest
packets are illegitimate: a legitimate SYN is ACKed and removed within
one round trip, so for sufficiently large queue sizes the chance that the
packet about to be dropped from the end of the queue is legitimate is
quite small.

The number of SYNs which can arrive at any system from any source is
bounded by the bandwidth of the link between the system and the source;
in the worst case, it's bounded by the speed of the pipe connecting the
site under attack to that site's ISP.  That places a nice upper bound
on the maximum queue size needed to guarantee a probability of 1.0 for
acceptance of incoming legitimate SYNs.  Granted, the present algorithm
does too, but the present algorithm has a timeout which is far too
long -- excessively long, given the speed of today's backbone
connections.
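
For example (the link speed, minimum SYN size and round-trip time here are
assumptions I've picked purely for illustration, not anybody's measurements):

/*
 * Back-of-the-envelope bound on the queue size n needed so that a
 * legitimate SYN cannot be aged out of a FIFO queue before its ACK
 * arrives.  All three inputs below are illustrative assumptions.
 */
#include <stdio.h>

int main(void)
{
    double link_bps = 1544000.0;    /* assume a T1 pipe to the ISP       */
    double syn_bits = 512.0;        /* assume ~64 bytes per SYN on wire  */
    double rtt_sec  = 0.5;          /* assumed worst-case SYN-to-ACK RTT */

    double syn_rate = link_bps / syn_bits;  /* ~3016 SYNs per second     */
    double n_bound  = syn_rate * rtt_sec;   /* ~1508 queue entries       */

    printf("worst-case SYN rate %.0f/s -> queue bound %.0f entries\n",
        syn_rate, n_bound);
    return 0;
}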

 > Besides, the problem isn't to identify and eliminate flood 
 > packets, because identification seems to be impossible.  The problem is to 
 > give legitimate SYN packets a fighting chance to survive long enough to be 
 > paired with an ACK.

Bingo.  Yet you favour an algorithm which has a built-in 1/n probability
of nuking a SYN as soon as it arrives in the queue?

 > >The two approaches are so similar that if you were to implement the 
 > >random approach, I could transform it into the deterministic approach 
 > > by merely *changing your random number generator routine* [to one that 
 > >isn't random].
 > 
 > This is hogwash.  A "stochastic" process which is driven deterministically 
 > isn't stochastic.

It is not hogwash at all:  If Warner were to implement the random 
algorithm by making use of a function call which randomly selected
a packet for termination by returning its index into the SYN table,
it could be transformed into the deterministic approach by changing
that function call to one that simply returned whatever number represented
the size of the queue (assuming the highest index is the oldest packet;
return 0 in the converse case).

Both approaches are fundamentally identical but for this detail.  
Warner uses a random number generator; I use "return n;" (n-1, really :-)
where n is the queue size.  Either way, both algorithms select a single
packet for termination when the queue is full; they differ only in the
selection method.
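
A minimal sketch of what I'm getting at (the identifiers here are made up
for illustration; this is not the actual kernel code):

/*
 * Both schemes are the same "queue is full, pick a victim" algorithm;
 * the only moving part is the victim-selection function.  n is the
 * number of pending (un-ACKed) SYNs, and index n-1 is assumed to hold
 * the oldest entry.
 */
#include <stdlib.h>

/* random approach: every pending SYN is equally likely to be nuked */
static int
pick_victim_random(int n)
{
    return (int)(random() % n);
}

/* deterministic approach: always nuke the oldest pending SYN */
static int
pick_victim_oldest(int n)
{
    return (n - 1);
}

/*
 * Caller, on arrival of a new SYN when the queue is already full:
 *
 *    victim = pick_victim(n);
 *    drop(syn_queue[victim]);
 *    enqueue(new_syn);
 */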
 
 > Now, let's run some numbers. 

No, let's not:  You've left out some details in your derivations.

You've calculated p, and defined it as "the probability of a legitimate
SYN lasting long enough to be paired with an ACK."  But according to my 
reading of your derivation, what you've *actually* calculated is 
the probability of ANY single SYN in the queue lasting long enough
to be paired with an ACK (assuming the time interval for the pairing
of a legitimate SYN with its ACK is constant), without factoring in
any allowance for whether that SYN is legitimate.

This is an important distinction, because in a flooding situation
the vast majority of the packets in the queue will be illegitimate,
and the random approach makes no effort to swing the proportions of
good SYNs to bad in any positive direction (indeed, by definition that
proportion will remain constant if the random number generator delivers
a flat probability distribution).  

 > f       det    stat     t for c >= 0.5
 > ------  -----  -------  --------------
 > <  68   1.0    > 0.5                 1
 >    68   1.0      0.5                 1
 >    99   1.0      0.37                2
 >   100   0.0      0.37                2
 >   200   0.0      0.13                5
 >   400   0.0      0.02               39
 >   800   0.0      0.0003           2151
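
(As far as I can tell, your "stat" column is p = (1 - 1/n)^f with n = 100,
and your last column is the smallest t for which 1 - (1 - p)^t >= 0.5; on
that assumption, this quick sketch reproduces your figures:)

/*
 * Reconstruction of the quoted table, assuming "stat" = (1 - 1/n)^f for
 * n = 100 and "t" is the smallest number of tries giving a cumulative
 * success probability of at least 0.5.
 */
#include <math.h>
#include <stdio.h>

int main(void)
{
    int f_vals[] = { 68, 99, 100, 200, 400, 800 };
    int n = 100, i;

    for (i = 0; i < 6; i++) {
        int f = f_vals[i];
        double p = pow(1.0 - 1.0 / n, (double)f);
        int t = (int)ceil(log(0.5) / log(1.0 - p));
        printf("f = %3d   stat = %.4f   t = %d\n", f, p, t);
    }
    return 0;
}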

You've defined f as the number of flood SYNs which arrive between
a SYN and its ACK.  That would suggest that the probability that a
given packet is legitimate is close to 1/f, or 1% for f=100.  (I say
"close to" because f should really have been defined as "the number
of SYNs which arrive between a SYN and its ACK," not just the flood
SYNs.  With the definition duly corrected, the probability that a
given packet in the input stream is legit is precisely 1/f.  Even that
doesn't quite carry over to the SYN queue in the kernel, because SYNs
are removed from the queue as they are ACKed; this reduces the
probability that any given packet in the queue is legitimate, but only
marginally.  So I'll take that probability as *approximately* 1/f.)

What this says is that for n=100, the probability that any individual
SYN survives in the queue long enough to be ACKed is 0.37 at f=100.
In flooding conditions the number of legitimate SYNs will be vanishingly
small compared to illegitimate SYNs -- or 1% for f=100.  Under those
conditions, only about 1% of that p=0.37 value represents the probability
that a packet which has hung around in the queue long enough to be ACKed
is legitimate, and hence the probability that a connection is successful
is similarly vanishingly small.

Your odds don't sound that great anymore.  I'd go as far as to say that
at f=100 and n=100 *both* algorithms fall in a screaming heap, so perhaps
n needs to be increased, yes?  And if you increase n, you provide a larger
range of f for which the deterministic value of p is 1, yes?  And either
way, n doesn't need to be increased by anywhere near the values to
which it'd need to be increased under the present implementation.

 [ this is not surprising:  Remember that both approaches are special
   cases of the same algorithm, ie: "my queue is full so I'll have to
   pick something to toss out to make room."  Given that the only difference
   between the two is the selection method, it makes sense that they 
   would both catastrophically fail at about the same point. ]

Furthermore, with n=100, for values of f smaller than 100 the deterministic
approach gives a probability of 1 for successful connection, whereas
the non-deterministic case actually goes ahead and nukes a small 
proportion of legitimate SYNs.  Indeed, if we redefine p as the probability
that any SYN will have survived for the time needed to ACK a legitimate
SYN (which seems to be what you've actually calculated) then at f=99
the non-deterministic approach provides a 63% chance that a SYN *won't*
survive that long (whether it's legitimate or not).  While the 
deterministic approach says p=1.0, the non-deterministic approach gives
a (63% x 1/f) chance of connection failure on the first try.  Yuk, eh?
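
(Spelling that arithmetic out, with the caveat from above that the 1/f
legitimacy fraction is only approximate:)

/*
 * Back-of-the-envelope figures for n = 100, f = 100, treating the 1/f
 * legitimacy fraction as exact (it is only approximate, as discussed
 * earlier).
 */
#include <math.h>
#include <stdio.h>

int main(void)
{
    double n = 100.0, f = 100.0;
    double p = pow(1.0 - 1.0 / n, f);   /* ~0.37: chance any one SYN survives */
    double legit = 1.0 / f;             /* ~1%: fraction of arrivals that are legit */

    /* an arriving SYN is legitimate AND survives to be ACKed: ~0.0037 */
    printf("legit and survives: %.4f\n", p * legit);

    /* an arriving SYN is legitimate AND gets nuked first: ~0.0063 */
    printf("friendly fire:      %.4f\n", (1.0 - p) * legit);
    return 0;
}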

But that's ok:  I understand that you Americans have become comfortable
with the concept of friendly fire :-)

  [ as you can see, the chance of that friendly fire is quite small;
    once again, this is to be expected on the grounds that the two
    approaches are so similar.  If it weren't for the (albeit small) 
    chance of friendly fire, the similarities we're dealing with here
    would leave me wondering why we were having this discussion at all ]

So, what have we found?  Well, you've provided me with a statistical
analysis of the algorithms which says exactly what I told Warner when
we started this discussion, namely that the deterministic approach
will work *perfectly* until the rate of incoming SYNs is sufficient to 
cause the queue to overflow in the time between the reception of a SYN
and the reception of its ACK, that the non-deterministic approach
contains a built-in chance of destroying an otherwise perfectly good
SYN, and that the two approaches are more or less identical (allowing
for small statistical variations).  No further conclusions can be drawn
from your presentation -- I have to wonder why you bothered.

 > Obviously, the deterministic algorithm fails to be able to establish a 
 > legitimate connection when the flood rate is high enough to flush 
 > legitimate SYNs just before their corresponding ACKs arrive.

Like I said to Warner when we started this, if you consider the survival
chances of any individual packet, then yes, the random approach seems
better.  But you can't consider a packet in isolation (no packet is 
an island? :-).  Your 37% survival chance for a good SYN also allows
bad SYNs a 37% fighting chance at the expense of the good SYNs which are,
sadly, vastly outnumbered.  Or, to put it another way, every time a
SYN arrives from the net, the random method has a built-in 1/n probability
of nuking a good SYN to make room.  Oops.

The random method will permit you to increase f by a few percentage
points over the ceiling imposed by the deterministic method and still
get the odd connection through.  However, it does this at a cost:  It
provides the small but finite chance of allowing a legitimate connection
attempt to time out unanswered.  Conversely, the deterministic method
guarantees perfect behaviour for all values of f below n, and no further.
By sacrificing a very small margin at the high end of f, you buy yourself
guaranteed reliability for every value of f less than n.

That, IMHO, makes the deterministic method superior.

 > I leave checking of my simple derivations and running the numbers for other 
 > values of n as "an exercise to the reader".  The moral seems to be, "Do 
 > your homework before ridiculing someone else's idea."

I repeat, ridiculing Warner's idea is not something I set out to do, and
upon rereading I can only come to the conclusion that it isn't something
I actually went ahead and did either.  If Warner has felt ridiculed by
what I had to say then I can only apologise;  If you feel that I ridiculed
Warner and he believes otherwise then I can only suggest that you pull
your head in.  The world has enough thin-skinned brittle people already
without some people making themselves thin-skinned and brittle on 
someone else's behalf.

 > Another good way to avoid embarrassment is to check the spelling of people's 
 > names before sending e-mail.

Oh, a spelling flame.

How trite.

    - mark
      [ driving vi about 30 keystrokes ahead over a link with an rtt of ~4.5
        seconds, and suffering from an appalling lack of any urge to make
        any corrections to anything. ]

---
Mark Newton                               Email: newton@communica.com.au
Systems Engineer                          Phone: +61-8-8373-2523
Communica Systems                         WWW:   http://www.communica.com.au


