Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 17 Nov 2008 23:40:13 +0100
From:      Andre Oppermann <andre@freebsd.org>
To:        Hartmut Brandt <hartmut.brandt@dlr.de>
Cc:        freebsd-net@freebsd.org, bz@freebsd.org, Rui Paulo <rpaulo@freebsd.org>
Subject:   Re: TCP and syncache question
Message-ID:  <4921F2CD.503@freebsd.org>
In-Reply-To: <4921B3C6.5020002@freebsd.org>
References:  <491F2C47.4050500@dlr.de>	<0A4BB2F1-AC9F-4316-94E3-790E2D80F651@freebsd.org>	<49201859.2080605@dlr.de> <4921B3C6.5020002@freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help
This is a multi-part message in MIME format.
--------------010706030207060604050302
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Andre Oppermann wrote:
> Hartmut Brandt wrote:
>> Rui Paulo wrote:
>>>
>>> On 15 Nov 2008, at 20:08, Hartmut Brandt wrote:
>>>
>>>> Hi,
>>>>
>>>> in tcp_syncache.c:syncache_expand() there is a test that the 
>>>> acknowledgement number and the sequence number of an incoming ACK 
>>>> segment are in the expected range. If they are not, 
>>>> syncache_expand() returns 0 and tcp_input drops the segment and sets 
>>>> a reset. So far so good. But syncache_expand() also deletes the 
>>>> syncache entry, and so destroys the connection. I cannot see why it 
>>>> does it. It seems to me that such a wrong segment should be 
>>>> interpreted as to be from another connection and as such the segment 
>>>> should be ignored (but a reset sent). When the correct ACK comes, 
>>>> the connection could still be established. As it is now, the 
>>>> establishment of incoming connections can seriously be disturbed by 
>>>> someone sending fake ACK packets.
>>>>
>>>> The same test (for the ack number, not for the sequence number) is 
>>>> also further down in tcp_input.c:tcp_do_segment()  (just after the 
>>>> header prediction stuff) and here the handling is correct: the goto 
>>>> dropwithreset just sends a reset and drops the segment but leaves 
>>>> the connection in the SYN-RECEIVED state. This test is probably 
>>>> never reached now, because of syncache_expand(), though.
> 
> Correct.
> 
>>>> Maybe I fail to see something obvious, though...
>>>
>>>
>>> Well, if the RST is sent, why should we keep the syncache entry?
>>
>> Because this effectively destroys the connection in SYN-RECEIVED which 
>> is wrong according to RFC793. On page 69 the handling of incoming 
>> segments for connections in SYN-RECEIVED is described: first you check 
>> the sequence number and, if it is wrong, you send an RST (unless the 
>> RST bit is set in the incoming segment), but otherwise ignore the 
>> segment.
>  >
>> A segment with a bad sequence number in SYN-RECEIVED is either forged 
>> or from an old connection. In both cases you don't want to destroy the 
>> embryonic connection, because the correct ACK from the correct peer 
>> may still arrive.
> 
> I see your problem.  Syncookies mitigate this problem (if not disabled) as
> the correct ACK will pass that test later even if the syncache entry went
> away before (which can also happen due to a generally high SYN load).
> 
> RFC793 wants us to do the following:
> 
>  Page 69: Send back a challenge ACK with the correct parameters to help
>           to re-synchronize the connection when
>           !(RCV.NXT =< SEG.SEQ+SEG.LEN-1 < RCV.NXT+RCV.WND).
> 
>  Page 72: Send back a RST when !(SND.UNA =< SEG.ACK =< SND.NXT).
> 
> At the moment we send the RST and delete the syncache entry for both cases.
> However we should send the ACK in the former, the RST in the latter, and
> and keep the syncache entry in either case.
> 
> Fixing this requires some re-shuffling of the syncache_expand().  I'll
> post a version later today.

This is a bit more complicated because of interactions with tcp_input()
where syncache_expand() is called from.

The old code (as of December 2002) behaved slightly different.  It would
not remove the syncache entry when (SND.UNA == SEG.ACK) but send a RST.
The (RCV.NXT =< SEG.SEQ+SEG.LEN-1 < RCV.NXT+RCV.WND) test wasn't done at
all.  Instead a socket was opened whenever (SND.UNA == SEG.ACK) succeeded.
This gave way to the "LAND" DoS attack which was mostly fixed with a test
for (RCV.IRS < SEG.SEQ).

See the attached patch for fixed version of syncache_expand().  This patch
is untested though.  My development machine is currently down.  Harti, Rui
and Bjoern, please have a look at the patch and review it.

-- 
Andre


--------------010706030207060604050302
Content-Type: text/plain;
 name="syncache_expand_fix-20081117.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="syncache_expand_fix-20081117.diff"

--- tcp_input.c.1.390	Mon Nov 17 21:33:25 2008
+++ tcp_input.c.1.390.mod	Mon Nov 17 21:35:22 2008
@@ -642,10 +642,13 @@ findpcb:
 			if (!syncache_expand(&inc, &to, th, &so, m)) {
 				/*
 				 * No syncache entry or ACK was not
-				 * for our SYN/ACK.  Send a RST.
+				 * for our SYN/ACK.  Send a RST or
+				 * an ACK for re-synchronization.
 				 * NB: syncache did its own logging
 				 * of the failure cause.
 				 */
+				if (so == 1)
+					goto dropunlock;
 				rstreason = BANDLIM_RST_OPENPORT;
 				goto dropwithreset;
 			}
--- tcp_syncache.c.1.160	Mon Nov 17 16:49:01 2008
+++ tcp_syncache.c.1.160.mod	Mon Nov 17 23:35:39 2008
@@ -817,59 +817,47 @@ syncache_expand(struct in_conninfo *inc,
 	INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
 	KASSERT((th->th_flags & (TH_RST|TH_ACK|TH_SYN)) == TH_ACK,
 	    ("%s: can handle only ACK", __func__));
+	*lsop = NULL;
 
 	sc = syncache_lookup(inc, &sch);	/* returns locked sch */
 	SCH_LOCK_ASSERT(sch);
 	if (sc == NULL) {
 		/*
 		 * There is no syncache entry, so see if this ACK is
-		 * a returning syncookie.  To do this, first:
-		 *  A. See if this socket has had a syncache entry dropped in
-		 *     the past.  We don't want to accept a bogus syncookie
-		 *     if we've never received a SYN.
-		 *  B. check that the syncookie is valid.  If it is, then
-		 *     cobble up a fake syncache entry, and return.
+		 * a returning syncookie.  If the syncookie is valid,
+		 * cobble up a fake syncache entry and create a socket.
+		 *
+		 * NB: Syncache head is locked for the syncookie access.
 		 */
 		if (!tcp_syncookies) {
-			SCH_UNLOCK(sch);
 			if ((s = tcp_log_addrs(inc, th, NULL, NULL)))
 				log(LOG_DEBUG, "%s; %s: Spurious ACK, "
 				    "segment rejected (syncookies disabled)\n",
 				    s, __func__);
-			goto failed;
+			goto sendrst;
 		}
 		bzero(&scs, sizeof(scs));
 		sc = syncookie_lookup(inc, sch, &scs, to, th, *lsop);
-		SCH_UNLOCK(sch);
 		if (sc == NULL) {
 			if ((s = tcp_log_addrs(inc, th, NULL, NULL)))
 				log(LOG_DEBUG, "%s; %s: Segment failed "
 				    "SYNCOOKIE authentication, segment rejected "
 				    "(probably spoofed)\n", s, __func__);
-			goto failed;
+			goto sendrst;
 		}
-	} else {
-		/* Pull out the entry to unlock the bucket row. */
-		TAILQ_REMOVE(&sch->sch_bucket, sc, sc_hash);
-		sch->sch_length--;
-		V_tcp_syncache.cache_count--;
-		SCH_UNLOCK(sch);
+		goto expand;		/* fully validated through syncookie */
 	}
+	SCH_LOCK_ASSERT(sch);
 
 	/*
 	 * Segment validation:
-	 * ACK must match our initial sequence number + 1 (the SYN|ACK).
-	 */
-	if (th->th_ack != sc->sc_iss + 1 && !TOEPCB_ISSET(sc)) {
-		if ((s = tcp_log_addrs(inc, th, NULL, NULL)))
-			log(LOG_DEBUG, "%s; %s: ACK %u != ISS+1 %u, segment "
-			    "rejected\n", s, __func__, th->th_ack, sc->sc_iss);
-		goto failed;
-	}
-
-	/*
+	 *
 	 * The SEQ must fall in the window starting at the received
 	 * initial receive sequence number + 1 (the SYN).
+	 * If not the segment may be from an earlier connection.  We send
+	 * an ACK to re-synchronize the connection and keep the syncache
+	 * entry without ajusting its timers.
+	 * See RFC793 page 69, first check sequence number [SYN_RECEIVED].
 	 */
 	if ((SEQ_LEQ(th->th_seq, sc->sc_irs) ||
 	    SEQ_GT(th->th_seq, sc->sc_irs + sc->sc_wnd)) &&
@@ -877,14 +865,41 @@ syncache_expand(struct in_conninfo *inc,
 		if ((s = tcp_log_addrs(inc, th, NULL, NULL)))
 			log(LOG_DEBUG, "%s; %s: SEQ %u != IRS+1 %u, segment "
 			    "rejected\n", s, __func__, th->th_seq, sc->sc_irs);
-		goto failed;
+		(void) syncache_respond(sc);
+		*lsop = 1;			/* prevent RST */
+		goto sendrstkeep;
 	}
 
+	/*
+	 * ACK must match our initial sequence number + 1 (the SYN|ACK).
+	 * If not the segment may be from an earlier connection.  We send
+	 * a RST but keep the syncache entry without ajusting its timers.
+	 * See RFC793 page 72, fifth check the ACK field, [SYN_RECEIVED].
+	 */
+	if (th->th_ack != sc->sc_iss + 1 && !TOEPCB_ISSET(sc)) {
+		if ((s = tcp_log_addrs(inc, th, NULL, NULL)))
+			log(LOG_DEBUG, "%s; %s: ACK %u != ISS+1 %u, segment "
+			    "rejected\n", s, __func__, th->th_ack, sc->sc_iss);
+		goto sendrstkeep;
+	}
+
+	/*
+	 * Remove the entry to unlock the bucket row.
+	 * Tests from now on are fatal and remove the syncache entry.
+	 */
+	TAILQ_REMOVE(&sch->sch_bucket, sc, sc_hash);
+	sch->sch_length--;
+	V_tcp_syncache.cache_count--;
+
+	/*
+	 * If timestamps were not negotiated they must not show up later.
+	 * See RFC1312bis, section 1.3, second paragraph 
+	 */
 	if (!(sc->sc_flags & SCF_TIMESTAMP) && (to->to_flags & TOF_TS)) {
 		if ((s = tcp_log_addrs(inc, th, NULL, NULL)))
 			log(LOG_DEBUG, "%s; %s: Timestamp not expected, "
 			    "segment rejected\n", s, __func__);
-		goto failed;
+		goto sendrst;
 	}
 	/*
 	 * If timestamps were negotiated the reflected timestamp
@@ -896,9 +911,11 @@ syncache_expand(struct in_conninfo *inc,
 			log(LOG_DEBUG, "%s; %s: TSECR %u != TS %u, "
 			    "segment rejected\n",
 			    s, __func__, to->to_tsecr, sc->sc_ts);
-		goto failed;
+		goto sendrst;
 	}
 
+expand:
+	SCH_UNLOCK(sch);
 	*lsop = syncache_socket(sc, *lsop, m);
 
 	if (*lsop == NULL)
@@ -906,16 +923,18 @@ syncache_expand(struct in_conninfo *inc,
 	else
 		V_tcpstat.tcps_sc_completed++;
 
-/* how do we find the inp for the new socket? */
 	if (sc != &scs)
 		syncache_free(sc);
 	return (1);
-failed:
-	if (sc != NULL && sc != &scs)
+
+sendrst:
+	if (sc != &scs)
 		syncache_free(sc);
+sendrstkeep:
+	SCH_LOCK_ASSERT(sch);
+	SCH_UNLOCK(sch);
 	if (s != NULL)
 		free(s, M_TCPLOG);
-	*lsop = NULL;
 	return (0);
 }
 
@@ -1322,6 +1341,8 @@ syncache_respond(struct syncache *sc)
 		 *
 		 *	1) path_mtu_discovery is disabled
 		 *	2) the SCF_UNREACH flag has been set
+		 *
+		 * XXXAO: The route lookup comment doesn't make sense.
 		 */
 		if (V_path_mtu_discovery && ((sc->sc_flags & SCF_UNREACH) == 0))
 		       ip->ip_off |= IP_DF;

--------------010706030207060604050302--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?4921F2CD.503>