Date: Mon, 17 Nov 2008 23:40:13 +0100 From: Andre Oppermann <andre@freebsd.org> To: Hartmut Brandt <hartmut.brandt@dlr.de> Cc: freebsd-net@freebsd.org, bz@freebsd.org, Rui Paulo <rpaulo@freebsd.org> Subject: Re: TCP and syncache question Message-ID: <4921F2CD.503@freebsd.org> In-Reply-To: <4921B3C6.5020002@freebsd.org> References: <491F2C47.4050500@dlr.de> <0A4BB2F1-AC9F-4316-94E3-790E2D80F651@freebsd.org> <49201859.2080605@dlr.de> <4921B3C6.5020002@freebsd.org>
next in thread | previous in thread | raw e-mail | index | archive | help
This is a multi-part message in MIME format. --------------010706030207060604050302 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Andre Oppermann wrote: > Hartmut Brandt wrote: >> Rui Paulo wrote: >>> >>> On 15 Nov 2008, at 20:08, Hartmut Brandt wrote: >>> >>>> Hi, >>>> >>>> in tcp_syncache.c:syncache_expand() there is a test that the >>>> acknowledgement number and the sequence number of an incoming ACK >>>> segment are in the expected range. If they are not, >>>> syncache_expand() returns 0 and tcp_input drops the segment and sets >>>> a reset. So far so good. But syncache_expand() also deletes the >>>> syncache entry, and so destroys the connection. I cannot see why it >>>> does it. It seems to me that such a wrong segment should be >>>> interpreted as to be from another connection and as such the segment >>>> should be ignored (but a reset sent). When the correct ACK comes, >>>> the connection could still be established. As it is now, the >>>> establishment of incoming connections can seriously be disturbed by >>>> someone sending fake ACK packets. >>>> >>>> The same test (for the ack number, not for the sequence number) is >>>> also further down in tcp_input.c:tcp_do_segment() (just after the >>>> header prediction stuff) and here the handling is correct: the goto >>>> dropwithreset just sends a reset and drops the segment but leaves >>>> the connection in the SYN-RECEIVED state. This test is probably >>>> never reached now, because of syncache_expand(), though. > > Correct. > >>>> Maybe I fail to see something obvious, though... >>> >>> >>> Well, if the RST is sent, why should we keep the syncache entry? >> >> Because this effectively destroys the connection in SYN-RECEIVED which >> is wrong according to RFC793. On page 69 the handling of incoming >> segments for connections in SYN-RECEIVED is described: first you check >> the sequence number and, if it is wrong, you send an RST (unless the >> RST bit is set in the incoming segment), but otherwise ignore the >> segment. > > >> A segment with a bad sequence number in SYN-RECEIVED is either forged >> or from an old connection. In both cases you don't want to destroy the >> embryonic connection, because the correct ACK from the correct peer >> may still arrive. > > I see your problem. Syncookies mitigate this problem (if not disabled) as > the correct ACK will pass that test later even if the syncache entry went > away before (which can also happen due to a generally high SYN load). > > RFC793 wants us to do the following: > > Page 69: Send back a challenge ACK with the correct parameters to help > to re-synchronize the connection when > !(RCV.NXT =< SEG.SEQ+SEG.LEN-1 < RCV.NXT+RCV.WND). > > Page 72: Send back a RST when !(SND.UNA =< SEG.ACK =< SND.NXT). > > At the moment we send the RST and delete the syncache entry for both cases. > However we should send the ACK in the former, the RST in the latter, and > and keep the syncache entry in either case. > > Fixing this requires some re-shuffling of the syncache_expand(). I'll > post a version later today. This is a bit more complicated because of interactions with tcp_input() where syncache_expand() is called from. The old code (as of December 2002) behaved slightly different. It would not remove the syncache entry when (SND.UNA == SEG.ACK) but send a RST. The (RCV.NXT =< SEG.SEQ+SEG.LEN-1 < RCV.NXT+RCV.WND) test wasn't done at all. Instead a socket was opened whenever (SND.UNA == SEG.ACK) succeeded. This gave way to the "LAND" DoS attack which was mostly fixed with a test for (RCV.IRS < SEG.SEQ). See the attached patch for fixed version of syncache_expand(). This patch is untested though. My development machine is currently down. Harti, Rui and Bjoern, please have a look at the patch and review it. -- Andre --------------010706030207060604050302 Content-Type: text/plain; name="syncache_expand_fix-20081117.diff" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="syncache_expand_fix-20081117.diff" --- tcp_input.c.1.390 Mon Nov 17 21:33:25 2008 +++ tcp_input.c.1.390.mod Mon Nov 17 21:35:22 2008 @@ -642,10 +642,13 @@ findpcb: if (!syncache_expand(&inc, &to, th, &so, m)) { /* * No syncache entry or ACK was not - * for our SYN/ACK. Send a RST. + * for our SYN/ACK. Send a RST or + * an ACK for re-synchronization. * NB: syncache did its own logging * of the failure cause. */ + if (so == 1) + goto dropunlock; rstreason = BANDLIM_RST_OPENPORT; goto dropwithreset; } --- tcp_syncache.c.1.160 Mon Nov 17 16:49:01 2008 +++ tcp_syncache.c.1.160.mod Mon Nov 17 23:35:39 2008 @@ -817,59 +817,47 @@ syncache_expand(struct in_conninfo *inc, INP_INFO_WLOCK_ASSERT(&V_tcbinfo); KASSERT((th->th_flags & (TH_RST|TH_ACK|TH_SYN)) == TH_ACK, ("%s: can handle only ACK", __func__)); + *lsop = NULL; sc = syncache_lookup(inc, &sch); /* returns locked sch */ SCH_LOCK_ASSERT(sch); if (sc == NULL) { /* * There is no syncache entry, so see if this ACK is - * a returning syncookie. To do this, first: - * A. See if this socket has had a syncache entry dropped in - * the past. We don't want to accept a bogus syncookie - * if we've never received a SYN. - * B. check that the syncookie is valid. If it is, then - * cobble up a fake syncache entry, and return. + * a returning syncookie. If the syncookie is valid, + * cobble up a fake syncache entry and create a socket. + * + * NB: Syncache head is locked for the syncookie access. */ if (!tcp_syncookies) { - SCH_UNLOCK(sch); if ((s = tcp_log_addrs(inc, th, NULL, NULL))) log(LOG_DEBUG, "%s; %s: Spurious ACK, " "segment rejected (syncookies disabled)\n", s, __func__); - goto failed; + goto sendrst; } bzero(&scs, sizeof(scs)); sc = syncookie_lookup(inc, sch, &scs, to, th, *lsop); - SCH_UNLOCK(sch); if (sc == NULL) { if ((s = tcp_log_addrs(inc, th, NULL, NULL))) log(LOG_DEBUG, "%s; %s: Segment failed " "SYNCOOKIE authentication, segment rejected " "(probably spoofed)\n", s, __func__); - goto failed; + goto sendrst; } - } else { - /* Pull out the entry to unlock the bucket row. */ - TAILQ_REMOVE(&sch->sch_bucket, sc, sc_hash); - sch->sch_length--; - V_tcp_syncache.cache_count--; - SCH_UNLOCK(sch); + goto expand; /* fully validated through syncookie */ } + SCH_LOCK_ASSERT(sch); /* * Segment validation: - * ACK must match our initial sequence number + 1 (the SYN|ACK). - */ - if (th->th_ack != sc->sc_iss + 1 && !TOEPCB_ISSET(sc)) { - if ((s = tcp_log_addrs(inc, th, NULL, NULL))) - log(LOG_DEBUG, "%s; %s: ACK %u != ISS+1 %u, segment " - "rejected\n", s, __func__, th->th_ack, sc->sc_iss); - goto failed; - } - - /* + * * The SEQ must fall in the window starting at the received * initial receive sequence number + 1 (the SYN). + * If not the segment may be from an earlier connection. We send + * an ACK to re-synchronize the connection and keep the syncache + * entry without ajusting its timers. + * See RFC793 page 69, first check sequence number [SYN_RECEIVED]. */ if ((SEQ_LEQ(th->th_seq, sc->sc_irs) || SEQ_GT(th->th_seq, sc->sc_irs + sc->sc_wnd)) && @@ -877,14 +865,41 @@ syncache_expand(struct in_conninfo *inc, if ((s = tcp_log_addrs(inc, th, NULL, NULL))) log(LOG_DEBUG, "%s; %s: SEQ %u != IRS+1 %u, segment " "rejected\n", s, __func__, th->th_seq, sc->sc_irs); - goto failed; + (void) syncache_respond(sc); + *lsop = 1; /* prevent RST */ + goto sendrstkeep; } + /* + * ACK must match our initial sequence number + 1 (the SYN|ACK). + * If not the segment may be from an earlier connection. We send + * a RST but keep the syncache entry without ajusting its timers. + * See RFC793 page 72, fifth check the ACK field, [SYN_RECEIVED]. + */ + if (th->th_ack != sc->sc_iss + 1 && !TOEPCB_ISSET(sc)) { + if ((s = tcp_log_addrs(inc, th, NULL, NULL))) + log(LOG_DEBUG, "%s; %s: ACK %u != ISS+1 %u, segment " + "rejected\n", s, __func__, th->th_ack, sc->sc_iss); + goto sendrstkeep; + } + + /* + * Remove the entry to unlock the bucket row. + * Tests from now on are fatal and remove the syncache entry. + */ + TAILQ_REMOVE(&sch->sch_bucket, sc, sc_hash); + sch->sch_length--; + V_tcp_syncache.cache_count--; + + /* + * If timestamps were not negotiated they must not show up later. + * See RFC1312bis, section 1.3, second paragraph + */ if (!(sc->sc_flags & SCF_TIMESTAMP) && (to->to_flags & TOF_TS)) { if ((s = tcp_log_addrs(inc, th, NULL, NULL))) log(LOG_DEBUG, "%s; %s: Timestamp not expected, " "segment rejected\n", s, __func__); - goto failed; + goto sendrst; } /* * If timestamps were negotiated the reflected timestamp @@ -896,9 +911,11 @@ syncache_expand(struct in_conninfo *inc, log(LOG_DEBUG, "%s; %s: TSECR %u != TS %u, " "segment rejected\n", s, __func__, to->to_tsecr, sc->sc_ts); - goto failed; + goto sendrst; } +expand: + SCH_UNLOCK(sch); *lsop = syncache_socket(sc, *lsop, m); if (*lsop == NULL) @@ -906,16 +923,18 @@ syncache_expand(struct in_conninfo *inc, else V_tcpstat.tcps_sc_completed++; -/* how do we find the inp for the new socket? */ if (sc != &scs) syncache_free(sc); return (1); -failed: - if (sc != NULL && sc != &scs) + +sendrst: + if (sc != &scs) syncache_free(sc); +sendrstkeep: + SCH_LOCK_ASSERT(sch); + SCH_UNLOCK(sch); if (s != NULL) free(s, M_TCPLOG); - *lsop = NULL; return (0); } @@ -1322,6 +1341,8 @@ syncache_respond(struct syncache *sc) * * 1) path_mtu_discovery is disabled * 2) the SCF_UNREACH flag has been set + * + * XXXAO: The route lookup comment doesn't make sense. */ if (V_path_mtu_discovery && ((sc->sc_flags & SCF_UNREACH) == 0)) ip->ip_off |= IP_DF; --------------010706030207060604050302--
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?4921F2CD.503>