From owner-freebsd-net@FreeBSD.ORG Wed Nov 19 20:01:11 2008 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id ADC33106564A for ; Wed, 19 Nov 2008 20:01:11 +0000 (UTC) (envelope-from rpaulo@gmail.com) Received: from ug-out-1314.google.com (ug-out-1314.google.com [66.249.92.173]) by mx1.freebsd.org (Postfix) with ESMTP id 30AF28FC1C for ; Wed, 19 Nov 2008 20:01:11 +0000 (UTC) (envelope-from rpaulo@gmail.com) Received: by ug-out-1314.google.com with SMTP id 30so77319ugs.39 for ; Wed, 19 Nov 2008 12:01:10 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:cc:message-id:from:to :in-reply-to:content-type:content-transfer-encoding:mime-version :subject:date:references:x-mailer:sender; bh=3hWv5l3mzSeYrRpdt2iLIuLxd4S4ICUe6ThMUSr7Uv8=; b=eYDJSjeNmAb+PeynfU2DXnEuSR9sBnChZuX44lKVw0obUFUjLs/XZqi7acz71yzKa2 xuifV2itqtQBFK5wnQggh5rE8Sp+AkHz5YbiSUOMwhwZVShylYCrncWO2XiwVkRINmBm KfOL2lmy1BRPLuYE3J1G9nPqJOyV4nY90df7M= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=cc:message-id:from:to:in-reply-to:content-type :content-transfer-encoding:mime-version:subject:date:references :x-mailer:sender; b=Sq7ol0yfOq8TGmmM/Fzn4+a1SZfUDLXGQ+cYvqF03KDcT5sfkFf9634gGdS83Ya2rH IkREck/0FyuWl4Xysu1p/2+EBSSoSAL3JoMBABqtqjhYkmjH/ptFZCW3zUfbFZISFboo xJrve9hyBVfEN2IjAi3pH/P38hsBKGv64XZVs= Received: by 10.67.119.8 with SMTP id w8mr3256066ugm.3.1227124870169; Wed, 19 Nov 2008 12:01:10 -0800 (PST) Received: from epsilon.lan (bl6-157-129.dsl.telepac.pt [82.155.157.129]) by mx.google.com with ESMTPS id 31sm31848ugg.34.2008.11.19.12.01.07 (version=TLSv1/SSLv3 cipher=RC4-MD5); Wed, 19 Nov 2008 12:01:09 -0800 (PST) Message-Id: <403932D0-8B62-4C95-8076-A185AC1C69E7@freebsd.org> From: Rui Paulo To: Andre Oppermann In-Reply-To: <4921F2CD.503@freebsd.org> Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes Content-Transfer-Encoding: 7bit Mime-Version: 1.0 (Apple Message framework v929.2) Date: Wed, 19 Nov 2008 20:01:06 +0000 References: <491F2C47.4050500@dlr.de> <0A4BB2F1-AC9F-4316-94E3-790E2D80F651@freebsd.org> <49201859.2080605@dlr.de> <4921B3C6.5020002@freebsd.org> <4921F2CD.503@freebsd.org> X-Mailer: Apple Mail (2.929.2) Sender: Rui Paulo Cc: freebsd-net@freebsd.org, Hartmut Brandt , bz@freebsd.org Subject: Re: TCP and syncache question X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Nov 2008 20:01:11 -0000 On 17 Nov 2008, at 22:40, Andre Oppermann wrote: > This is a bit more complicated because of interactions with > tcp_input() > where syncache_expand() is called from. > > The old code (as of December 2002) behaved slightly different. It > would > not remove the syncache entry when (SND.UNA == SEG.ACK) but send a > RST. > The (RCV.NXT =< SEG.SEQ+SEG.LEN-1 < RCV.NXT+RCV.WND) test wasn't > done at > all. Instead a socket was opened whenever (SND.UNA == SEG.ACK) > succeeded. > This gave way to the "LAND" DoS attack which was mostly fixed with a > test > for (RCV.IRS < SEG.SEQ). > > See the attached patch for fixed version of syncache_expand(). This > patch > is untested though. My development machine is currently down. > Harti, Rui > and Bjoern, please have a look at the patch and review it. > > -- > Andre > > --- tcp_input.c.1.390 Mon Nov 17 21:33:25 2008 > +++ tcp_input.c.1.390.mod Mon Nov 17 21:35:22 2008 > @@ -642,10 +642,13 @@ findpcb: > if (!syncache_expand(&inc, &to, th, &so, m)) { > /* > * No syncache entry or ACK was not > - * for our SYN/ACK. Send a RST. > + * for our SYN/ACK. Send a RST or > + * an ACK for re-synchronization. > * NB: syncache did its own logging > * of the failure cause. > */ > + if (so == 1) > + goto dropunlock; > rstreason = BANDLIM_RST_OPENPORT; > goto dropwithreset; > } > --- tcp_syncache.c.1.160 Mon Nov 17 16:49:01 2008 > +++ tcp_syncache.c.1.160.mod Mon Nov 17 23:35:39 2008 > @@ -817,59 +817,47 @@ syncache_expand(struct in_conninfo *inc, > INP_INFO_WLOCK_ASSERT(&V_tcbinfo); > KASSERT((th->th_flags & (TH_RST|TH_ACK|TH_SYN)) == TH_ACK, > ("%s: can handle only ACK", __func__)); > + *lsop = NULL; > > sc = syncache_lookup(inc, &sch); /* returns locked sch */ > SCH_LOCK_ASSERT(sch); > if (sc == NULL) { > /* > * There is no syncache entry, so see if this ACK is > - * a returning syncookie. To do this, first: > - * A. See if this socket has had a syncache entry dropped in > - * the past. We don't want to accept a bogus syncookie > - * if we've never received a SYN. > - * B. check that the syncookie is valid. If it is, then > - * cobble up a fake syncache entry, and return. > + * a returning syncookie. If the syncookie is valid, > + * cobble up a fake syncache entry and create a socket. > + * > + * NB: Syncache head is locked for the syncookie access. > */ > if (!tcp_syncookies) { > - SCH_UNLOCK(sch); > if ((s = tcp_log_addrs(inc, th, NULL, NULL))) > log(LOG_DEBUG, "%s; %s: Spurious ACK, " > "segment rejected (syncookies disabled)\n", > s, __func__); > - goto failed; > + goto sendrst; > } > bzero(&scs, sizeof(scs)); > sc = syncookie_lookup(inc, sch, &scs, to, th, *lsop); > - SCH_UNLOCK(sch); > if (sc == NULL) { > if ((s = tcp_log_addrs(inc, th, NULL, NULL))) > log(LOG_DEBUG, "%s; %s: Segment failed " > "SYNCOOKIE authentication, segment rejected " > "(probably spoofed)\n", s, __func__); > - goto failed; > + goto sendrst; > } > - } else { > - /* Pull out the entry to unlock the bucket row. */ > - TAILQ_REMOVE(&sch->sch_bucket, sc, sc_hash); > - sch->sch_length--; > - V_tcp_syncache.cache_count--; > - SCH_UNLOCK(sch); > + goto expand; /* fully validated through syncookie */ > } > + SCH_LOCK_ASSERT(sch); Why do you need this assert? > > > /* > * Segment validation: > - * ACK must match our initial sequence number + 1 (the SYN|ACK). > - */ > - if (th->th_ack != sc->sc_iss + 1 && !TOEPCB_ISSET(sc)) { > - if ((s = tcp_log_addrs(inc, th, NULL, NULL))) > - log(LOG_DEBUG, "%s; %s: ACK %u != ISS+1 %u, segment " > - "rejected\n", s, __func__, th->th_ack, sc->sc_iss); > - goto failed; > - } > - > - /* > + * > * The SEQ must fall in the window starting at the received > * initial receive sequence number + 1 (the SYN). > + * If not the segment may be from an earlier connection. We send > + * an ACK to re-synchronize the connection and keep the syncache > + * entry without ajusting its timers. > + * See RFC793 page 69, first check sequence number [SYN_RECEIVED]. > */ > if ((SEQ_LEQ(th->th_seq, sc->sc_irs) || > SEQ_GT(th->th_seq, sc->sc_irs + sc->sc_wnd)) && > @@ -877,14 +865,41 @@ syncache_expand(struct in_conninfo *inc, > if ((s = tcp_log_addrs(inc, th, NULL, NULL))) > log(LOG_DEBUG, "%s; %s: SEQ %u != IRS+1 %u, segment " > "rejected\n", s, __func__, th->th_seq, sc->sc_irs); > - goto failed; > + (void) syncache_respond(sc); > + *lsop = 1; /* prevent RST */ > + goto sendrstkeep; > } > > + /* > + * ACK must match our initial sequence number + 1 (the SYN|ACK). > + * If not the segment may be from an earlier connection. We send > + * a RST but keep the syncache entry without ajusting its timers. > + * See RFC793 page 72, fifth check the ACK field, [SYN_RECEIVED]. > + */ > + if (th->th_ack != sc->sc_iss + 1 && !TOEPCB_ISSET(sc)) { > + if ((s = tcp_log_addrs(inc, th, NULL, NULL))) > + log(LOG_DEBUG, "%s; %s: ACK %u != ISS+1 %u, segment " > + "rejected\n", s, __func__, th->th_ack, sc->sc_iss); > + goto sendrstkeep; > + } > + > + /* > + * Remove the entry to unlock the bucket row. > + * Tests from now on are fatal and remove the syncache entry. > + */ > + TAILQ_REMOVE(&sch->sch_bucket, sc, sc_hash); > + sch->sch_length--; > + V_tcp_syncache.cache_count--; > + > + /* > + * If timestamps were not negotiated they must not show up later. > + * See RFC1312bis, section 1.3, second paragraph > + */ > if (!(sc->sc_flags & SCF_TIMESTAMP) && (to->to_flags & TOF_TS)) { > if ((s = tcp_log_addrs(inc, th, NULL, NULL))) > log(LOG_DEBUG, "%s; %s: Timestamp not expected, " > "segment rejected\n", s, __func__); > - goto failed; > + goto sendrst; > } > /* > * If timestamps were negotiated the reflected timestamp > @@ -896,9 +911,11 @@ syncache_expand(struct in_conninfo *inc, > log(LOG_DEBUG, "%s; %s: TSECR %u != TS %u, " > "segment rejected\n", > s, __func__, to->to_tsecr, sc->sc_ts); > - goto failed; > + goto sendrst; > } > > +expand: > + SCH_UNLOCK(sch); > *lsop = syncache_socket(sc, *lsop, m); > > if (*lsop == NULL) > @@ -906,16 +923,18 @@ syncache_expand(struct in_conninfo *inc, > else > V_tcpstat.tcps_sc_completed++; > > -/* how do we find the inp for the new socket? */ > if (sc != &scs) > syncache_free(sc); > return (1); > -failed: > - if (sc != NULL && sc != &scs) > + > +sendrst: > + if (sc != &scs) > syncache_free(sc); > +sendrstkeep: > + SCH_LOCK_ASSERT(sch); > + SCH_UNLOCK(sch); Why do we need an assert before an unlock? > > if (s != NULL) > free(s, M_TCPLOG); > - *lsop = NULL; > return (0); > } > This was probably out of scope: > @@ -1322,6 +1341,8 @@ syncache_respond(struct syncache *sc) > * > * 1) path_mtu_discovery is disabled > * 2) the SCF_UNREACH flag has been set > + * > + * XXXAO: The route lookup comment doesn't make sense. > */ > if (V_path_mtu_discovery && ((sc->sc_flags & SCF_UNREACH) == 0)) > ip->ip_off |= IP_DF; I won't be able to test this any time soon, so I can't really comment on the rest, but it *looks* okay. Regards, -- Rui Paulo