From owner-freebsd-net@FreeBSD.ORG  Thu Nov 20 10:59:22 2008
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 87A4D106564A
	for <freebsd-net@freebsd.org>; Thu, 20 Nov 2008 10:59:22 +0000 (UTC)
	(envelope-from andre@freebsd.org)
Received: from c00l3r.networx.ch (c00l3r.networx.ch [62.48.2.2])
	by mx1.freebsd.org (Postfix) with ESMTP id 641BD8FC19
	for <freebsd-net@freebsd.org>; Thu, 20 Nov 2008 10:59:20 +0000 (UTC)
	(envelope-from andre@freebsd.org)
Received: (qmail 45163 invoked from network); 20 Nov 2008 09:24:22 -0000
Received: from localhost (HELO [127.0.0.1]) ([127.0.0.1])
	(envelope-sender <andre@freebsd.org>)
	by c00l3r.networx.ch (qmail-ldap-1.03) with SMTP
	for <rpaulo@freebsd.org>; 20 Nov 2008 09:24:22 -0000
Message-ID: <4925430A.9050602@freebsd.org>
Date: Thu, 20 Nov 2008 11:59:22 +0100
From: Andre Oppermann <andre@freebsd.org>
User-Agent: Thunderbird 1.5.0.14 (Windows/20071210)
MIME-Version: 1.0
To: Rui Paulo <rpaulo@freebsd.org>
References: <491F2C47.4050500@dlr.de>	<0A4BB2F1-AC9F-4316-94E3-790E2D80F651@freebsd.org>	<49201859.2080605@dlr.de>	<4921B3C6.5020002@freebsd.org>
	<4921F2CD.503@freebsd.org>
	<403932D0-8B62-4C95-8076-A185AC1C69E7@freebsd.org>
In-Reply-To: <403932D0-8B62-4C95-8076-A185AC1C69E7@freebsd.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-net@freebsd.org, Hartmut Brandt <hartmut.brandt@dlr.de>,
	bz@freebsd.org
Subject: Re: TCP and syncache question
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 20 Nov 2008 10:59:22 -0000

Rui Paulo wrote:
> On 17 Nov 2008, at 22:40, Andre Oppermann wrote:
>> This is a bit more complicated because of interactions with tcp_input()
>> where syncache_expand() is called from.
>>
>> The old code (as of December 2002) behaved slightly different.  It would
>> not remove the syncache entry when (SND.UNA == SEG.ACK) but send a RST.
>> The (RCV.NXT =< SEG.SEQ+SEG.LEN-1 < RCV.NXT+RCV.WND) test wasn't done at
>> all.  Instead a socket was opened whenever (SND.UNA == SEG.ACK) 
>> succeeded.
>> This gave way to the "LAND" DoS attack which was mostly fixed with a test
>> for (RCV.IRS < SEG.SEQ).
>>
>> See the attached patch for fixed version of syncache_expand().  This 
>> patch
>> is untested though.  My development machine is currently down.  Harti, 
>> Rui
>> and Bjoern, please have a look at the patch and review it.
>>
>> --Andre
>>
>> --- tcp_input.c.1.390    Mon Nov 17 21:33:25 2008
>> +++ tcp_input.c.1.390.mod    Mon Nov 17 21:35:22 2008
>> @@ -642,10 +642,13 @@ findpcb:
>>             if (!syncache_expand(&inc, &to, th, &so, m)) {
>>                 /*
>>                  * No syncache entry or ACK was not
>> -                 * for our SYN/ACK.  Send a RST.
>> +                 * for our SYN/ACK.  Send a RST or
>> +                 * an ACK for re-synchronization.
>>                  * NB: syncache did its own logging
>>                  * of the failure cause.
>>                  */
>> +                if (so == 1)
>> +                    goto dropunlock;
>>                 rstreason = BANDLIM_RST_OPENPORT;
>>                 goto dropwithreset;
>>             }
>> --- tcp_syncache.c.1.160    Mon Nov 17 16:49:01 2008
>> +++ tcp_syncache.c.1.160.mod    Mon Nov 17 23:35:39 2008
>> @@ -817,59 +817,47 @@ syncache_expand(struct in_conninfo *inc,
>>     INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
>>     KASSERT((th->th_flags & (TH_RST|TH_ACK|TH_SYN)) == TH_ACK,
>>         ("%s: can handle only ACK", __func__));
>> +    *lsop = NULL;
>>
>>     sc = syncache_lookup(inc, &sch);    /* returns locked sch */
>>     SCH_LOCK_ASSERT(sch);
>>     if (sc == NULL) {
>>         /*
>>          * There is no syncache entry, so see if this ACK is
>> -         * a returning syncookie.  To do this, first:
>> -         *  A. See if this socket has had a syncache entry dropped in
>> -         *     the past.  We don't want to accept a bogus syncookie
>> -         *     if we've never received a SYN.
>> -         *  B. check that the syncookie is valid.  If it is, then
>> -         *     cobble up a fake syncache entry, and return.
>> +         * a returning syncookie.  If the syncookie is valid,
>> +         * cobble up a fake syncache entry and create a socket.
>> +         *
>> +         * NB: Syncache head is locked for the syncookie access.
>>          */
>>         if (!tcp_syncookies) {
>> -            SCH_UNLOCK(sch);
>>             if ((s = tcp_log_addrs(inc, th, NULL, NULL)))
>>                 log(LOG_DEBUG, "%s; %s: Spurious ACK, "
>>                     "segment rejected (syncookies disabled)\n",
>>                     s, __func__);
>> -            goto failed;
>> +            goto sendrst;
>>         }
>>         bzero(&scs, sizeof(scs));
>>         sc = syncookie_lookup(inc, sch, &scs, to, th, *lsop);
>> -        SCH_UNLOCK(sch);
>>         if (sc == NULL) {
>>             if ((s = tcp_log_addrs(inc, th, NULL, NULL)))
>>                 log(LOG_DEBUG, "%s; %s: Segment failed "
>>                     "SYNCOOKIE authentication, segment rejected "
>>                     "(probably spoofed)\n", s, __func__);
>> -            goto failed;
>> +            goto sendrst;
>>         }
>> -    } else {
>> -        /* Pull out the entry to unlock the bucket row. */
>> -        TAILQ_REMOVE(&sch->sch_bucket, sc, sc_hash);
>> -        sch->sch_length--;
>> -        V_tcp_syncache.cache_count--;
>> -        SCH_UNLOCK(sch);
>> +        goto expand;        /* fully validated through syncookie */
>>     }
>> +    SCH_LOCK_ASSERT(sch);
> 
> Why do you need this assert?

Removed.  Was from an earlier iteration with different locking and gotos.

>>     /*
>>      * Segment validation:
>> -     * ACK must match our initial sequence number + 1 (the SYN|ACK).
>> -     */
>> -    if (th->th_ack != sc->sc_iss + 1 && !TOEPCB_ISSET(sc)) {
>> -        if ((s = tcp_log_addrs(inc, th, NULL, NULL)))
>> -            log(LOG_DEBUG, "%s; %s: ACK %u != ISS+1 %u, segment "
>> -                "rejected\n", s, __func__, th->th_ack, sc->sc_iss);
>> -        goto failed;
>> -    }
>> -
>> -    /*
>> +     *
>>      * The SEQ must fall in the window starting at the received
>>      * initial receive sequence number + 1 (the SYN).
>> +     * If not the segment may be from an earlier connection.  We send
>> +     * an ACK to re-synchronize the connection and keep the syncache
>> +     * entry without ajusting its timers.
>> +     * See RFC793 page 69, first check sequence number [SYN_RECEIVED].
>>      */
>>     if ((SEQ_LEQ(th->th_seq, sc->sc_irs) ||
>>         SEQ_GT(th->th_seq, sc->sc_irs + sc->sc_wnd)) &&
>> @@ -877,14 +865,41 @@ syncache_expand(struct in_conninfo *inc,
>>         if ((s = tcp_log_addrs(inc, th, NULL, NULL)))
>>             log(LOG_DEBUG, "%s; %s: SEQ %u != IRS+1 %u, segment "
>>                 "rejected\n", s, __func__, th->th_seq, sc->sc_irs);
>> -        goto failed;
>> +        (void) syncache_respond(sc);
>> +        *lsop = 1;            /* prevent RST */
>> +        goto sendrstkeep;
>>     }
>>
>> +    /*
>> +     * ACK must match our initial sequence number + 1 (the SYN|ACK).
>> +     * If not the segment may be from an earlier connection.  We send
>> +     * a RST but keep the syncache entry without ajusting its timers.
>> +     * See RFC793 page 72, fifth check the ACK field, [SYN_RECEIVED].
>> +     */
>> +    if (th->th_ack != sc->sc_iss + 1 && !TOEPCB_ISSET(sc)) {
>> +        if ((s = tcp_log_addrs(inc, th, NULL, NULL)))
>> +            log(LOG_DEBUG, "%s; %s: ACK %u != ISS+1 %u, segment "
>> +                "rejected\n", s, __func__, th->th_ack, sc->sc_iss);
>> +        goto sendrstkeep;
>> +    }
>> +
>> +    /*
>> +     * Remove the entry to unlock the bucket row.
>> +     * Tests from now on are fatal and remove the syncache entry.
>> +     */
>> +    TAILQ_REMOVE(&sch->sch_bucket, sc, sc_hash);
>> +    sch->sch_length--;
>> +    V_tcp_syncache.cache_count--;
>> +
>> +    /*
>> +     * If timestamps were not negotiated they must not show up later.
>> +     * See RFC1312bis, section 1.3, second paragraph
>> +     */
>>     if (!(sc->sc_flags & SCF_TIMESTAMP) && (to->to_flags & TOF_TS)) {
>>         if ((s = tcp_log_addrs(inc, th, NULL, NULL)))
>>             log(LOG_DEBUG, "%s; %s: Timestamp not expected, "
>>                 "segment rejected\n", s, __func__);
>> -        goto failed;
>> +        goto sendrst;
>>     }
>>     /*
>>      * If timestamps were negotiated the reflected timestamp
>> @@ -896,9 +911,11 @@ syncache_expand(struct in_conninfo *inc,
>>             log(LOG_DEBUG, "%s; %s: TSECR %u != TS %u, "
>>                 "segment rejected\n",
>>                 s, __func__, to->to_tsecr, sc->sc_ts);
>> -        goto failed;
>> +        goto sendrst;
>>     }
>>
>> +expand:
>> +    SCH_UNLOCK(sch);
>>     *lsop = syncache_socket(sc, *lsop, m);
>>
>>     if (*lsop == NULL)
>> @@ -906,16 +923,18 @@ syncache_expand(struct in_conninfo *inc,
>>     else
>>         V_tcpstat.tcps_sc_completed++;
>>
>> -/* how do we find the inp for the new socket? */
>>     if (sc != &scs)
>>         syncache_free(sc);
>>     return (1);
>> -failed:
>> -    if (sc != NULL && sc != &scs)
>> +
>> +sendrst:
>> +    if (sc != &scs)
>>         syncache_free(sc);
>> +sendrstkeep:
>> +    SCH_LOCK_ASSERT(sch);
>> +    SCH_UNLOCK(sch);
> 
> Why do we need an assert before an unlock?

This is customary in at least the TCP code.  A failed assert provides more
and better diagnostics than a crash-and-burn failed unlock.

>>     if (s != NULL)
>>         free(s, M_TCPLOG);
>> -    *lsop = NULL;
>>     return (0);
>> }
>>
> 
> This was probably out of scope:

yep.  Put it in while reading the code.

>> @@ -1322,6 +1341,8 @@ syncache_respond(struct syncache *sc)
>>          *
>>          *    1) path_mtu_discovery is disabled
>>          *    2) the SCF_UNREACH flag has been set
>> +         *
>> +         * XXXAO: The route lookup comment doesn't make sense.
>>          */
>>         if (V_path_mtu_discovery && ((sc->sc_flags & SCF_UNREACH) == 0))
>>                ip->ip_off |= IP_DF;
> 
> 
> 
> I won't be able to test this any time soon, so I can't really comment on 
> the rest, but it *looks* okay.

-- 
Andre