Date:      Wed, 25 Apr 2018 10:32:26 +0000
From:      bugzilla-noreply@freebsd.org
To:        bugs@FreeBSD.org
Subject:   [Bug 227760] Race condition in syncache_lookup and syncache_insert in TCP Handshake
Message-ID:  <bug-227760-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=227760

            Bug ID: 227760
           Summary: Race condition in syncache_lookup and syncache_insert
                    in TCP Handshake
           Product: Base System
           Version: CURRENT
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: harsh@chelsio.com

Created attachment 192796
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=192796&action=edit
server_100.sh

Some TCP connections are reset when we try to establish multiple connections
with the Chelsio TOE driver.

Test Setup Details:

Two machines connected back to back with Chelsio T6 adapters.

OS: FreeBSD-CURRENT.

Applications used: openssl s_time/s_server. Each s_server can handle a single
client (s_time) connection.


Steps to reproduce the issue:

Server:

1) kldload if_cxgbe
2) kldload t4_tom
3) ifconfig cc0 1.0.0.9/24 toe up
4) sh server_100.sh (file attached). It will start ~200 s_server instances.


Client:

1) kldload if_cxgbe
2) kldload t4_tom
3) ifconfig cc0 1.0.0.35 toe up
4) ./bash_client_100.sh (file attached). It submits an HTTP GET request for a
file transfer and repeats it for the given time interval.


Problem:

After some time, some of the connections are reset.
On debugging, we observed that syncache_lookup() in syncache_expand() fails
while handling the ACK (the third handshake message) from the client. On
failure, the following code snippet in syncache_expand() sends a reset:

if (!V_tcp_syncookiesonly &&
    sch->sch_last_overflow < time_uptime - SYNCOOKIE_LIFETIME) {
        SCH_UNLOCK(sch);
        if ((s = tcp_log_addrs(inc, th, NULL, NULL)))
                log(LOG_DEBUG, "%s; %s: Spurious ACK, "
                    "segment rejected (no syncache entry)\n",
                    s, __func__);
        goto failed;
}
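
The condition in this snippet can be read in isolation. The following is a minimal user-space sketch, not kernel code: `struct bucket_model`, `ack_rejected_without_entry()` and the `cookie_lifetime` parameter are hypothetical names standing in for `sch`, the quoted `if` condition and `SYNCOOKIE_LIFETIME`. It only illustrates that, once the lookup has failed, the ACK is rejected unless syncookies-only mode is on or the bucket overflowed recently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for the one bucket field the check reads. */
struct bucket_model {
	time_t last_overflow;	/* models sch->sch_last_overflow */
};

/*
 * Returns true when an ACK that found no syncache entry would be
 * rejected, i.e. when the quoted condition holds.
 * cookie_lifetime models SYNCOOKIE_LIFETIME (value is the caller's choice).
 */
bool
ack_rejected_without_entry(bool syncookies_only,
    const struct bucket_model *b, time_t now, time_t cookie_lifetime)
{
	assert(b != NULL);
	return (!syncookies_only &&
	    b->last_overflow < now - cookie_lifetime);
}
```

With `syncookies_only` set to true the function returns false for any bucket state, which is consistent with the observation that net.inet.tcp.syncookies_only=1 avoids the reset.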

Detailed steps:

Client sends SYN to server ==> 1 Server, in syncache_add(), calls
                                 syncache_respond().
                               2 syncache_respond() sends SYN+ACK to the
                                 client.
                           <== SYN+ACK
                               3 The thread handling the SYN is context
                                 switched out, so syncache_insert() does
                                 not get a chance to add the entry.
                       ACK ==> Another thread on the server starts
                                 handling the ACK. The lookup in
                                 syncache_expand() fails and a RESET is
                                 sent to the client.
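
The ordering problem in these steps can be modelled deterministically in user space. This is only a sketch under stated assumptions: the `model_*` names and `handshake_respond_then_insert()` are hypothetical, the "cache" is a trivial array rather than the real syncache, and the context switch is modelled by processing the ACK inside the respond step, before the insert runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_SLOTS 8

/* Hypothetical stand-in for the syncache: one key per slot. */
struct cache_model {
	unsigned key[MODEL_SLOTS];
	bool used[MODEL_SLOTS];
};

/* Models syncache_lookup(): is there an entry for this connection? */
bool
model_lookup(const struct cache_model *c, unsigned key)
{
	size_t i = key % MODEL_SLOTS;

	assert(c != NULL);
	return (c->used[i] && c->key[i] == key);
}

/* Models syncache_insert(): record the half-open connection. */
void
model_insert(struct cache_model *c, unsigned key)
{
	size_t i = key % MODEL_SLOTS;

	assert(c != NULL);
	c->key[i] = key;
	c->used[i] = true;
}

/*
 * Models sending the SYN+ACK in the unlucky case: the client's ACK is
 * handled by another thread before the sending thread resumes.
 * Returns whether that ACK found an entry.
 */
bool
model_respond_with_fast_ack(const struct cache_model *c, unsigned key)
{
	return (model_lookup(c, key));
}

/* Ordering described in the report: respond first, insert afterwards. */
bool
handshake_respond_then_insert(struct cache_model *c, unsigned key)
{
	bool ack_found_entry = model_respond_with_fast_ack(c, key);

	model_insert(c, key);	/* too late for the ACK above */
	return (ack_found_entry);
}
```

Calling `handshake_respond_then_insert()` on an empty `struct cache_model` returns false even though the entry is present afterwards, which mirrors the "no syncache entry" rejection followed by SYN|ACK retransmissions for the late entry.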

The following workarounds were tried to avoid this:

1) Setting sysctl net.inet.tcp.syncookies_only=1 avoids the issue.
2) The following hack, which makes sure the insert happens before the SYN+ACK
is sent, also works.

@@ -1558,17 +1575,23 @@ syncache_add(struct in_conninfo *inc, struct tcpopt *to,
 struct tcphdr *th,
        /*
         * Do a standard 3-way handshake.
         */
+       if (V_tcp_syncookies && V_tcp_syncookiesonly && sc != &scs) {
+               // compilation error
+               rv = rv +1;
+               rv = rv - 1;
+       }else if (sc != &scs)
+               syncache_insert(sc, sch);   /* locks and unlocks sch */
+
        if (syncache_respond(sc, sch, 0, m) == 0) {
                if (V_tcp_syncookies && V_tcp_syncookiesonly && sc != &scs)
                        syncache_free(sc);
-               else if (sc != &scs)
-                       syncache_insert(sc, sch);   /* locks and unlocks sch */
-               TCPSTAT_INC(tcps_sndacks);
-               TCPSTAT_INC(tcps_sndtotal);
        } else {
-               if (sc != &scs)
-                       syncache_free(sc);
-               TCPSTAT_INC(tcps_sc_dropped);
+                if (sc != &scs) {
+                       SCH_LOCK(sch);
+                       syncache_drop(sc, sch);   /* locks and unlocks sch */
+                       SCH_UNLOCK(sch);
+                       TCPSTAT_INC(tcps_sc_dropped);
+               }
        }
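
The control flow of this hack (insert first, respond second, drop the entry if the response could not be sent) can be sketched in user space as follows. This is an illustration only: the `fix_*` names and `handshake_insert_then_respond()` are hypothetical, locking is omitted, and the `send_ok` flag stands in for the return value of syncache_respond().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FIX_SLOTS 8

/* Hypothetical stand-in for the syncache plus a drop counter. */
struct fix_cache {
	unsigned key[FIX_SLOTS];
	bool used[FIX_SLOTS];
	int dropped;		/* models the tcps_sc_dropped statistic */
};

bool
fix_lookup(const struct fix_cache *c, unsigned key)
{
	size_t i = key % FIX_SLOTS;

	assert(c != NULL);
	return (c->used[i] && c->key[i] == key);
}

void
fix_insert(struct fix_cache *c, unsigned key)
{
	size_t i = key % FIX_SLOTS;

	assert(c != NULL);
	c->key[i] = key;
	c->used[i] = true;
}

/* Models removing an entry whose SYN+ACK could not be sent. */
void
fix_drop(struct fix_cache *c, unsigned key)
{
	size_t i = key % FIX_SLOTS;

	assert(c != NULL);
	if (c->used[i] && c->key[i] == key)
		c->used[i] = false;
	c->dropped++;
}

/*
 * Insert before responding. If the response goes out, the ACK is
 * processed immediately (worst case for the race) and finds the entry.
 * If sending fails, the already inserted entry is rolled back.
 */
bool
handshake_insert_then_respond(struct fix_cache *c, unsigned key, bool send_ok)
{
	fix_insert(c, key);
	if (!send_ok) {
		fix_drop(c, key);
		return (false);
	}
	return (fix_lookup(c, key));
}
```

In this model an ACK that arrives immediately after the SYN+ACK always finds its entry, at the cost of having to undo the insert on the failure path.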


netstat -sp tcp output at the time of the issue:

42824 syncache entries added
                12 retransmitted
                0 dupsyn
                0 dropped
                42820 completed
                0 bucket overflow
                0 cache overflow
                0 reset
                4 stale
                0 aborted
                0 badack
                0 unreach
                0 zone failures

dmesg log with "sysctl net.inet.tcp.log_debug=1"

TCP: [1.0.0.35]:46690 to [1.0.0.9]:1021 tcpflags 0x10<ACK>; syncache_expand:
Spurious ACK, segment rejected (no syncache entry)
TCP: [1.0.0.35]:46690 to [1.0.0.9]:1021; syncache_timer: Response timeout,
retransmitting (1) SYN|ACK
TCP: [1.0.0.35]:46690 to [1.0.0.9]:1021; syncache_timer: Response timeout,
retransmitting (2) SYN|ACK
TCP: [1.0.0.35]:46690 to [1.0.0.9]:1021; syncache_timer: Response timeout,
retransmitting (3) SYN|ACK

-- 
You are receiving this mail because:
You are the assignee for the bug.


