Date: Fri, 20 Mar 2009 01:44:07 GMT From: Renaud Lienhart <renaud@vmware.com> To: freebsd-gnats-submit@FreeBSD.org Subject: kern/132832: tcp_output() might generate invalid TSO frames when len > TCP_MAXWIN - hdrlen - optlen Message-ID: <200903200144.n2K1i7QY017113@www.freebsd.org> Resent-Message-ID: <200903200150.n2K1o1wO021450@freefall.freebsd.org>
next in thread | raw e-mail | index | archive | help
>Number: 132832 >Category: kern >Synopsis: tcp_output() might generate invalid TSO frames when len > TCP_MAXWIN - hdrlen - optlen >Confidential: no >Severity: serious >Priority: high >Responsible: freebsd-bugs >State: open >Quarter: >Keywords: >Date-Required: >Class: sw-bug >Submitter-Id: current-users >Arrival-Date: Fri Mar 20 01:50:01 UTC 2009 >Closed-Date: >Last-Modified: >Originator: Renaud Lienhart >Release: FreeBSD 7.1 >Organization: VMware, Inc. >Environment: >Description: The tcp_output() routine has an issue when the send window exceeds TCP_MAXWIN and the underlying interface supports TSO. There is a check to ensure the data being pushed doesn't exceed the maximum TSO packet size. If this is the case, "len" is trimmed and "sendalot" is set: --- 8< --- if (tso) { if (len > TCP_MAXWIN - hdrlen - optlen) { len = TCP_MAXWIN - hdrlen - optlen; len = len - (len % (tp->t_maxopd - optlen)); sendalot = 1; --- >8 --- Everything's hunky-dory, until the next "sendalot" iteration; If the remaining window does not require TSO (i.e. len <= tp->t_maxseg), this following piece of code fails to properly reset "tso": --- 8< --- if (len > tp->t_maxseg) { if (<tso_conds>) tso = 1; } else { len = tp->t_maxseg; sendalot = 1; tso = 0; } } --- >8 --- "tso" remains set to 1 and the resulting packet is tagged with CSUM_TSO although it doesn't require TSO. This causes problems with a large number of nics which refuse to send the resulting frame and, in some case, wedge. >How-To-Repeat: Using netperf (or any TCP workload) with a large socket buffer size exposes the issue in a matter of seconds. >Fix: The solution is to always reset "tso" at the beginning of the "sendalot" loop in order to ensure it is not stale. In my patch I also added a KASSERT() which directly catches any problematic frame before it reaches any other layer. Patch attached with submission follows: Index: netinet/tcp_output.c =================================================================== --- netinet/tcp_output.c (revision 190117) +++ netinet/tcp_output.c (working copy) @@ -140,7 +140,7 @@ int idle, sendalot; int sack_rxmit, sack_bytes_rxmt; struct sackhole *p; - int tso = 0; + int tso; struct tcpopt to; #if 0 int maxburst = TCP_MAXBURST; @@ -198,6 +198,7 @@ SEQ_LT(tp->snd_nxt, tp->snd_max)) tcp_sack_adjust(tp); sendalot = 0; + tso = 0; off = tp->snd_nxt - tp->snd_una; sendwin = min(tp->snd_wnd, tp->snd_cwnd); sendwin = min(sendwin, tp->snd_bwnd); @@ -477,7 +478,6 @@ } else { len = tp->t_maxseg; sendalot = 1; - tso = 0; } } if (sack_rxmit) { @@ -996,6 +996,8 @@ * XXX: Fixme: This is currently not the case for IPv6. */ if (tso) { + KASSERT(len > tp->t_maxopd - optlen, + ("%s: len <= tso_segsz", __func__)); m->m_pkthdr.csum_flags = CSUM_TSO; m->m_pkthdr.tso_segsz = tp->t_maxopd - optlen; } >Release-Note: >Audit-Trail: >Unformatted:
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200903200144.n2K1i7QY017113>