From owner-freebsd-bugs@FreeBSD.ORG Fri Mar 20 01:50:01 2009 Return-Path: Delivered-To: freebsd-bugs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id ED56D1065670 for ; Fri, 20 Mar 2009 01:50:01 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id C88238FC16 for ; Fri, 20 Mar 2009 01:50:01 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1]) by freefall.freebsd.org (8.14.3/8.14.3) with ESMTP id n2K1o1xe021451 for ; Fri, 20 Mar 2009 01:50:01 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.3/8.14.3/Submit) id n2K1o1wO021450; Fri, 20 Mar 2009 01:50:01 GMT (envelope-from gnats) Resent-Date: Fri, 20 Mar 2009 01:50:01 GMT Resent-Message-Id: <200903200150.n2K1o1wO021450@freefall.freebsd.org> Resent-From: FreeBSD-gnats-submit@FreeBSD.org (GNATS Filer) Resent-To: freebsd-bugs@FreeBSD.org Resent-Reply-To: FreeBSD-gnats-submit@FreeBSD.org, Renaud Lienhart Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4A9C8106566B for ; Fri, 20 Mar 2009 01:44:08 +0000 (UTC) (envelope-from nobody@FreeBSD.org) Received: from www.freebsd.org (www.freebsd.org [IPv6:2001:4f8:fff6::21]) by mx1.freebsd.org (Postfix) with ESMTP id 34C9D8FC17 for ; Fri, 20 Mar 2009 01:44:08 +0000 (UTC) (envelope-from nobody@FreeBSD.org) Received: from www.freebsd.org (localhost [127.0.0.1]) by www.freebsd.org (8.14.3/8.14.3) with ESMTP id n2K1i7FG017114 for ; Fri, 20 Mar 2009 01:44:07 GMT (envelope-from nobody@www.freebsd.org) Received: (from nobody@localhost) by www.freebsd.org (8.14.3/8.14.3/Submit) id n2K1i7QY017113; Fri, 20 Mar 2009 01:44:07 GMT (envelope-from nobody) Message-Id: <200903200144.n2K1i7QY017113@www.freebsd.org> Date: Fri, 20 Mar 2009 01:44:07 GMT From: Renaud Lienhart To: freebsd-gnats-submit@FreeBSD.org X-Send-Pr-Version: www-3.1 Cc: Subject: kern/132832: tcp_output() might generate invalid TSO frames when len > TCP_MAXWIN - hdrlen - optlen X-BeenThere: freebsd-bugs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Bug reports List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 20 Mar 2009 01:50:02 -0000 >Number: 132832 >Category: kern >Synopsis: tcp_output() might generate invalid TSO frames when len > TCP_MAXWIN - hdrlen - optlen >Confidential: no >Severity: serious >Priority: high >Responsible: freebsd-bugs >State: open >Quarter: >Keywords: >Date-Required: >Class: sw-bug >Submitter-Id: current-users >Arrival-Date: Fri Mar 20 01:50:01 UTC 2009 >Closed-Date: >Last-Modified: >Originator: Renaud Lienhart >Release: FreeBSD 7.1 >Organization: VMware, Inc. >Environment: >Description: The tcp_output() routine has an issue when the send window exceeds TCP_MAXWIN and the underlying interface supports TSO. There is a check to ensure the data being pushed doesn't exceed the maximum TSO packet size. If this is the case, "len" is trimmed and "sendalot" is set: --- 8< --- if (tso) { if (len > TCP_MAXWIN - hdrlen - optlen) { len = TCP_MAXWIN - hdrlen - optlen; len = len - (len % (tp->t_maxopd - optlen)); sendalot = 1; --- >8 --- Everything's hunky-dory, until the next "sendalot" iteration; If the remaining window does not require TSO (i.e. len <= tp->t_maxseg), this following piece of code fails to properly reset "tso": --- 8< --- if (len > tp->t_maxseg) { if () tso = 1; } else { len = tp->t_maxseg; sendalot = 1; tso = 0; } } --- >8 --- "tso" remains set to 1 and the resulting packet is tagged with CSUM_TSO although it doesn't require TSO. This causes problems with a large number of nics which refuse to send the resulting frame and, in some case, wedge. >How-To-Repeat: Using netperf (or any TCP workload) with a large socket buffer size exposes the issue in a matter of seconds. >Fix: The solution is to always reset "tso" at the beginning of the "sendalot" loop in order to ensure it is not stale. In my patch I also added a KASSERT() which directly catches any problematic frame before it reaches any other layer. Patch attached with submission follows: Index: netinet/tcp_output.c =================================================================== --- netinet/tcp_output.c (revision 190117) +++ netinet/tcp_output.c (working copy) @@ -140,7 +140,7 @@ int idle, sendalot; int sack_rxmit, sack_bytes_rxmt; struct sackhole *p; - int tso = 0; + int tso; struct tcpopt to; #if 0 int maxburst = TCP_MAXBURST; @@ -198,6 +198,7 @@ SEQ_LT(tp->snd_nxt, tp->snd_max)) tcp_sack_adjust(tp); sendalot = 0; + tso = 0; off = tp->snd_nxt - tp->snd_una; sendwin = min(tp->snd_wnd, tp->snd_cwnd); sendwin = min(sendwin, tp->snd_bwnd); @@ -477,7 +478,6 @@ } else { len = tp->t_maxseg; sendalot = 1; - tso = 0; } } if (sack_rxmit) { @@ -996,6 +996,8 @@ * XXX: Fixme: This is currently not the case for IPv6. */ if (tso) { + KASSERT(len > tp->t_maxopd - optlen, + ("%s: len <= tso_segsz", __func__)); m->m_pkthdr.csum_flags = CSUM_TSO; m->m_pkthdr.tso_segsz = tp->t_maxopd - optlen; } >Release-Note: >Audit-Trail: >Unformatted: