From: hiren panchasara <hiren@strugglingcoder.info>
Date: Tue, 17 May 2016 16:36:18 -0700
To: transport@FreeBSD.org
Cc: glebius@FreeBSD.org, pkelsey@FreeBSD.org, lstewart@FreeBSD.org, killing@multiplay.co.uk
Subject: Re: Abrupt reset sent instead of retransmitting a lost packet
Message-ID: <20160517233618.GS44085@strugglingcoder.info>
In-Reply-To: <20160513173633.GG44085@strugglingcoder.info>

On 05/13/16 at 10:36P, hiren panchasara wrote:
> https://people.freebsd.org/~hiren/pcaps/tcp_weird_reset.txt
> Something we saw in the wild on 10.2ish systems (server and client both).
>
> The most interesting thing can be seen at the end of the file.
>
> 3298737767:3298739215 gets lost, and the client tells us about it via a
> bunch of dupacks with SACK info. It SACKs all the outstanding data but
> this one missing packet. We (the server) never retransmit that missing
> packet but rather decide to send a reset after 0.312582ms, which somehow
> causes the client to pause for 75 seconds (which might be another issue
> and is not particularly important for this discussion).
>
> What could cause this behavior of sending a reset instead of
> retransmitting a lost packet?

Turns out I am finding a lot of "discarded due to memory problems" in
'netstat -sp tcp', and net.inet.tcp.reass.overflows is also increasing
rapidly.

This is happening in a very low RTT environment (in the range of 0.20ms)
with about 1G of b/w.
So it seems like the following code is where the reassembly queue is
overflowing (I've also confirmed with TCP debug that I am seeing this
message).

In tcp_reass():

	if ((th->th_seq != tp->rcv_nxt || !TCPS_HAVEESTABLISHED(tp->t_state)) &&
	    tp->t_segqlen >= (so->so_rcv.sb_hiwat / tp->t_maxseg) + 1) {
		V_tcp_reass_overflows++;
		TCPSTAT_INC(tcps_rcvmemdrop);
		m_freem(m);
		*tlenp = 0;
		if ((s = tcp_log_addrs(&tp->t_inpcb->inp_inc, th, NULL, NULL))) {
			log(LOG_DEBUG, "%s; %s: queue limit reached, "
			    "segment dropped\n", s, __func__);
			free(s, M_TCPLOG);
		}
		return (0);
	}

I know this is somewhat older (stable/10) code, but I think the problem
still remains.

This is the gist of the issue: the condition

	tp->t_segqlen >= (so->so_rcv.sb_hiwat / tp->t_maxseg) + 1

evaluates to true, which makes us drop packets on the floor.

I've tried to restore the default behavior with:

net.inet.tcp.recvbuf_max: 131072
net.inet.tcp.recvbuf_inc: 16384
net.inet.tcp.sendbuf_max: 131072
net.inet.tcp.sendbuf_inc: 16384
net.inet.tcp.sendbuf_auto: 1
net.inet.tcp.sendspace: 65536
net.inet.tcp.recvspace: 65536

net.inet.tcp.reass.overflows: 156440623
net.inet.tcp.reass.cursegments: 91
net.inet.tcp.reass.maxsegments: 557900

And the app is *not* setting SO_SNDBUF or SO_RCVBUF, so that SB_AUTOSIZE
remains in effect.

I was assuming the usual receive-buffer auto-sizing would kick in and do
the right thing so that we don't run into this issue, but something is
amiss.

I am seeing a bunch of connections to inter-colo hosts with a high
Recv-Q (close to recvbuf_max) in 'netstat -an'.

I found an old issue which seems similar:
https://lists.freebsd.org/pipermail/freebsd-net/2011-August/029491.html

I am cc'ing a few folks who've touched this code or may have some idea.
Any help is appreciated.

Cheers,
Hiren
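
For concreteness, the cap that check imposes is quite small with the
buffer sizes listed above. A minimal standalone sketch of the arithmetic
(assuming sb_hiwat = 65536, matching the recvspace default shown, and a
typical t_maxseg of 1460; both are illustrative values rather than
measurements from the affected hosts):

/*
 * Illustrative sketch: the reassembly-queue cap implied by the check in
 * tcp_reass() quoted above.  sb_hiwat and t_maxseg are assumed values
 * (the recvspace default and a typical Ethernet MSS), not figures taken
 * from the affected machines.
 */
#include <stdio.h>

int
main(void)
{
	unsigned long sb_hiwat = 65536;		/* net.inet.tcp.recvspace */
	unsigned long t_maxseg = 1460;		/* typical Ethernet MSS */

	/* Same expression as in the kernel check above. */
	unsigned long limit = sb_hiwat / t_maxseg + 1;

	printf("overflow check fires once %lu segments are queued "
	    "(~%lu bytes)\n", limit, limit * t_maxseg);
	return (0);
}

With those numbers the check fires as soon as roughly one receive
window's worth of out-of-order data is queued behind a single hole,
which is consistent with the drops reported above.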
From: Patrick Kelsey <pkelsey@gmail.com>
Date: Wed, 18 May 2016 01:09:54 -0400
To: hiren panchasara <hiren@strugglingcoder.info>
Cc: transport@freebsd.org, Gleb Smirnoff <glebius@FreeBSD.org>, Lawrence Stewart <lstewart@FreeBSD.org>, killing@multiplay.co.uk
Subject: Re: Abrupt reset sent instead of retransmitting a lost packet
In-Reply-To: <20160517233618.GS44085@strugglingcoder.info>

On Tue, May 17, 2016 at 7:36 PM, hiren panchasara
<hiren@strugglingcoder.info> wrote:
> On 05/13/16 at 10:36P, hiren panchasara wrote:
> > [...]
> > What could cause this behavior of sending a reset instead of
> > retransmitting a lost packet?
>
> Turns out I am finding a lot of "discarded due to memory problems" in
> 'netstat -sp tcp', and net.inet.tcp.reass.overflows is also increasing
> rapidly.
>
> This is happening in a very low RTT environment (in the range of
> 0.20ms) with about 1G of b/w.
> [...]
> I was assuming the usual receive-buffer auto-sizing would kick in and
> do the right thing so that we don't run into this issue, but something
> is amiss.
> [...]

So far, I mainly just have an initial comment regarding the expectation
that receive buffer auto-sizing would kick in. For an increment to
happen, allegedly (I haven't fully audited this code path yet) greater
than 7/8 of the current receive buffer size has to be received in one
RTT. Given your stated bandwidth of ~1G (assuming that means 1 Gbps, on
a single connection) and an RTT of 0.2ms, you have ~200 kbits per RTT.
When the receive buffer is at the initial 64KiB, the threshold for the
first increment is ~459 kbits within one RTT. To get that first
increment, the traffic would have to be sufficiently bursty.

In the trace you sent, I'm seeing at most 23 or so segments of length
1448 with the same ecr, which is a peak of around ~250 kbits per RTT.

Also, only in-sequence segments that arrive when the reassembly queue is
empty are counted towards the receive buffer auto-increment threshold.
If you are relying on bursty traffic in order to hit the auto-increment
threshold, and you have packet loss correlated with bursts, then I think
it would be even less likely that you will hit the scaling threshold, as
the reassembly queue will be hiding segments from the scaling logic
during the bursts.

Can you quantify 'net.inet.tcp.reass.overflows is increasing rapidly'?
Only segments that fit inside the receive window will be admitted to the
reassembly queue, and thus be subject to the queue overflow check.
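
To put numbers on that auto-sizing comparison, here is the same
arithmetic as a small standalone program (a sketch only; the 7/8 factor
follows the description above, and the bandwidth and RTT are the figures
quoted in the thread):

/*
 * Sketch of the receive-buffer auto-sizing comparison described above.
 * Assumptions: ~1 Gbps on a single connection, ~0.2 ms RTT, and an
 * auto-increment threshold of 7/8 of the current 64 KiB receive buffer.
 */
#include <stdio.h>

int
main(void)
{
	double bw_bps = 1e9;			/* ~1 Gbps, per the report */
	double rtt_s = 0.0002;			/* ~0.2 ms RTT */
	double sb_hiwat_bytes = 65536.0;	/* initial receive buffer */

	double per_rtt_kbits = bw_bps * rtt_s / 1e3;
	double threshold_kbits = (7.0 / 8.0) * sb_hiwat_bytes * 8.0 / 1e3;

	printf("arrives in one RTT at line rate: ~%.0f kbits\n",
	    per_rtt_kbits);
	printf("first auto-increment threshold:  ~%.0f kbits\n",
	    threshold_kbits);
	return (0);
}

Steady traffic at the reported rate therefore sits well below the first
increment threshold, which is why the burstiness observation above
matters.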
Let's say so_rcv.sb_hiwat is 65536, t_maxseg is 1460, and the actual
segment sizes being received are 1448. Let's also assume for simplicity
that a single lost packet causes the reassembly queue to begin to fill
and that no other packets are lost before that one is delivered. With
those givens, the reassembly queue will hold at most 44 segments, and it
will take 46 segments (45.25 -> 46) to cover the receive window.

Since we must have one missing segment in order for things to be
submitted to the reassembly queue, of the 46 segments required to fill
the receive window, at most 45 of them will be submitted to the
reassembly queue. If the sender runs the receive window all the way to
zero, then only one segment, the 45th submitted to the queue, will
result in the queue overflow logic being triggered and that segment
being dropped. Each retransmit of that final segment that occurs before
the lost segment is retransmitted will result in another queue overflow.
(The result is the same if the receive window is 128KiB; we'd just be
talking about the 90th segment instead of the 45th.)

So if we are operating with the above parameters and we have a sender
that will run the receive window down to zero, then if the lost packet
is retransmitted before the queue-overflow-dropped packet is, we would
see net.inet.tcp.reass.overflows increase by one for each lost packet.
If for some reason the sender retransmits the queue-overflow-dropped
packet on average N times before the initial lost packet is
retransmitted and received, then on average we would see
net.inet.tcp.reass.overflows increment by N for each lost packet.

-Patrick
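
The segment counts in that walkthrough can be reproduced directly from
the stated assumptions; a quick sketch (same illustrative values:
sb_hiwat = 65536, t_maxseg = 1460, 1448-byte payloads on the wire):

/*
 * Sketch reproducing the segment counts used in the reply above.
 * The values are the ones assumed there, not measurements.
 */
#include <stdio.h>

int
main(void)
{
	unsigned long sb_hiwat = 65536;
	unsigned long t_maxseg = 1460;	/* value used by the overflow check */
	unsigned long payload = 1448;	/* actual payload per segment */

	/* Queue limit from the tcp_reass() check. */
	unsigned long qlimit = sb_hiwat / t_maxseg + 1;

	/* Segments needed to cover the whole receive window (round up). */
	unsigned long window_segs = (sb_hiwat + payload - 1) / payload;

	printf("overflow check fires once %lu segments are queued\n", qlimit);
	printf("%lu segments of %lu bytes cover the %lu-byte window\n",
	    window_segs, payload, sb_hiwat);
	return (0);
}

Which of those segments actually ends up being dropped then depends on
how the single hole interleaves with the arrivals, as walked through
above.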
From: Patrick Kelsey <pkelsey@gmail.com>
Date: Wed, 18 May 2016 01:35:13 -0400
To: hiren panchasara <hiren@strugglingcoder.info>
Cc: transport@freebsd.org, Gleb Smirnoff <glebius@FreeBSD.org>, Lawrence Stewart <lstewart@FreeBSD.org>, killing@multiplay.co.uk
Subject: Re: Abrupt reset sent instead of retransmitting a lost packet

On Wed, May 18, 2016 at 1:09 AM, Patrick Kelsey <pkelsey@gmail.com> wrote:
> On Tue, May 17, 2016 at 7:36 PM, hiren panchasara
> <hiren@strugglingcoder.info> wrote:
>> [...]
>> This is happening in a very low RTT environment (in the range of
>> 0.20ms) with about 1G of b/w.
>> [...]
>
> So far, I mainly just have an initial comment regarding the expectation
> that receive buffer auto-sizing would kick in.
> [...]
> In the trace you sent, I'm seeing at most 23 or so segments of length
> 1448 with the same ecr, which is a peak of around ~250 kbits per RTT.

Here is the point where I betrayed that I had read the scaling code a
bit more closely, but was still thinking mostly in terms of what the
comments said was going on. I had gone and looked at the trace file for
how many segments are transmitted with the same ecr and called that the
RTT, because that is what the scaling code does (it evaluates the
scaling logic when there is an increment in the ecr above the base
timestamp), but I didn't fully account for the fact that this new
reference frame affects the math.

Looking for a change in the ecr value in order to trigger the
'RTT-based' scaling logic effectively puts a floor of about one timer
tick on the time interval over which the scaling code counts bytes.
So when real RTT is less than a timer tick or so, the scaling code is counting bytes roughly over a timer tick (plus the smaller RTT), and when real RTT is greater than a timer tick or so, the scaling code is counting bytes over an interval closer to the real RTT. In this case, the timer tick is probably 1ms, so the scaling code is not counting bytes over the real RTT of 0.2ms, it's counting them over something like 1.2ms. That means at 1 Gbps, there will be 1.2 Mbits per 'scaling-logic-RTT', and we would expect to see that initial receive buffer increment (as well as a number of others after that one).
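
The same arithmetic with that tick floor folded in, as a sketch (the
1 ms tick is the assumption stated above, and the bandwidth, RTT, and
buffer size are the figures already quoted; none of this was measured
separately):

/*
 * Sketch of the effect described above: with the ecr-change trigger,
 * the scaling logic ends up counting bytes over roughly one timestamp
 * tick plus the real RTT.  Assumes a 1 ms tick, 0.2 ms real RTT, and
 * ~1 Gbps, as discussed in the thread.
 */
#include <stdio.h>

int
main(void)
{
	double bw_bps = 1e9;		/* ~1 Gbps */
	double real_rtt_s = 0.0002;	/* ~0.2 ms */
	double tick_s = 0.001;		/* assumed timestamp tick */
	double sb_hiwat_bytes = 65536.0;	/* initial receive buffer */

	double interval_s = tick_s + real_rtt_s;	/* ~1.2 ms */
	double counted_mbits = bw_bps * interval_s / 1e6;
	double threshold_kbits = (7.0 / 8.0) * sb_hiwat_bytes * 8.0 / 1e3;

	printf("counted per 'scaling-logic RTT': ~%.1f Mbits\n",
	    counted_mbits);
	printf("first increment threshold:       ~%.0f kbits\n",
	    threshold_kbits);
	return (0);
}

That ~1.2 Mbits per effective measurement interval is comfortably above
the ~459 kbit threshold, matching the expectation that the initial
receive buffer increment (and several subsequent ones) should occur.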