Date: Wed, 18 May 2016 01:09:54 -0400
From: Patrick Kelsey <pkelsey@freebsd.org>
To: hiren panchasara <hiren@strugglingcoder.info>
Cc: transport@freebsd.org, Gleb Smirnoff <glebius@freebsd.org>, Lawrence Stewart <lstewart@freebsd.org>, killing@multiplay.co.uk
Subject: Re: Abrupt reset sent instead of retransmitting a lost packet
Message-ID: <CAD44qMUnEABxN04v0UVHdurwUxd3of-q0ZoGNMa_EcFL_vSbrg@mail.gmail.com>
In-Reply-To: <20160517233618.GS44085@strugglingcoder.info>
References: <20160513173633.GG44085@strugglingcoder.info> <20160517233618.GS44085@strugglingcoder.info>
On Tue, May 17, 2016 at 7:36 PM, hiren panchasara
<hiren@strugglingcoder.info> wrote:

> On 05/13/16 at 10:36P, hiren panchasara wrote:
> > https://people.freebsd.org/~hiren/pcaps/tcp_weird_reset.txt
> > Something we saw in the wild on 10.2ish systems (server and client both).
> >
> > The most interesting thing can be seen at the end of the file.
> >
> > 3298737767:3298739215 gets lost, client tells us about it via a bunch of
> > dupacks with SACK info. It SACKs all the outstanding data but this one
> > missing packet. We (server) never retransmit that missing packet but
> > rather decide to send a Reset after 0.312582ms, which somehow causes the
> > client to pause for 75secs. (That might be another issue and is not
> > particularly important for this discussion.)
> >
> > What could cause this behavior of sending a reset instead of
> > retransmitting a lost packet?
>
> Turns out I am finding a lot of "discarded due to memory problems" in
> 'netstat -sp tcp' and also net.inet.tcp.reass.overflows is rapidly
> increasing.
>
> This is happening in a very low RTT env (in the range of 0.20ms) and
> about 1G of b/w.
>
> So it seems like the following code is where the reass queue is
> overflowing (I've also confirmed with tcp debug that I am seeing this
> message). In tcp_reass():
>
>     if ((th->th_seq != tp->rcv_nxt ||
>         !TCPS_HAVEESTABLISHED(tp->t_state)) &&
>         tp->t_segqlen >= (so->so_rcv.sb_hiwat / tp->t_maxseg) + 1) {
>             V_tcp_reass_overflows++;
>             TCPSTAT_INC(tcps_rcvmemdrop);
>             m_freem(m);
>             *tlenp = 0;
>             if ((s = tcp_log_addrs(&tp->t_inpcb->inp_inc, th, NULL,
>                 NULL))) {
>                     log(LOG_DEBUG, "%s; %s: queue limit reached, "
>                         "segment dropped\n", s, __func__);
>                     free(s, M_TCPLOG);
>             }
>             return (0);
>     }
>
> I know this is a bit older (stable/10) code but I think the problem
> still remains.
>
> This is the gist of this issue:
> tp->t_segqlen >= (so->so_rcv.sb_hiwat / tp->t_maxseg) + 1 evaluating
> to true, which makes us drop packets on the floor.
>
> I've tried to restore default behavior with:
> net.inet.tcp.recvbuf_max: 131072
> net.inet.tcp.recvbuf_inc: 16384
> net.inet.tcp.sendbuf_max: 131072
> net.inet.tcp.sendbuf_inc: 16384
> net.inet.tcp.sendbuf_auto: 1
> net.inet.tcp.sendspace: 65536
> net.inet.tcp.recvspace: 65536
>
> net.inet.tcp.reass.overflows: 156440623
> net.inet.tcp.reass.cursegments: 91
> net.inet.tcp.reass.maxsegments: 557900
>
> And the app is *not* setting SO_SNDBUF or SO_RCVBUF, to keep SB_AUTOSIZE
> in effect.
>
> I was assuming the usual auto-sizing would kick in and do the right
> thing so that we don't run into this issue, but something is amiss.
>
> I am seeing a bunch of connections to inter-colo hosts with high Recv-Q
> (close to recvbuf_max) in 'netstat -an'.
>
> I found an old issue which seems similar:
> https://lists.freebsd.org/pipermail/freebsd-net/2011-August/029491.html
>
> I am cc'ing a few folks who've touched this code or may have some idea.
> Any help is appreciated.

So far, I mainly just have an initial comment regarding the expectation
that receive buffer auto-sizing would kick in.

For an increment to happen, allegedly (I haven't fully audited this code
path yet) greater than 7/8 of the current receive buffer size has to be
received in one RTT. Given your stated bw of ~1G (assuming that means
1 Gbps, on a single connection) and an RTT of 0.2ms, you have ~200 kbits
per RTT. When the receive buffer is at the initial 64KiB, the threshold
for the first increment is ~459 kbits within one RTT. To get that first
increment, the traffic would have to be sufficiently bursty.
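A minimal sketch of that arithmetic, assuming the 7/8 auto-increment
threshold described above, the 64KiB initial receive buffer, and the
reported ~1 Gbps link with ~0.2ms RTT (all values taken from this thread,
not from a live system):

    #include <stdio.h>

    int
    main(void)
    {
            double rate_bps = 1e9;          /* ~1 Gbps link, per the report */
            double rtt_s = 0.0002;          /* ~0.2 ms RTT, per the report */
            unsigned int hiwat = 65536;     /* initial so_rcv.sb_hiwat (64 KiB) */

            double bytes_per_rtt = rate_bps * rtt_s / 8;    /* ~25000 bytes */
            unsigned int thresh = hiwat / 8 * 7;            /* 57344 bytes */

            printf("per-RTT data at line rate: %.0f bytes (~%.0f kbits)\n",
                bytes_per_rtt, bytes_per_rtt * 8 / 1000);
            printf("auto-increment threshold:  %u bytes (~%.0f kbits)\n",
                thresh, thresh * 8 / 1000.0);
            printf("burstiness needed:         %.2fx the average rate\n",
                thresh / bytes_per_rtt);
            return (0);
    }

At the average rate the connection only delivers about 25000 bytes per
RTT, well short of the ~57344-byte threshold, which is why only a burst of
roughly 2.3x the average rate would trigger the first increment.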
In the trace you sent, I'm seeing at most 23 or so segments of length 1448
with the same ecr, which is a peak of around ~250 kbits per RTT.

Also, only in-sequence segments that arrive when the reassembly queue is
empty are counted towards the receive buffer auto-increment threshold. If
you are relying on bursty traffic in order to hit the auto-increment
threshold, and you have packet loss correlated with bursts, then I think
it would be even less likely that you will hit the scaling threshold, as
the reassembly queue will be hiding segments from the scaling logic during
the bursts.

Can you quantify 'net.inet.tcp.reass.overflows is increasing rapidly'?

Only segments that fit inside the receive window will be admitted to the
reassembly queue, and thus be subject to the queue overflow check. Let's
say so_rcv.sb_hiwat is 65536, t_maxseg is 1460, and the actual segment
sizes being received are 1448. Let's also assume for simplicity that a
single lost packet causes the reassembly queue to begin to fill and that
no other packets are lost before that one is delivered.

With those givens, the reassembly queue will hold at most 44 segments, and
it will take 46 segments (45.25 -> 46) to cover the receive window. Since
we must have one missing segment in order for things to be submitted to
the reassembly queue, of the 46 segments required to fill the receive
window, at most 45 of them will be submitted to the reassembly queue. If
the sender runs the receive window all the way down to zero, then only one
segment, the 45th submitted to the queue, will trigger the queue overflow
logic and be dropped. Each retransmit of that final segment that occurs
before the lost segment is retransmitted will result in another queue
overflow. (The result is the same if the receive window is 128KiB; we'd
just be talking about the 90th segment instead of the 45th.)

So if we are operating with the above parameters and we have a sender that
will run the receive window down to zero, then if the lost packet is
retransmitted before the queue-overflow dropped packet is, we would see
net.inet.tcp.reass.overflows increase by one for each lost packet. If for
some reason the sender retransmits the queue-overflow dropped packet on
average N times before the initial lost packet is retransmitted and
received, then on average we would see net.inet.tcp.reass.overflows
increment by N for each lost packet.

-Patrick
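For reference, a small sketch of the reassembly-queue arithmetic in the
message above, using the example values from the discussion (sb_hiwat of
65536, t_maxseg of 1460, 1448-byte payloads); the limit expression mirrors
the tcp_reass() check quoted earlier and is not taken from a live system:

    #include <stdio.h>

    int
    main(void)
    {
            unsigned int hiwat = 65536;     /* so_rcv.sb_hiwat from the example */
            unsigned int maxseg = 1460;     /* tp->t_maxseg from the example */
            unsigned int seglen = 1448;     /* observed payload per segment */

            /* Value compared against tp->t_segqlen in the tcp_reass() check. */
            unsigned int qlimit = hiwat / maxseg + 1;               /* 44 + 1 = 45 */

            /* Segments needed to cover the receive window, rounded up. */
            unsigned int win_segs = (hiwat + seglen - 1) / seglen;  /* 46 */

            printf("overflow check threshold:      %u\n", qlimit);
            printf("segments to cover the window:  %u\n", win_segs);
            printf("segments queued behind a hole: %u\n", win_segs - 1);
            return (0);
    }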