Date: Tue, 17 May 2016 16:36:18 -0700
From: hiren panchasara <hiren@strugglingcoder.info>
To: transport@FreeBSD.org
Cc: glebius@FreeBSD.org, pkelsey@FreeBSD.org, lstewart@FreeBSD.org, killing@multiplay.co.uk
Subject: Re: Abrupt reset sent instead of retransmitting a lost packet
Message-ID: <20160517233618.GS44085@strugglingcoder.info>
In-Reply-To: <20160513173633.GG44085@strugglingcoder.info>

On 05/13/16 at 10:36P, hiren panchasara wrote:
> https://people.freebsd.org/~hiren/pcaps/tcp_weird_reset.txt
> Something we saw in the wild on 10.2ish systems (server and client both).
>
> The most interesting thing can be seen at the end of the file.
>
> 3298737767:3298739215 gets lost, client tells us about it via a bunch of
> dupacks with SACK info. It SACKs all the outstanding data but this one
> missing packet. We (server) never retransmits that missing
> packet but rather decide to send a Reset after 0.312582ms. Which somehow
> causes client to pause for 75secs. (which might be another issue and not
> particularly important for this discussion.)
>
> What could cause this behavior of sending a reset instead of
> retransmitting a lost packet?

Turns out I am finding a lot of "discarded due to memory problems" in
'netstat -sp tcp', and net.inet.tcp.reass.overflows is also rapidly
increasing.

This is happening in a very low RTT env (in the range of 0.20ms) with
about 1G of b/w.

So it seems like the following code is where the reass queue is overflowing:
(I've also confirmed with tcp debug that I am seeing this message)

In tcp_reass()

        if ((th->th_seq != tp->rcv_nxt || !TCPS_HAVEESTABLISHED(tp->t_state)) &&
            tp->t_segqlen >= (so->so_rcv.sb_hiwat / tp->t_maxseg) + 1) {
                V_tcp_reass_overflows++;
                TCPSTAT_INC(tcps_rcvmemdrop);
                m_freem(m);
                *tlenp = 0;
                if ((s = tcp_log_addrs(&tp->t_inpcb->inp_inc, th, NULL, NULL))) {
                        log(LOG_DEBUG, "%s; %s: queue limit reached, "
                            "segment dropped\n", s, __func__);
                        free(s, M_TCPLOG);
                }
                return (0);
        }

I know this is a bit older (stable/10) code but I think the problem still
remains.
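
To put rough numbers on that queue limit, here is a minimal userland sketch of the arithmetic, not the kernel code itself. It assumes a 1448-byte t_maxseg (the size of the lost segment 3298737767:3298739215 quoted above) and so_rcv.sb_hiwat values of 65536 and 131072 bytes. The helper names reass_qlimit and reass_would_drop are hypothetical, and the sketch only models the queue-length half of the condition, i.e. the case of an out-of-order segment.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not a kernel function): the per-connection
 * reassembly queue limit from the tcp_reass() check,
 * (so->so_rcv.sb_hiwat / tp->t_maxseg) + 1, counted in segments.
 */
static unsigned int
reass_qlimit(unsigned int sb_hiwat, unsigned int maxseg)
{
	assert(maxseg > 0);		/* guard against division by zero */
	return (sb_hiwat / maxseg + 1);	/* integer division, as in the check */
}

/*
 * Hypothetical helper: true when an out-of-order segment would be
 * dropped because the queue already holds segqlen segments. The
 * th_seq/rcv_nxt and connection-state part of the kernel condition
 * is assumed to be true and is not modelled here.
 */
static bool
reass_would_drop(unsigned int segqlen, unsigned int sb_hiwat,
    unsigned int maxseg)
{
	return (segqlen >= reass_qlimit(sb_hiwat, maxseg));
}
```

Under these assumptions, reass_qlimit(65536, 1448) is 46 segments and reass_qlimit(131072, 1448) is 91 segments, so the limit only moves if sb_hiwat or the MSS changes.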

This is the gist of this issue:
tp->t_segqlen >= (so->so_rcv.sb_hiwat / tp->t_maxseg) + 1 - evaluating
to be true, which makes us drop packets on the floor.

I've tried to restore default behavior with:
net.inet.tcp.recvbuf_max: 131072
net.inet.tcp.recvbuf_inc: 16384
net.inet.tcp.sendbuf_max: 131072
net.inet.tcp.sendbuf_inc: 16384
net.inet.tcp.sendbuf_auto: 1
net.inet.tcp.sendspace: 65536
net.inet.tcp.recvspace: 65536
net.inet.tcp.reass.overflows: 156440623
net.inet.tcp.reass.cursegments: 91
net.inet.tcp.reass.maxsegments: 557900

And the app is *not* setting SO_SNDBUF or SO_RCVBUF, to keep SB_AUTOSIZE
in effect. I was assuming the usual auto-sizing would kick in and do the
right thing so that we don't run into this issue, but something is amiss.

I am seeing a bunch of connections to inter-colo hosts with high Recv-Q
(close to recvbuf_max) in 'netstat -an'.

I found an old issue which seems similar:
https://lists.freebsd.org/pipermail/freebsd-net/2011-August/029491.html

I am cc'ing a few folks who've touched this code or may have some idea.
Any help is appreciated.

Cheers,
Hiren