Date:      Thu, 22 Jun 2017 22:57:01 +0000
From:      "Youssef  GHORBAL" <youssef.ghorbal@pasteur.fr>
To:        "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>
Subject:   Sporadic TCP/RST sent to client
Message-ID:  <84CB0795-B28E-46DF-9593-4C1BAAB7DDF5@pasteur.fr>

Hello,

	I'm having an issue with a FreeBSD 11 based system that sporadically sends TCP RSTs to clients after the TCP session has been correctly established.
	The sequence goes as follows:

	1 Client -> Server : SYN
	2 Server -> Client : SYN/ACK
	3 Client -> Server : ACK
	4 Client -> Server : PSH/ACK (upper protocol data sending starts here)
	5 Server -> Client : RST

	- The problem happens sporadically: the same client and the same server communicate smoothly on the same service port most of the time, but every now and then (after hours, sometimes days) the sequence above occurs.
	- The service running on the server is not responsible for the RST. It was thoroughly profiled and nothing it does would justify the RST.
	- tcpdump on the server side confirms that the packets arrive in order and on time.
	- The traffic is very light: a few TCP sessions per day.
	- The server is connected through a lagg interface aggregating two cxgb interfaces.

	While trying to diagnose the problem (and to build a reproducible test case), I noticed that the issue is most likely triggered when these two conditions are met:
	- The ACK (step 3) and the PSH/ACK (step 4) arrive on different lagg member NICs.
	- The gap between those two packets is under 10 microseconds.
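
	To check captures for these two conditions, a small helper along these lines could be used. It is only a sketch: it assumes the client-to-server segments of one connection have already been captured on each lagg member, merged in timestamp order, and reduced to hypothetical Segment records (timestamp in microseconds, receiving interface, tcpdump-style flags, payload length). The interface names and timestamps are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    ts_us: float      # capture timestamp in microseconds (hypothetical record)
    iface: str        # lagg member that received it, e.g. "cxgb0"
    flags: str        # tcpdump-style flags: "S", ".", "P." ...
    payload_len: int  # TCP payload length in bytes


def suspicious_pair(segments, window_us=10.0):
    """Return (ack, data) if the handshake-completing ACK and the first data
    segment arrived on different lagg members less than window_us apart,
    else None. `segments` holds client->server segments of ONE connection,
    in capture order (an assumption of this sketch)."""
    # First pure ACK (ACK flag set, no payload): completes the handshake.
    ack = next((s for s in segments if s.payload_len == 0 and "." in s.flags), None)
    # First segment carrying data: the PSH/ACK of step 4.
    data = next((s for s in segments if s.payload_len > 0), None)
    if ack is None or data is None:
        return None
    if ack.iface != data.iface and abs(data.ts_us - ack.ts_us) < window_us:
        return ack, data
    return None


conn = [
    Segment(0.0, "cxgb0", "S", 0),       # SYN
    Segment(150.0, "cxgb0", ".", 0),     # handshake-completing ACK
    Segment(156.0, "cxgb1", "P.", 120),  # first data, other member, 6 us later
]
print(suspicious_pair(conn) is not None)  # True
```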

	When searching the interwebs I came across a strangely similar issue reported here 7 years ago:
	https://lists.freebsd.org/pipermail/freebsd-net/2010-August/026029.html

	(The OP seems to have resolved his issue by changing the netisr policy from direct to hybrid, but there is no mention of laggs being used.)

	I'm pretty sure I'm hitting some race condition: a scenario where, due to multithreading, the PSH/ACK is somehow handled before the ACK, causing the kernel to send a TCP RST because the initial TCP handshake hasn't finished yet.

	I've read about how netisr works and was under the impression that, even though it is SMP-enabled, it was designed to preserve protocol ordering.
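
	My understanding of how that ordering is meant to be kept, as a conceptual sketch rather than actual netisr code: packets are mapped to a work queue by a hash of their flow, so both segments of one connection land in the same FIFO queue whichever NIC received them. The CRC32 hash, the queue count and the addresses below are illustrative assumptions.

```python
import zlib
from collections import deque


def flow_queue(flow, nqueues):
    # flow: (src_ip, src_port, dst_ip, dst_port). CRC32 is an illustrative
    # stand-in for whatever flow hash the stack or NIC really computes.
    key = "|".join(map(str, flow)).encode()
    return zlib.crc32(key) % nqueues


def dispatch(packets, nqueues=4):
    """packets: list of (name, flow, iface) in arrival order. Every packet of
    a flow goes to the same FIFO queue regardless of the receiving NIC."""
    queues = [deque() for _ in range(nqueues)]
    for name, flow, iface in packets:
        queues[flow_queue(flow, nqueues)].append((name, iface))
    return queues


flow = ("10.0.0.1", 51000, "10.0.0.2", 443)  # hypothetical connection
qs = dispatch([("ACK", flow, "cxgb0"), ("PSH/ACK", flow, "cxgb1")])
print([name for q in qs for name, _ in q])  # ['ACK', 'PSH/ACK']
```

	This only preserves order if packets are enqueued in arrival order. If each NIC's receive thread processes its packets inline (direct dispatch), there is no shared queue to serialize them, which might explain why changing the dispatch policy helped in the 2010 thread.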

	What's the expected behaviour in this scenario on the netisr side ?
	How can I push the investigation further ?

Youssef Ghorbal
