Date: Thu, 22 Jun 2017 22:57:01 +0000
From: "Youssef GHORBAL" <youssef.ghorbal@pasteur.fr>
To: "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>
Subject: Sporadic TCP/RST sent to client
Message-ID: <84CB0795-B28E-46DF-9593-4C1BAAB7DDF5@pasteur.fr>
Hello,

I'm having an issue with a FreeBSD 11 based system that sporadically sends TCP/RST to clients after the TCP session has been correctly initiated. The sequence goes this way:

1 Client -> Server : SYN
2 Server -> Client : SYN/ACK
3 Client -> Server : ACK
4 Client -> Server : PSH/ACK (upper protocol data sending starts here)
5 Server -> Client : RST

- The problem happens sporadically: the same client and the same server can communicate smoothly on the same service port, but from time to time (hours, sometimes days) the previous sequence happens.
- The service running on the server is not responsible for the RST sent. The service was deeply profiled and nothing happens to justify the RST.
- tcpdump on the server side confirms that the packets arrive in order and on time.
- The traffic is very light: a few TCP sessions per day.
- The server is connected using a lagg enslaving two cxgb interfaces.

In my effort to diagnose the problem (trying to get a reproducible test case) I noticed that the issue is most likely to be triggered when these two conditions are met:

- the ACK (in step 3) and the PSH/ACK (in step 4) arrive on different lagg NICs.
- the time between those two packets is below 10 microseconds.

When searching the interwebs I came across a strangely similar issue reported here 7 years ago:
https://lists.freebsd.org/pipermail/freebsd-net/2010-August/026029.html
(The OP seemed to have resolved his issue by changing the netisr policy from direct to hybrid, but there is no mention of laggs being used.)

I'm pretty sure that I'm hitting some race condition, a scenario where, due to multithreading, the PSH/ACK is somehow handled before the ACK, making the kernel raise a TCP/RST since the initial TCP handshake hasn't finished yet.

I've read about the netisr work and I was under the impression that even though it's SMP enabled, it was made to preserve protocol ordering.

What's the expected behaviour in this scenario on the netisr side?
How can I push the investigation further?

Youssef Ghorbal
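
The two trigger conditions described above (handshake ACK and first PSH/ACK on different lagg members, less than 10 microseconds apart) could be checked mechanically against a capture. What follows is only a minimal sketch, assuming the packets of a single connection have already been exported into simple records; the `Pkt` layout, the flag strings and the `suspicious_handshakes` name are hypothetical and do not come from tcpdump or any FreeBSD tool.

```python
from dataclasses import dataclass


@dataclass
class Pkt:
    ts_us: float       # arrival time in microseconds (assumed unit)
    nic: str           # lagg member the packet was seen on, e.g. "cxgb0"
    flags: str         # simplified TCP flags: "S", "SA", "A", "PA", "R"
    from_client: bool  # True for client -> server packets


def suspicious_handshakes(pkts, max_gap_us=10.0):
    """Return (ack, push) pairs where the handshake ACK and the first
    PSH/ACK came in on different NICs less than max_gap_us apart.

    Assumes pkts holds one connection, ordered by timestamp.
    """
    hits = []
    for i in range(2, len(pkts)):
        synack, ack, push = pkts[i - 2], pkts[i - 1], pkts[i]
        if (not synack.from_client and synack.flags == "SA"
                and ack.from_client and ack.flags == "A"
                and push.from_client and push.flags == "PA"
                and ack.nic != push.nic
                and push.ts_us - ack.ts_us < max_gap_us):
            hits.append((ack, push))
    return hits


# Made-up trace mirroring steps 1 to 5 of the sequence above.
trace = [
    Pkt(0.0, "cxgb0", "S", True),
    Pkt(50.0, "cxgb0", "SA", False),
    Pkt(120.0, "cxgb0", "A", True),
    Pkt(126.0, "cxgb1", "PA", True),
    Pkt(140.0, "cxgb0", "R", False),
]

for ack, push in suspicious_handshakes(trace):
    print(f"ACK on {ack.nic}, PSH/ACK on {push.nic}, "
          f"gap {push.ts_us - ack.ts_us:.1f} us")
```

Run over many captured sessions, a filter like this would show whether every RST case matches both conditions or whether some fall outside them.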