Date: Tue, 27 Jun 2017 09:15:36 +0000
From: "Youssef GHORBAL" <youssef.ghorbal@pasteur.fr>
To: Matt Joras <matt.joras@gmail.com>
Cc: Navdeep Parhar <nparhar@gmail.com>, "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>
Subject: Re: Sporadic TCP/RST sent to client
Message-ID: <5ABA962E-A90A-4C25-A5A7-EE5CF66FFDD4@pasteur.fr>
In-Reply-To: <CADdTf%2BgeCy4naC5jJ6UhTm7-n9c6XKpgRs96EsXGq-UVjSn8MQ@mail.gmail.com>
References: <84CB0795-B28E-46DF-9593-4C1BAAB7DDF5@pasteur.fr> <CAPFoGT8sthFOm=vOiFb9%2B2BM=%2BBqtREz1SrOm_UiVge81CrYrw@mail.gmail.com> <CADdTf%2BgeCy4naC5jJ6UhTm7-n9c6XKpgRs96EsXGq-UVjSn8MQ@mail.gmail.com>

Imagine this setup:

freebsd host port0 <-> switch 1 <-> linux host port0
freebsd host port1 <-> switch 2 <-> linux host port1

On the Linux box, ports 0 and 1 are enslaved in a bond using the RR (round-robin) algorithm.
On the FreeBSD box, ports 0 and 1 are enslaved in a lagg.
Switches 1 and 2 are configured for MLAG.

The Linux box dispatches packets on both NICs (since the RR algorithm dictates that); packets are dispatched in order.
Packets outgoing on port0 are handled by switch 1 and hit the FreeBSD box on port 0.
Packets outgoing on port1 are handled by switch 2 and hit the FreeBSD box on port 1.

As I stated earlier, from the tcpdump traces I've done on the FreeBSD box (both on the lagg interface and on the actual ports), packets do arrive in order but on different NICs, and it works great until the elapsed time between packets drops to around a microsecond.

I don't really have control over the Linux box to make it use another hash algorithm (but I'm still trying).

Youssef
------------------------

> On 27 Jun 2017, at 00:13, Matt Joras <matt.joras@gmail.com> wrote:
>
> Out of curiosity, what sort of lagg setup are you using that's causing
> the TCP packets to be split across the two lagg interfaces?
>
> Matt
>
> On Mon, Jun 26, 2017 at 1:35 PM, Navdeep Parhar <nparhar@gmail.com> wrote:
>> On Thu, Jun 22, 2017 at 3:57 PM, Youssef GHORBAL
>> <youssef.ghorbal@pasteur.fr> wrote:
>>> Hello,
>>>
>>> I'm having an issue with a FreeBSD 11 based system that sporadically sends a TCP/RST to clients after the TCP session has been correctly initiated.
>>> The sequence goes this way:
>>>
>>> 1 Client -> Server : SYN
>>> 2 Server -> Client : SYN/ACK
>>> 3 Client -> Server : ACK
>>> 4 Client -> Server : PSH/ACK (upper protocol data sending starts here)
>>> 5 Server -> Client : RST
>>>
>>> - The problem happens sporadically; the same client and the same server can communicate smoothly on the same service port. But from time to time (hours, sometimes days) the previous sequence happens.
>>> - The service running on the server is not responsible for the RST sent. The service was deeply profiled and nothing happens to justify the RST.
>>> - tcpdump on the server side shows that the packets arrive in chronological order.
>>> - The traffic is very light: a few TCP sessions per day.
>>> - The server is connected using a lagg enslaving two cxgb interfaces.
>>>
>>> In my effort to diagnose the problem (trying to get a reproducible test case) I noticed that the issue is most likely triggered when these two conditions are met:
>>> - the ACK (in step 3) and the PSH/ACK (in step 4) arrive on different lagg NICs.
>>> - the timing between those two packets is under 10 microseconds.
>>>
>>> When searching the interwebs I came across a strangely similar issue reported here 7 years ago:
>>> https://lists.freebsd.org/pipermail/freebsd-net/2010-August/026029.html
>>>
>>> (The OP seemed to have resolved his issue by changing the netisr policy from direct to hybrid, but there is no mention of laggs being used.)
>>>
>>> I'm pretty sure that I'm hitting some race condition, a scenario where due to multithreading the PSH/ACK is somehow handled before the ACK, making the kernel raise a TCP/RST since the initial TCP handshake hasn't finished yet.
>>>
>>> I've read about the netisr work and I was under the impression that even if it's SMP enabled it was made to keep protocol ordering.
>>>
>>> What's the expected behaviour in this scenario on the netisr side?
>>> How can I push the investigation further?
>>
>> I think you've already figured out the situation here -- the PSH/ACK is likely
>> being handled before the ACK for the SYN because they arrived on different
>> interfaces.  There is nothing in netisr dispatch that will maintain protocol
>> ordering in this case.
>>
>> Regards,
>> Navdeep
>> _______________________________________________
>> freebsd-net@freebsd.org mailing list
>> https://lists.freebsd.org/mailman/listinfo/freebsd-net
>> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
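
To make the setup discussed in this thread concrete, here is a minimal Python sketch of why a per-packet round-robin bond splits one TCP flow across both lagg ports while a per-flow hash would not, and how the "different NICs, under 10 microseconds apart" condition can be spotted in a capture. Everything in it is an assumption for illustration: the function names, the 4-tuple, and the arrival times are made up, CRC32 is only a stand-in for a flow hash (it is not what the Linux bonding driver or lagg(4) compute), and the round-robin counter is simplified to count only this flow's packets.

```python
import zlib


def round_robin_port(packet_index, n_ports=2):
    """Per-packet round robin: consecutive packets alternate ports."""
    return packet_index % n_ports


def flow_hash_port(src, sport, dst, dport, n_ports=2):
    """Per-flow hashing: every packet of one TCP 4-tuple maps to one port.

    CRC32 is an illustrative stand-in, not a real driver hash.
    """
    key = f"{src}:{sport}-{dst}:{dport}".encode()
    return zlib.crc32(key) % n_ports


def risky_pairs(packets, threshold_us=10.0):
    """Return consecutive packet pairs that arrived on different ports
    less than threshold_us microseconds apart."""
    pairs = []
    for prev, cur in zip(packets, packets[1:]):
        gap_us = cur["t_us"] - prev["t_us"]
        if prev["port"] != cur["port"] and gap_us < threshold_us:
            pairs.append((prev["flags"], cur["flags"]))
    return pairs


# Client -> server packets of one session; arrival times are made up.
client_packets = [
    {"flags": "SYN", "t_us": 0.0},
    {"flags": "ACK", "t_us": 300.0},
    {"flags": "PSH/ACK", "t_us": 304.0},
]

# Hypothetical 4-tuple (documentation address range).
flow = ("192.0.2.10", 40000, "192.0.2.20", 443)

# Tag each packet with the port it would be sent on under each policy.
rr = [dict(p, port=round_robin_port(i)) for i, p in enumerate(client_packets)]
hashed = [dict(p, port=flow_hash_port(*flow)) for p in client_packets]

print("round robin:", [(p["flags"], p["port"]) for p in rr])
print("flow hash:  ", [(p["flags"], p["port"]) for p in hashed])
print("risky under round robin:", risky_pairs(rr))
print("risky under flow hash:  ", risky_pairs(hashed))
```

With round robin, the ACK and the PSH/ACK land on different ports a few microseconds apart, which is the pair reported as triggering the RST; with a per-flow hash all three packets share one port, so no such pair can occur.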