From owner-freebsd-net@freebsd.org Sat Oct 19 17:32:33 2019
Date: Sat, 19 Oct 2019 20:32:20 +0300
From: Paul <devgs@ukr.net>
Subject: Re[2]: Re[2]: Network anomalies after update from 11.2 STABLE to 12.1 STABLE
To: Rick Macklem
Cc: michael.tuexen@lurchi.franken.de, freebsd-net@freebsd.org, freebsd-stable@freebsd.org
References: <1571499556.409350000.a1ewtyar@frv39.fwdcdn.com>
Message-Id: <1571505850.986841000.zen2nmth@frv39.fwdcdn.com>
List-Id: Networking and TCP/IP with FreeBSD

Hi Rick,

RST is only one part of the syndrome. Apart from it, we have a ton of other issues.
For example: a lot (50+) of ACK and [FIN, ACK] re-transmissions in cases where they
are definitely not needed, as seen in tcpdump, unless the packets that we see in the
dump are not actually being processed by the kernel(?), which would then explain the
re-transmissions. It definitely has something to do with races, because the issue
completely disappears when only a single queue is enabled.

In other cases, we have observed that 12.1-STABLE sent a FIN, but then, when sending
the ACK, it didn't actually increment the SEQ, as if the two packets, FIN and ACK,
were prepared concurrently, even though the ACK was dispatched later.

Also, I want to highlight a weird behavior, as I wrote in the original post: the issue
also disappears if the multiple TCP streams each use a different DST port. It's as if
it has something to do with sharing a port.
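
In case anyone wants to look at the same thing on their own hardware: the captures we
compare on both machines are nothing special, roughly the following should be enough
(the interface name and addresses are from our setup described below, the pcap file
name is arbitrary; adjust as needed):

    # on the server (and the equivalent on the wrk client):
    tcpdump -i lagg0 -s 0 -w server.pcap 'host 10.10.10.92 and tcp port 80'

    # afterwards, a quick look at the FIN/RST segments in the capture:
    tcpdump -nn -r server.pcap 'tcp[tcpflags] & (tcp-fin|tcp-rst) != 0'

    # and the kernel's own retransmission counters, before and after a run:
    netstat -s -p tcp | grep -i retrans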

19 October 2019, 19:24:43, by "Rick Macklem":

> Btw, I once ran into a situation where "smart networking" was injecting
> RSTs into a TCP stream. The packet captures at the client and server
> machines were identical, except for the RSTs, and the problem went away
> when I connected the two machines with a cable, bypassing the network.
> Might be worth a try, if you can do it?
>
> Good luck with it, rick
>
> ________________________________________
> From: owner-freebsd-net@freebsd.org on behalf of Paul
> Sent: Saturday, October 19, 2019 12:09 PM
> To: michael.tuexen@lurchi.franken.de; freebsd-net@freebsd.org; freebsd-stable@freebsd.org
> Subject: Re[2]: Network anomalies after update from 11.2 STABLE to 12.1 STABLE
>
> Hi Michael,
>
> Thank you for taking the time!
>
> We use physical machines. We do not have any special `pf` rules.
> Both sides ran `pfctl -d` before testing.
>
> `nginx` config is primitive, no secrets there:
>
> -------------------------------------------------------------------
> user www;
> worker_processes auto;
>
> error_log /var/log/nginx/error.log warn;
>
> events {
>     worker_connections 81920;
>     kqueue_changes 4096;
>     use kqueue;
> }
>
> http {
>     include mime.types;
>     default_type application/octet-stream;
>
>     sendfile off;
>     keepalive_timeout 65;
>     tcp_nopush on;
>     tcp_nodelay on;
>
>     # Logging
>     log_format main '$remote_addr - $remote_user [$time_local] "$request" '
>                     '$status $request_length $body_bytes_sent "$http_referer" '
>                     '"$http_user_agent" "$http_x_real_ip" "$realip_remote_addr" "$request_completion" "$request_time" '
>                     '"$request_body"';
>
>     access_log /var/log/nginx/access.log main;
>
>     server {
>         listen 80 default;
>
>         server_name localhost _;
>
>         location / {
>             return 404;
>         }
>     }
> }
> -------------------------------------------------------------------
>
> `wrk` is compiled with a default configuration. We test like this:
>
> `wrk -c 10 --header "Connection: close" -d 10 -t 1 --latency http://10.10.10.92:80/missing`
>
> Also, it seems that our issue and the one described in this thread are identical:
>
> https://lists.freebsd.org/pipermail/freebsd-net/2019-June/053667.html
>
> We both have Intel network cards, BTW. Our network cards are these:
>
> em0 at pci0:10:0:0: class=0x020000 card=0x000015d9 chip=0x10d38086 rev=0x00 hdr=0x00
>     vendor = 'Intel Corporation'
>     device = '82574L Gigabit Network Connection'
>
> ixl0 at pci0:4:0:0: class=0x020000 card=0x00078086 chip=0x15728086 rev=0x01 hdr=0x00
>     vendor = 'Intel Corporation'
>     device = 'Ethernet Controller X710 for 10GbE SFP+'
>
> ==============================
>
> Additional info:
>
> During the tests, we have bonded two interfaces into a lagg:
>
> ixl0: flags=8843 metric 0 mtu 1500
>     options=c500b8
>     ether 3c:fd:fe:aa:60:20
>     media: Ethernet autoselect (10Gbase-SR )
>     status: active
>     nd6 options=29
> ixl1: flags=8843 metric 0 mtu 1500
>     options=c500b8
>     ether 3c:fd:fe:aa:60:20
>     hwaddr 3c:fd:fe:aa:60:21
>     media: Ethernet autoselect (10Gbase-SR )
>     status: active
>     nd6 options=29
>
> lagg0: flags=8843 metric 0 mtu 1500
>     options=c500b8
>     ether 3c:fd:fe:aa:60:20
>     inet 10.10.10.92 netmask 0xffff0000 broadcast 10.10.255.255
>     laggproto failover lagghash l2,l3,l4
>     laggport: ixl0 flags=5
>     laggport: ixl1 flags=0<>
>     groups: lagg
>     media: Ethernet autoselect
>     status: active
>     nd6 options=29
>
> using this config:
>
> ifconfig_ixl0="up -lro -tso -rxcsum -txcsum"   (tried different options - got the same outcome)
> ifconfig_ixl1="up -lro -tso -rxcsum -txcsum"
> ifconfig_lagg0="laggproto failover laggport ixl0 laggport ixl1 10.10.10.92/24"
>
> We have randomly picked `ixl0` and restricted its number of RX/TX queues to 1
> in /boot/loader.conf:
>
>     dev.ixl.0.iflib.override_ntxqs=1
>     dev.ixl.0.iflib.override_nrxqs=1
>
> leaving `ixl1` with the default number, matching the number of cores (6).
>
> ixl0: mem 0xf8800000-0xf8ffffff,0xf9808000-0xf980ffff irq 40 at device 0.0 on pci4
> ixl0: fw 5.0.40043 api 1.5 nvm 5.05 etid 80002927 oem 1.261.0
> ixl0: PF-ID[0]: VFs 64, MSI-X 129, VF MSI-X 5, QPs 768, I2C
> ixl0: Using 1024 TX descriptors and 1024 RX descriptors
> ixl0: Using 1 RX queues 1 TX queues
> ixl0: Using MSI-X interrupts with 2 vectors
> ixl0: Ethernet address: 3c:fd:fe:aa:60:20
> ixl0: Allocating 1 queues for PF LAN VSI; 1 queues active
> ixl0: PCI Express Bus: Speed 8.0GT/s Width x4
> ixl0: SR-IOV ready
> ixl0: netmap queues/slots: TX 1/1024, RX 1/1024
> ixl1: mem 0xf8000000-0xf87fffff,0xf9800000-0xf9807fff irq 40 at device 0.1 on pci4
> ixl1: fw 5.0.40043 api 1.5 nvm 5.05 etid 80002927 oem 1.261.0
> ixl1: PF-ID[1]: VFs 64, MSI-X 129, VF MSI-X 5, QPs 768, I2C
> ixl1: Using 1024 TX descriptors and 1024 RX descriptors
> ixl1: Using 6 RX queues 6 TX queues
> ixl1: Using MSI-X interrupts with 7 vectors
> ixl1: Ethernet address: 3c:fd:fe:aa:60:21
> ixl1: Allocating 8 queues for PF LAN VSI; 6 queues active
> ixl1: PCI Express Bus: Speed 8.0GT/s Width x4
> ixl1: SR-IOV ready
> ixl1: netmap queues/slots: TX 6/1024, RX 6/1024
>
> This allowed us to easily switch between the two configurations without
> rebooting, by simply shutting down one interface or the other:
>
> `ifconfig XXX down`
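
As a side note, for anyone reproducing the comparison: from the wrk client the whole
thing can be scripted roughly like this. It is only a sketch; it assumes root ssh
access to the server at 10.10.10.92 and relies on the failover lagg moving traffic to
whichever port is still up:

    #!/bin/sh
    # multi-queue path: take the single-queue primary (ixl0) down,
    # lagg0 fails over to ixl1 (6 queues)
    ssh root@10.10.10.92 ifconfig ixl0 down
    sleep 5
    echo "=== ixl1, 6 queues ==="
    wrk -c 10 --header "Connection: close" -d 10 -t 1 --latency http://10.10.10.92:80/missing

    # single-queue path: bring the primary back up, it takes over again
    ssh root@10.10.10.92 ifconfig ixl0 up
    sleep 5
    echo "=== ixl0, 1 queue ==="
    wrk -c 10 --header "Connection: close" -d 10 -t 1 --latency http://10.10.10.92:80/missing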

> When testing `ixl0`, which runs only a single queue:
>
>     ixl0: Using 1 RX queues 1 TX queues
>     ixl0: netmap queues/slots: TX 1/1024, RX 1/1024
>
> we got these results:
>
> `wrk -c 10 --header "Connection: close" -d 10 -t 1 --latency http://10.10.10.92:80/missing`
>
> Running 10s test @ http://10.10.10.92:80/missing
>   1 threads and 10 connections
>   Thread Stats   Avg      Stdev     Max   +/- Stdev
>     Latency   281.31us  297.74us  22.66ms   99.70%
>     Req/Sec    19.91k     2.79k   21.25k    97.59%
>   Latency Distribution
>      50%  266.00us
>      75%  309.00us
>      90%  374.00us
>      99%  490.00us
>   164440 requests in 10.02s, 47.52MB read
>   Socket errors: read 0, write 0, timeout 0
>   Non-2xx or 3xx responses: 164440
> Requests/sec:  16412.09
> Transfer/sec:      4.74MB
>
> When testing `ixl1`, which runs 6 queues:
>
>     ixl1: Using 6 RX queues 6 TX queues
>     ixl1: netmap queues/slots: TX 6/1024, RX 6/1024
>
> we got these results:
>
> `wrk -c 10 --header "Connection: close" -d 10 -t 1 --latency http://10.10.10.92:80/missing`
>
> Running 10s test @ http://10.10.10.92:80/missing
>   1 threads and 10 connections
>   Thread Stats   Avg      Stdev     Max   +/- Stdev
>     Latency   216.16us   71.97us  511.00us   47.56%
>     Req/Sec     4.34k      2.76k   15.44k    83.17%
>   Latency Distribution
>      50%  216.00us
>      75%  276.00us
>      90%  312.00us
>      99%  365.00us
>   43616 requests in 10.10s, 12.60MB read
>   Socket errors: connect 0, read 24, write 8, timeout 0
>   Non-2xx or 3xx responses: 43616
> Requests/sec:   4318.26
> Transfer/sec:      1.25MB
>
> Do note that multiple queues not only cause these issues, they also dramatically
> decrease network performance.
>
> Using `sysctl -w net.inet.tcp.ts_offset_per_conn=0` didn't help at all.
>
> Best regards,
> -Paul
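
P.S. For completeness, a quick way to confirm at run time which queue layout is
actually active (the interrupt names below are what the ixl driver uses on our boxes,
and the second line assumes the iflib override tunables are exported as read-only
sysctls, which appears to be the case with iflib drivers):

    vmstat -i | grep ixl    # one MSI-X vector per RX/TX queue pair, plus one admin vector
    sysctl dev.ixl.0.iflib.override_nrxqs dev.ixl.0.iflib.override_ntxqs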