From owner-freebsd-net@freebsd.org Sat Oct 19 16:35:28 2019
From: Michael Tuexen <michael.tuexen@lurchi.franken.de>
Subject: Re: Network anomalies after update from 11.2 STABLE to 12.1 STABLE
Date: Sat, 19 Oct 2019 18:35:20 +0200
To: Paul
Cc: freebsd-net@freebsd.org, freebsd-stable@freebsd.org
In-Reply-To: <1571499556.409350000.a1ewtyar@frv39.fwdcdn.com>

> On 19. Oct 2019, at 18:09, Paul wrote:
>
> Hi Michael,
>
> Thank you for taking the time!
>
> We use physical machines. We do not have any special `pf` rules.
> Both sides ran `pfctl -d` before testing.

Hi Paul,

OK. How are the physical machines connected to each other? What happens
when you don't use a lagg interface, but the physical ones? (Trying to
localise the problem...)
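For example (just a sketch, untested; assuming the test address can be
moved straight onto one of the ports), you could take lagg0 out of the
picture like this:

    ifconfig lagg0 destroy
    ifconfig ixl0 inet 10.10.10.92/24 up

and then rerun the same wrk test against the bare physical interface.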
Best regards
Michael

>
>
> `nginx` config is primitive, no secrets there:
>
> -------------------------------------------------------------------
> user www;
> worker_processes auto;
>
> error_log /var/log/nginx/error.log warn;
>
> events {
>     worker_connections 81920;
>     kqueue_changes 4096;
>     use kqueue;
> }
>
> http {
>     include mime.types;
>     default_type application/octet-stream;
>
>     sendfile off;
>     keepalive_timeout 65;
>     tcp_nopush on;
>     tcp_nodelay on;
>
>     # Logging
>     log_format main '$remote_addr - $remote_user [$time_local] "$request" '
>                     '$status $request_length $body_bytes_sent "$http_referer" '
>                     '"$http_user_agent" "$http_x_real_ip" "$realip_remote_addr" "$request_completion" "$request_time" '
>                     '"$request_body"';
>
>     access_log /var/log/nginx/access.log main;
>
>     server {
>         listen 80 default;
>
>         server_name localhost _;
>
>         location / {
>             return 404;
>         }
>     }
> }
> -------------------------------------------------------------------
>
> `wrk` is compiled with a default configuration. We test like this:
>
> `wrk -c 10 --header "Connection: close" -d 10 -t 1 --latency http://10.10.10.92:80/missing`
>
> Also, it seems that our issue and the one described in this thread
> are identical:
>
> https://lists.freebsd.org/pipermail/freebsd-net/2019-June/053667.html
>
> We both have Intel network cards, BTW. Ours are these:
>
> em0 at pci0:10:0:0: class=0x020000 card=0x000015d9 chip=0x10d38086 rev=0x00 hdr=0x00
>     vendor = 'Intel Corporation'
>     device = '82574L Gigabit Network Connection'
>
> ixl0 at pci0:4:0:0: class=0x020000 card=0x00078086 chip=0x15728086 rev=0x01 hdr=0x00
>     vendor = 'Intel Corporation'
>     device = 'Ethernet Controller X710 for 10GbE SFP+'
>
> ==============================
>
> Additional info:
>
> During the tests, we bonded two interfaces into a lagg:
>
> ixl0: flags=8843 metric 0 mtu 1500
>     options=c500b8
>     ether 3c:fd:fe:aa:60:20
>     media: Ethernet autoselect (10Gbase-SR)
>     status: active
>     nd6 options=29
> ixl1: flags=8843 metric 0 mtu 1500
>     options=c500b8
>     ether 3c:fd:fe:aa:60:20
>     hwaddr 3c:fd:fe:aa:60:21
>     media: Ethernet autoselect (10Gbase-SR)
>     status: active
>     nd6 options=29
>
> lagg0: flags=8843 metric 0 mtu 1500
>     options=c500b8
>     ether 3c:fd:fe:aa:60:20
>     inet 10.10.10.92 netmask 0xffff0000 broadcast 10.10.255.255
>     laggproto failover lagghash l2,l3,l4
>     laggport: ixl0 flags=5
>     laggport: ixl1 flags=0<>
>     groups: lagg
>     media: Ethernet autoselect
>     status: active
>     nd6 options=29
>
> using this config:
>
> ifconfig_ixl0="up -lro -tso -rxcsum -txcsum"   (tried different options - got the same outcome)
> ifconfig_ixl1="up -lro -tso -rxcsum -txcsum"
> ifconfig_lagg0="laggproto failover laggport ixl0 laggport ixl1 10.10.10.92/24"
>
> We randomly picked `ixl0` and restricted its number of RX/TX queues
> to 1 in /boot/loader.conf:
>
> dev.ixl.0.iflib.override_ntxqs=1
> dev.ixl.0.iflib.override_nrxqs=1
>
> leaving `ixl1` with the default number, matching the number of cores (6).
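> (As a sanity check, and assuming the iflib override tunables are also
> exported as read-only sysctls, the setting can be read back after boot:
>
>     sysctl dev.ixl.0.iflib.override_nrxqs
>
> and compared against the "Using N RX queues" lines in the boot
> messages below.)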
>
>
> ixl0: mem 0xf8800000-0xf8ffffff,0xf9808000-0xf980ffff irq 40 at device 0.0 on pci4
> ixl0: fw 5.0.40043 api 1.5 nvm 5.05 etid 80002927 oem 1.261.0
> ixl0: PF-ID[0]: VFs 64, MSI-X 129, VF MSI-X 5, QPs 768, I2C
> ixl0: Using 1024 TX descriptors and 1024 RX descriptors
> ixl0: Using 1 RX queues 1 TX queues
> ixl0: Using MSI-X interrupts with 2 vectors
> ixl0: Ethernet address: 3c:fd:fe:aa:60:20
> ixl0: Allocating 1 queues for PF LAN VSI; 1 queues active
> ixl0: PCI Express Bus: Speed 8.0GT/s Width x4
> ixl0: SR-IOV ready
> ixl0: netmap queues/slots: TX 1/1024, RX 1/1024
> ixl1: mem 0xf8000000-0xf87fffff,0xf9800000-0xf9807fff irq 40 at device 0.1 on pci4
> ixl1: fw 5.0.40043 api 1.5 nvm 5.05 etid 80002927 oem 1.261.0
> ixl1: PF-ID[1]: VFs 64, MSI-X 129, VF MSI-X 5, QPs 768, I2C
> ixl1: Using 1024 TX descriptors and 1024 RX descriptors
> ixl1: Using 6 RX queues 6 TX queues
> ixl1: Using MSI-X interrupts with 7 vectors
> ixl1: Ethernet address: 3c:fd:fe:aa:60:21
> ixl1: Allocating 8 queues for PF LAN VSI; 6 queues active
> ixl1: PCI Express Bus: Speed 8.0GT/s Width x4
> ixl1: SR-IOV ready
> ixl1: netmap queues/slots: TX 6/1024, RX 6/1024
>
> This allowed us to switch easily between the two configurations without
> rebooting, by simply shutting down one interface or the other:
>
> `ifconfig XXX down`
>
> When testing `ixl0`, which runs only a single queue:
>
> ixl0: Using 1 RX queues 1 TX queues
> ixl0: netmap queues/slots: TX 1/1024, RX 1/1024
>
> we got these results:
>
> `wrk -c 10 --header "Connection: close" -d 10 -t 1 --latency http://10.10.10.92:80/missing`
> Running 10s test @ http://10.10.10.92:80/missing
>   1 threads and 10 connections
>   Thread Stats   Avg      Stdev     Max   +/- Stdev
>     Latency   281.31us  297.74us  22.66ms   99.70%
>     Req/Sec    19.91k     2.79k   21.25k    97.59%
>   Latency Distribution
>      50%  266.00us
>      75%  309.00us
>      90%  374.00us
>      99%  490.00us
>   164440 requests in 10.02s, 47.52MB read
>   Socket errors: read 0, write 0, timeout 0
>   Non-2xx or 3xx responses: 164440
> Requests/sec:  16412.09
> Transfer/sec:      4.74MB
>
> When testing `ixl1`, which runs 6 queues:
>
> ixl1: Using 6 RX queues 6 TX queues
> ixl1: netmap queues/slots: TX 6/1024, RX 6/1024
>
> we got these results:
>
> `wrk -c 10 --header "Connection: close" -d 10 -t 1 --latency http://10.10.10.92:80/missing`
> Running 10s test @ http://10.10.10.92:80/missing
>   1 threads and 10 connections
>   Thread Stats   Avg      Stdev     Max   +/- Stdev
>     Latency   216.16us   71.97us  511.00us   47.56%
>     Req/Sec     4.34k      2.76k   15.44k    83.17%
>   Latency Distribution
>      50%  216.00us
>      75%  276.00us
>      90%  312.00us
>      99%  365.00us
>   43616 requests in 10.10s, 12.60MB read
>   Socket errors: connect 0, read 24, write 8, timeout 0
>   Non-2xx or 3xx responses: 43616
> Requests/sec:   4318.26
> Transfer/sec:      1.25MB
>
> Do note that multiple queues not only cause the read/write errors,
> they also dramatically decrease network performance.
>
> Using `sysctl -w net.inet.tcp.ts_offset_per_conn=0` didn't help at all.
>
> Best regards,
> -Paul
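>
> P.S. To see how the load spreads over the queues while `wrk` is
> running, a quick (driver-version-dependent) check of the per-queue
> interrupt counters may help:
>
>     vmstat -i | grep ixl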