Date: Sun, 20 Oct 2019 11:59:10 +0200
From: Michael Tuexen <tuexen@freebsd.org>
To: Paul <devgs@ukr.net>
Cc: Rick Macklem <rmacklem@uoguelph.ca>, "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>, "freebsd-stable@freebsd.org" <freebsd-stable@freebsd.org>
Subject: Re: Network anomalies after update from 11.2 STABLE to 12.1 STABLE
Message-ID: <D77A5604-10E2-412B-89FE-2547ADA9C9A1@freebsd.org>
In-Reply-To: <1571505850.986841000.zen2nmth@frv39.fwdcdn.com>
References: <1571499556.409350000.a1ewtyar@frv39.fwdcdn.com> <YQBPR0101MB1652CC049B157794AC016378DD6F0@YQBPR0101MB1652.CANPRD01.PROD.OUTLOOK.COM> <1571505850.986841000.zen2nmth@frv39.fwdcdn.com>
> On 19. Oct 2019, at 19:32, Paul <devgs@ukr.net> wrote:
>
> Hi Rick,
>
> RST is only one part of the syndrome. Apart from it, we have a ton of other
> issues. For example: a lot (50+) of ACK and [FIN, ACK] re-transmissions in
> cases where they are definitely not needed, as seen in tcpdump, unless the
> packets that we see in the dump are not actually processed by the kernel(?),
> therefore leading to re-transmissions? It definitely has something to do with
> races, because the issue completely disappears when only a single queue is
> enabled.
>
> In other cases, we have observed that 12.1-STABLE has sent a FIN but then,
> when sending the ACK, didn't actually increment the SEQ, as if those two
> packets, FIN and ACK, were sent concurrently, though the ACK was dispatched
> later.
>
> Also, I want to focus on a weird behavior, as I wrote in the original post:
> the issue also disappears if multiple TCP streams each use a different DST
> port. It's as if it has something to do with sharing a port.

Hi Paul,

I understand that you see the NIC-level queue handling as a part of what has
to be taken into account. I agree that having problems there might result in
packets sent out not in the expected order, or packets received not being
processed in the expected order.

From a TCP perspective, both cases look like reordering in the network, and
this might impact performance in a negative way (unnecessary retransmissions,
congestion control limiting the transfer more than it should), but it should
not result in TCP connection drops.

Do you have tracefiles (.pcap preferred) from both sides showing the
connection drops?

Best regards
Michael
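A minimal sketch of such a two-sided capture, assuming the lagg0 interface and
the port 80 test traffic described in the quoted messages below (the output
file names and the client interface placeholder are illustrative):

  # on the server (10.10.10.92): capture full packets to a file for later analysis
  tcpdump -i lagg0 -s 0 -w server.pcap 'tcp port 80'

  # on the client: the same connections as seen from the other end
  tcpdump -i <client-interface> -s 0 -w client.pcap 'host 10.10.10.92 and tcp port 80'

Comparing the two files around a single dropped connection should show whether
the RSTs and spurious retransmissions are generated by the hosts themselves or
only appear on one side, which is also what Rick's cable suggestion further
down is probing.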
>
>
> 19 October 2019, 19:24:43, by "Rick Macklem" <rmacklem@uoguelph.ca>:
>
>> Btw, I once ran into a situation where "smart networking" was injecting
>> RSTs into a TCP stream. The packet captures at the client and server
>> machines were identical, except for the RSTs, and the problem went away
>> when I connected the two machines with a cable, bypassing the network.
>> Might be worth a try, if you can do it?
>>
>> Good luck with it, rick
>>
>> ________________________________________
>> From: owner-freebsd-net@freebsd.org <owner-freebsd-net@freebsd.org> on behalf of Paul <devgs@ukr.net>
>> Sent: Saturday, October 19, 2019 12:09 PM
>> To: michael.tuexen@lurchi.franken.de; freebsd-net@freebsd.org; freebsd-stable@freebsd.org
>> Subject: Re[2]: Network anomalies after update from 11.2 STABLE to 12.1 STABLE
>>
>> Hi Michael,
>>
>> Thank you for taking the time!
>>
>> We use physical machines. We do not have any special `pf` rules.
>> Both sides ran `pfctl -d` before testing.
>>
>> `nginx` config is primitive, no secrets there:
>>
>> -------------------------------------------------------------------
>> user www;
>> worker_processes auto;
>>
>> error_log /var/log/nginx/error.log warn;
>>
>> events {
>>     worker_connections 81920;
>>     kqueue_changes 4096;
>>     use kqueue;
>> }
>>
>> http {
>>     include mime.types;
>>     default_type application/octet-stream;
>>
>>     sendfile off;
>>     keepalive_timeout 65;
>>     tcp_nopush on;
>>     tcp_nodelay on;
>>
>>     # Logging
>>     log_format main '$remote_addr - $remote_user [$time_local] "$request" '
>>                     '$status $request_length $body_bytes_sent "$http_referer" '
>>                     '"$http_user_agent" "$http_x_real_ip" "$realip_remote_addr" "$request_completion" "$request_time" '
>>                     '"$request_body"';
>>
>>     access_log /var/log/nginx/access.log main;
>>
>>     server {
>>         listen 80 default;
>>
>>         server_name localhost _;
>>
>>         location / {
>>             return 404;
>>         }
>>     }
>> }
>> -------------------------------------------------------------------
>>
>> `wrk` is compiled with a default configuration. We test like this:
>>
>> `wrk -c 10 --header "Connection: close" -d 10 -t 1 --latency http://10.10.10.92:80/missing`
>>
>> Also, it seems that our issue and the one described in this thread are identical:
>>
>> https://lists.freebsd.org/pipermail/freebsd-net/2019-June/053667.html
>>
>> We both have Intel network cards, BTW. Our network cards are these:
>>
>> em0 at pci0:10:0:0: class=0x020000 card=0x000015d9 chip=0x10d38086 rev=0x00 hdr=0x00
>>     vendor = 'Intel Corporation'
>>     device = '82574L Gigabit Network Connection'
>>
>> ixl0 at pci0:4:0:0: class=0x020000 card=0x00078086 chip=0x15728086 rev=0x01 hdr=0x00
>>     vendor = 'Intel Corporation'
>>     device = 'Ethernet Controller X710 for 10GbE SFP+'
>>
>> ==============================
>>
>> Additional info:
>>
>> During the tests, we have bonded two interfaces into a lagg:
>>
>> ixl0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>>     options=c500b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWFILTER,VLAN_HWTSO,TXCSUM_IPV6>
>>     ether 3c:fd:fe:aa:60:20
>>     media: Ethernet autoselect (10Gbase-SR <full-duplex>)
>>     status: active
>>     nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
>> ixl1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>>     options=c500b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWFILTER,VLAN_HWTSO,TXCSUM_IPV6>
>>     ether 3c:fd:fe:aa:60:20
>>     hwaddr 3c:fd:fe:aa:60:21
>>     media: Ethernet autoselect (10Gbase-SR <full-duplex>)
>>     status: active
>>     nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
>>
>> lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>>     options=c500b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWFILTER,VLAN_HWTSO,TXCSUM_IPV6>
>>     ether 3c:fd:fe:aa:60:20
>>     inet 10.10.10.92 netmask 0xffff0000 broadcast 10.10.255.255
>>     laggproto failover lagghash l2,l3,l4
>>     laggport: ixl0 flags=5<MASTER,ACTIVE>
>>     laggport: ixl1 flags=0<>
>>     groups: lagg
>>     media: Ethernet autoselect
>>     status: active
>>     nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
>>
>> using this config:
>>
>> ifconfig_ixl0="up -lro -tso -rxcsum -txcsum"   (tried different options - got the same outcome)
>> ifconfig_ixl1="up -lro -tso -rxcsum -txcsum"
>> ifconfig_lagg0="laggproto failover laggport ixl0 laggport ixl1 10.10.10.92/24"
>>
>> We have randomly picked `ixl0` and restricted the number of RX/TX queues to 1
>> in /boot/loader.conf:
>>
>> dev.ixl.0.iflib.override_ntxqs=1
>> dev.ixl.0.iflib.override_nrxqs=1
>>
>> leaving `ixl1` with the default number, matching the number of cores (6).
>>
>> ixl0: <Intel(R) Ethernet Controller X710 for 10GbE SFP+ - 2.1.0-k> mem 0xf8800000-0xf8ffffff,0xf9808000-0xf980ffff irq 40 at device 0.0 on pci4
>> ixl0: fw 5.0.40043 api 1.5 nvm 5.05 etid 80002927 oem 1.261.0
>> ixl0: PF-ID[0]: VFs 64, MSI-X 129, VF MSI-X 5, QPs 768, I2C
>> ixl0: Using 1024 TX descriptors and 1024 RX descriptors
>> ixl0: Using 1 RX queues 1 TX queues
>> ixl0: Using MSI-X interrupts with 2 vectors
>> ixl0: Ethernet address: 3c:fd:fe:aa:60:20
>> ixl0: Allocating 1 queues for PF LAN VSI; 1 queues active
>> ixl0: PCI Express Bus: Speed 8.0GT/s Width x4
>> ixl0: SR-IOV ready
>> ixl0: netmap queues/slots: TX 1/1024, RX 1/1024
>> ixl1: <Intel(R) Ethernet Controller X710 for 10GbE SFP+ - 2.1.0-k> mem 0xf8000000-0xf87fffff,0xf9800000-0xf9807fff irq 40 at device 0.1 on pci4
>> ixl1: fw 5.0.40043 api 1.5 nvm 5.05 etid 80002927 oem 1.261.0
>> ixl1: PF-ID[1]: VFs 64, MSI-X 129, VF MSI-X 5, QPs 768, I2C
>> ixl1: Using 1024 TX descriptors and 1024 RX descriptors
>> ixl1: Using 6 RX queues 6 TX queues
>> ixl1: Using MSI-X interrupts with 7 vectors
>> ixl1: Ethernet address: 3c:fd:fe:aa:60:21
>> ixl1: Allocating 8 queues for PF LAN VSI; 6 queues active
>> ixl1: PCI Express Bus: Speed 8.0GT/s Width x4
>> ixl1: SR-IOV ready
>> ixl1: netmap queues/slots: TX 6/1024, RX 6/1024
>>
>> This allowed us to switch easily between the different configurations without
>> the need to reboot, by simply shutting down one interface or the other:
>>
>> `ifconfig XXX down`
>>
>> When testing `ixl0`, which runs only a single queue:
>>
>> ixl0: Using 1 RX queues 1 TX queues
>> ixl0: netmap queues/slots: TX 1/1024, RX 1/1024
>>
>> we've got these results:
>>
>> `wrk -c 10 --header "Connection: close" -d 10 -t 1 --latency http://10.10.10.92:80/missing`
>> Running 10s test @ http://10.10.10.92:80/missing
>>   1 threads and 10 connections
>>   Thread Stats   Avg      Stdev     Max   +/- Stdev
>>     Latency   281.31us  297.74us   22.66ms   99.70%
>>     Req/Sec    19.91k     2.79k    21.25k    97.59%
>>   Latency Distribution
>>      50%  266.00us
>>      75%  309.00us
>>      90%  374.00us
>>      99%  490.00us
>>   164440 requests in 10.02s, 47.52MB read
>>   Socket errors: read 0, write 0, timeout 0
>>   Non-2xx or 3xx responses: 164440
>> Requests/sec:  16412.09
>> Transfer/sec:      4.74MB
>>
>> When testing `ixl1`, which runs 6 queues:
>>
>> ixl1: Using 6 RX queues 6 TX queues
>> ixl1: netmap queues/slots: TX 6/1024, RX 6/1024
>>
>> we've got these results:
>>
>> `wrk -c 10 --header "Connection: close" -d 10 -t 1 --latency http://10.10.10.92:80/missing`
>> Running 10s test @ http://10.10.10.92:80/missing
>>   1 threads and 10 connections
>>   Thread Stats   Avg      Stdev     Max   +/- Stdev
>>     Latency   216.16us   71.97us  511.00us   47.56%
>>     Req/Sec     4.34k     2.76k    15.44k    83.17%
>>   Latency Distribution
>>      50%  216.00us
>>      75%  276.00us
>>      90%  312.00us
>>      99%  365.00us
>>   43616 requests in 10.10s, 12.60MB read
>>   Socket errors: connect 0, read 24, write 8, timeout 0
>>   Non-2xx or 3xx responses: 43616
>> Requests/sec:   4318.26
>> Transfer/sec:      1.25MB
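As a diagnostic sketch (not taken from the thread; the file names are
illustrative), the retransmissions and the per-queue interrupt load during one
of these runs could be quantified with stock FreeBSD tools:

  # on the server, snapshot the TCP statistics before and after one client-side wrk run
  netstat -s -p tcp > tcp_before.txt
  # ... run the wrk test from the client here ...
  netstat -s -p tcp > tcp_after.txt
  diff tcp_before.txt tcp_after.txt | grep -i -E 'retrans|out-of-order|reset'

  # and see how the interrupts were spread over the ixl queues during the run
  vmstat -i | grep ixl

A run against the single-queue ixl0 versus the six-queue ixl1 should then show
whether the retransmit and reset counters, not just the wrk error counts,
differ between the two configurations.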
>>
>> Do note that multiple queues not only cause issues, they also dramatically
>> decrease the performance of the network.
>>
>> Using `sysctl -w net.inet.tcp.ts_offset_per_conn=0` didn't help at all.
>>
>> Best regards,
>> -Paul
>>
>> _______________________________________________
>> freebsd-net@freebsd.org mailing list
>> https://lists.freebsd.org/mailman/listinfo/freebsd-net
>> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
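Paul's earlier observation that the anomalies disappear when each TCP stream
uses its own destination port could be probed with the same tooling; a sketch,
where the extra listen ports are hypothetical additions to the nginx server
block quoted above:

  # hypothetical additions to the nginx server block (ports 81 and 82 are made up):
  #     listen 80 default;
  #     listen 81;
  #     listen 82;

  # drive one wrk instance per destination port and compare the socket error
  # counts with the shared-port runs above
  wrk -c 10 --header "Connection: close" -d 10 -t 1 --latency http://10.10.10.92:80/missing &
  wrk -c 10 --header "Connection: close" -d 10 -t 1 --latency http://10.10.10.92:81/missing &
  wrk -c 10 --header "Connection: close" -d 10 -t 1 --latency http://10.10.10.92:82/missing &
  wait

If the read/write errors vanish with distinct destination ports but return
when all streams share port 80, that would support the port-sharing
observation made at the top of the thread.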