Date: Fri, 18 Oct 2019 15:57:13 +0300
From: Paul <devgs@ukr.net>
To: freebsd-net@freebsd.org, freebsd-stable@freebsd.org
Subject: Network anomalies after update from 11.2 STABLE to 12.1 STABLE
Message-ID: <1571398510.796520000.8iwbi4pd@frv39.fwdcdn.com>

Our current version is:

    FreeBSD 11.2-STABLE #0 r340725

New version that we have problems with:

    FreeBSD 12.1-STABLE #5 r352893

After updating to the new version we started to observe an incredible number of errors in HTTP requests between the various services in our system. The problem appeared on all the servers that were upgraded and does not seem to be specific to a particular network card: we use different models, and all are affected.

During various tests we observed a lot of spontaneous TCP stream aborts, including at the establishment stage (SYN), in cases that were 100% issue-free on 11.2-STABLE. Concrete test cases are shown below.

We also want to highlight that, on numerous occasions, we have observed random, huge ACK numbers in the first response to a SYN packet, instead of 1, as expected. This forces the client to abort the connection via RST.

At first glance it looks like a race in the kernel, because the problem disappears when:

* we use `dev.ixl.0.iflib.override_nrxqs=1` and `dev.ixl.0.iflib.override_ntxqs=1`
* we use `dev.ixl.0.iflib.override_nrxqs=0` and `dev.ixl.0.iflib.override_ntxqs=0`, but don't issue concurrent TCP streams

These are some debug log messages emitted by 12.1-STABLE:

    Oct 18 14:59:01 test kernel: TCP: [10.10.10.39]:16304 to [10.10.10.92]:80 tcpflags 0x4<RST>; syncache_chkrst: Spurious RST without matching syncache entry (possibly syncookie only), segment ignored
    Oct 18 14:59:01 test kernel: TCP: [10.10.10.39]:16326 to [10.10.10.92]:80 tcpflags 0x4<RST>; syncache_chkrst: Spurious RST without matching syncache entry (possibly syncookie only), segment ignored
    Oct 18 14:59:01 test kernel: TCP: [10.10.10.39]:16402 to [10.10.10.92]:80 tcpflags 0x4<RST>; syncache_chkrst: Spurious RST without matching syncache entry (possibly syncookie only), segment ignored
    Oct 18 14:59:01 test kernel: TCP: [10.10.10.39]:16652 to [10.10.10.92]:80 tcpflags 0x4<RST>; syncache_chkrst: Spurious RST without matching syncache entry (possibly syncookie only), segment ignored
    Oct 18 14:59:01 test kernel: TCP: [10.10.10.39]:16686 to [10.10.10.92]:80 tcpflags 0x4<RST>; syncache_chkrst: Spurious RST without matching syncache entry (possibly syncookie only), segment ignored
    Oct 18 14:59:01 test kernel: TCP: [10.10.10.39]:18562 to [10.10.10.92]:80 tcpflags 0x4<RST>; tcp_do_segment: Timestamp missing, no action
    Oct 18 14:59:01 test kernel: TCP: [10.10.10.39]:18918 to [10.10.10.92]:80 tcpflags 0x4<RST>; syncache_chkrst: Spurious RST without matching syncache entry (possibly syncookie only), segment ignored
    Oct 18 14:59:01 test kernel: TCP: [10.10.10.39]:19331 to [10.10.10.92]:80 tcpflags 0x4<RST>; syncache_chkrst: Spurious RST without matching syncache entry (possibly syncookie only), segment ignored
    Oct 18 14:59:01 test kernel: TCP: [10.10.10.39]:19340 to [10.10.10.92]:80 tcpflags 0x4<RST>; tcp_do_segment: Timestamp missing, no action
    Oct 18 14:59:01 test kernel: TCP: [10.10.10.39]:19340 to [10.10.10.92]:80 tcpflags 0x4<RST>; syncache_chkrst: Spurious RST without matching syncache entry (possibly syncookie only), segment ignored
    Oct 18 14:59:01 test kernel: TCP: [10.10.10.39]:19340 to [10.10.10.92]:80 tcpflags 0x4<RST>; syncache_chkrst: Spurious RST without matching syncache entry (possibly syncookie only), segment ignored
    Oct 18 14:59:01 test kernel: TCP: [10.10.10.39]:19489 to [10.10.10.92]:80 tcpflags 0x4<RST>; syncache_chkrst: Spurious RST without matching syncache entry (possibly syncookie only), segment ignored
    Oct 18 14:59:01 test kernel: TCP: [10.10.10.39]:19580 to [10.10.10.92]:80 tcpflags 0x4<RST>; tcp_do_segment: Timestamp missing, no action
    Oct 18 14:59:01 test kernel: TCP: [10.10.10.39]:19580 to [10.10.10.92]:80 tcpflags 0x4<RST>; syncache_chkrst: Spurious RST without matching syncache entry (possibly syncookie only), segment ignored
    Oct 18 14:59:01 test kernel: TCP: [10.10.10.39]:19580 to [10.10.10.92]:80 tcpflags 0x4<RST>; syncache_chkrst: Spurious RST without matching syncache entry (possibly syncookie only), segment ignored
    Oct 18 14:59:02 test kernel: TCP: [10.10.10.39]:17705 to [10.10.10.92]:80; syncache_timer: Response timeout, retransmitting (1) SYN|ACK
    Oct 18 14:59:02 test kernel: TCP: [10.10.10.39]:18066 to [10.10.10.92]:80; syncache_timer: Response timeout, retransmitting (1) SYN|ACK
    Oct 18 14:59:02 test kernel: TCP: [10.10.10.39]:18066 to [10.10.10.92]:80 tcpflags 0x4<RST>; syncache_chkrst: Our SYN|ACK was rejected, connection attempt aborted by remote endpoint
    Oct 18 14:59:02 test kernel: TCP: [10.10.10.39]:17705 to [10.10.10.92]:80 tcpflags 0x4<RST>; syncache_chkrst: Our SYN|ACK was rejected, connection attempt aborted by remote endpoint

Here, 10.10.10.92 runs 12.1-STABLE, while 10.10.10.39 is a client that runs 11.2-STABLE.

In our test case we use nginx and wrk with a minimal config in which nginx always returns a 404 error page. nginx is on the 12.1-STABLE machine, while wrk is on the 11.2-STABLE one. We run wrk like so:

    wrk -c 10 --header "Connection: close" -d 10 -t 1 --latency http://10.10.10.92:80/missing

and often see errors like these:

    Socket errors: connect 12, read 4, write 4, timeout 0

If we reverse the test by swapping the two servers, i.e. 12.1-STABLE becomes the client and issues requests via wrk, we see no problems at all. The same is true between two 11.2-STABLE machines.

It seems the issue appears only when the same local port is used for multiple connections on 12.1-STABLE. Currently this is possible only when 12.1-STABLE is the server and accepts connections on a single port, say 80, as in our case. To confirm this, we ran another test. We configured nginx to listen on 10 different ports, 80 through 89, and then launched 10 different wrk processes, each using only one concurrent connection, meaning that we have only 10 TCP streams, each with its own unique port on the 12.1-STABLE side:

    for I in {0..9}; do wrk -c 1 --header "Connection: close" -d 10 -t 1 --latency http://10.10.10.92:8${I}/missing & done

Socket errors stopped appearing.
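
To compare single-port and multi-port runs numerically, the `Socket errors` lines printed by wrk can be totalled across runs. The following is only a minimal sketch, assuming wrk output has been saved or piped in the format quoted above; the function name `sum_socket_errors` and the file names in the comments are hypothetical and not part of the original tests.

```shell
#!/bin/sh
# Sum the counters from every "Socket errors:" line read on stdin.
# Input without such a line contributes zero to each total.
sum_socket_errors() {
    awk '
    /Socket errors:/ {
        gsub(/,/, "")                      # drop commas so counters are plain fields
        for (i = 1; i < NF; i++)
            if ($i == "connect" || $i == "read" || $i == "write" || $i == "timeout")
                total[$i] += $(i + 1)      # the number follows its label
    }
    END {
        printf "connect %d, read %d, write %d, timeout %d\n",
            total["connect"], total["read"], total["write"], total["timeout"]
    }'
}

# Hypothetical usage, with wrk output saved to files beforehand:
#   cat wrk-port-8*.log | sum_socket_errors
#   sum_socket_errors < wrk-single-port.log
```

Feeding it the two error lines quoted in this message yields one combined total, which makes repeated runs easier to compare than reading individual reports.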

We ran this test many times; the errors just don't appear. However, whenever we repeat the previous test, using a single port:

    wrk -c 10 --header "Connection: close" -d 10 -t 1 --latency http://10.10.10.92:80/missing

errors start appearing again and again:

    Socket errors: connect 8, read 14, write 9, timeout 0

We've tested different drivers with the same outcome:

em driver:

    em0@pci0:10:0:0: class=0x020000 card=0x000015d9 chip=0x10d38086 rev=0x00 hdr=0x00
        vendor     = 'Intel Corporation'
        device     = '82574L Gigabit Network Connection'

ixl driver:

    ixl0@pci0:4:0:0: class=0x020000 card=0x00078086 chip=0x15728086 rev=0x01 hdr=0x00
        vendor     = 'Intel Corporation'
        device     = 'Ethernet Controller X710 for 10GbE SFP+'

Even the driver from ports (/usr/ports/net/intel-ixl-kmod):

    ixl-1.11.9

Help with this matter would be really appreciated.

Best regards,
-Paul