From: Vincenzo Maffione
Date: Sat, 28 Dec 2019 09:16:55 +0100
Subject: Re: vmx: strange issue, related to to tso?
To: Patrick Kelsey
Cc: Andriy Gapon, freebsd-net

I think you are correct. Good catch!
We should file a bug and/or create a review on Phabricator (if you are
busy, I could do that).

Thanks,
  Vincenzo

On Sat, 28 Dec 2019 at 05:44, Patrick Kelsey wrote:

>
> On Fri, Dec 27, 2019 at 5:01 PM Andriy Gapon wrote:
>
>> On 27/12/2019 15:34, Vincenzo Maffione wrote:
>> > It may be useful to check what happens if you replace the vmx0
>> > interface with an em0.
>> > In this way you would know if the issue is vmx-specific or not.
>>
>> I'll put this on my to-do, can't test right now.
>>
>> But one thing I noticed when comparing the TCP control block of the
>> connection before and after the "TSO dance" is that TF_TSO gets cleared
>> after any outgoing traffic while TSO is disabled on the interface. And
>> the flag does not come back after TSO is reenabled. Any new connections
>> get the flag, of course.
>>
>> So, I indeed suspect that there is a problem with vmx TSO.
>> As another data point, an older system from before the vmx->iflib
>> conversion does not exhibit the problem.
>>
>> > On Thu, 26 Dec 2019 at 20:04, Andriy Gapon wrote:
>> >
>> > Maybe someone would have any pointers for me with the following
>> > problem.
>> > This happens with CURRENT as of the beginning of September.
>> > I connect via ssh to a VM running on VMware, it has a single vmx0
>> > interface.
>> > The problem is that when I print a moderately large amount of text to
>> > the terminal (e.g., tail -100 /var/log/messages) I literally see it
>> > printed in chunks with noticeable pauses between chunks.
>> > It takes several seconds for all lines to get shown. This happens
>> > every time I do it.
>> > There is an interesting twist. If I disable TSO with ifconfig vmx0 -tso
>> > and print the same output in the same ssh session, then the output is
>> > smooth and fast as I would expect it. The lines scroll by almost
>> > instantly.
>> > If then I re-enable TSO and again produce the same output in the same
>> > ssh, then it is still fast.
>> >
>> > It appears that the TCP connection gets tuned to some very sub-optimal
>> > parameters when TSO is enabled. When I disable TSO, the parameters get
>> > re-tuned to better values and the values stick when I re-enable TSO.
>> > This is just a conjecture, of course.
>> >
>> > I have some tcpdump captures, but I do not see anything that would
>> > really stand out. One difference is that in the slow case only "full
>> > sized" packets are sent while in the fast case there are shorter
>> > packets with push flag.
>> >
>> > Some packets for the slow case:
>> > 00:00:00.453202 IP 10.180.106.180.22 > 10.180.1.29.25490: Flags [.], seq 37:1485, ack 36, win 128, options [nop,nop,TS val 1403195134 ecr 4966311], length 1448
>> > 00:00:00.096859 IP 10.180.1.29.25490 > 10.180.106.180.22: Flags [.], ack 1485, win 1026, options [nop,nop,TS val 4966864 ecr 1403195134], length 0
>> > 00:00:00.442963 IP 10.180.106.180.22 > 10.180.1.29.25490: Flags [.], seq 1485:2933, ack 36, win 128, options [nop,nop,TS val 1403195664 ecr 4966864], length 1448
>> > 00:00:00.092677 IP 10.180.1.29.25490 > 10.180.106.180.22: Flags [.], ack 2933, win 1026, options [nop,nop,TS val 4967400 ecr 1403195664], length 0
>> > 00:00:00.437336 IP 10.180.106.180.22 > 10.180.1.29.25490: Flags [.], seq 2933:4381, ack 36, win 128, options [nop,nop,TS val 1403196194 ecr 4967400], length 1448
>> > 00:00:00.097190 IP 10.180.1.29.25490 > 10.180.106.180.22: Flags [.], ack 4381, win 1026, options [nop,nop,TS val 4967934 ecr 1403196194], length 0
>> >
>> > Some packets after the TSO dance:
>> > 00:00:00.000450 IP 10.180.106.180.22 > 10.180.1.29.25369: Flags [.], seq 4077:5525, ack 36, win 128, options [nop,nop,TS val 2124310129 ecr 21706510], length 1448
>> > 00:00:00.000016 IP 10.180.106.180.22 > 10.180.1.29.25369: Flags [P.], seq 5525:6097, ack 36, win 128, options [nop,nop,TS val 2124310129 ecr 21706510], length 572
>> > 00:00:00.000009 IP 10.180.1.29.25369 > 10.180.106.180.22: Flags [.], ack 5525, win 1003, options [nop,nop,TS val 21706510 ecr 2124310129], length 0
>> > 00:00:00.000303 IP 10.180.106.180.22 > 10.180.1.29.25369: Flags [.], seq 6097:7545, ack 36, win 128, options [nop,nop,TS val 2124310129 ecr 21706510], length 1448
>> > 00:00:00.000019 IP 10.180.106.180.22 > 10.180.1.29.25369: Flags [P.], seq 7545:8117, ack 36, win 128, options [nop,nop,TS val 2124310129 ecr 21706510], length 572
>> > 00:00:00.000013 IP 10.180.1.29.25369 > 10.180.106.180.22: Flags [.], ack 7545, win 1003, options [nop,nop,TS val 21706510 ecr 2124310129], length 0
>> > 00:00:00.000162 IP 10.180.106.180.22 > 10.180.1.29.25369: Flags [.], seq 8117:9565, ack 36, win 128, options [nop,nop,TS val 2124310129 ecr 21706510], length 1448
>> > 00:00:00.000012 IP 10.180.106.180.22 > 10.180.1.29.25369: Flags [P.], seq 9565:10137, ack 36, win 128, options [nop,nop,TS val 2124310129 ecr 21706510], length 572
>> > 00:00:00.000007 IP 10.180.1.29.25369 > 10.180.106.180.22: Flags [.], ack 9565, win 1003, options [nop,nop,TS val 21706510 ecr 2124310129], length 0
>> >
>> > What else can I examine to debug the problem further?
>> > Thank you!
>> > --
>> > Andriy Gapon
>>
>
> I am not able to test this at the moment, nor likely in the very near
> future, but I did have a few minutes to do some code reading and now
> believe that the following is part of the problem, if not the entire
> problem. Using r353803 as a reference, I believe line 1323 in
> sys/dev/vmware/vmxnet3/if_vmx.c (in vmxnet3_isc_txd_encap()) should be:
>
>     sop->hlen = hdrlen + ipi->ipi_tcp_hlen;
>
> instead of the current:
>
>     sop->hlen = hdrlen;
>
> This can be seen by going back to r333813 and examining the CSUM_TSO case
> of vmxnet3_txq_offload_ctx(). The final increment of *start in that case
> is what was literally lost in translation when converting the driver to
> iflib.
>
> -Patrick
>
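
To make the size of the discrepancy concrete, here is a minimal standalone
sketch of the two header-length computations Patrick contrasts. It is not
driver code: struct pkt_info and both function names are hypothetical, and
it assumes that hdrlen in the driver covers only the Ethernet and IP
headers, which is what the description of the lost "*start" increment
suggests but which I have not verified against the source.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the per-packet information handed to the
 * driver. The field names only echo those in the message; this is not
 * the real iflib structure.
 */
struct pkt_info {
    unsigned ehdrlen;   /* Ethernet header length in bytes */
    unsigned ip_hlen;   /* IP header length in bytes */
    unsigned tcp_hlen;  /* TCP header length in bytes, options included */
};

/* Assumed current behaviour: the length stops at the end of the IP header. */
static unsigned
tso_hlen_current(const struct pkt_info *pi)
{
    assert(pi != NULL);
    return pi->ehdrlen + pi->ip_hlen;
}

/* Proposed behaviour: the length also covers the TCP header. */
static unsigned
tso_hlen_fixed(const struct pkt_info *pi)
{
    assert(pi != NULL);
    return pi->ehdrlen + pi->ip_hlen + pi->tcp_hlen;
}
```

For the connection in the captures (14-byte Ethernet header, 20-byte IPv4
header, 20-byte TCP header plus 12 bytes of timestamp options, consistent
with the 1448-byte payloads), the first function gives 34 and the second
66, so under these assumptions the device would be told that the headers
end 32 bytes too early.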