From owner-freebsd-net@freebsd.org Wed Feb 26 05:07:53 2020
From: Patrick Kelsey <pkelsey@gmail.com>
Date: Wed, 26 Feb 2020 00:07:33 -0500
Subject: Re: terrible if_vmx / vmxnet3 rx performance with lro (post iflib)
To: Josh Paetzel, Andriy Gapon
Cc: freebsd-net
List-Id: Networking and TCP/IP with FreeBSD
On Mon, Feb 24, 2020 at 11:40 PM Patrick Kelsey wrote:

> On Thu, Feb 20, 2020 at 4:58 PM Josh Paetzel wrote:
>
>> On Wed, Feb 19, 2020, at 7:17 AM, Andriy Gapon wrote:
>> > On 18/02/2020 16:09, Andriy Gapon wrote:
>> > > My general experience with post-iflib vmxnet3 is that vmxnet3 has some
>> > > peculiarities that result in a certain "impedance mismatch" with iflib.
>> > > Although we now have a bit less code and it is a bit more regular,
>> > > there are a few significant (for us, at least) problems:
>> > > - https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=243126
>> > > - https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=240608
>> >
>> > By the way, we (Panzura) use these changes to fix or work around the
>> > above two problems: https://people.freebsd.org/~avg/iflib-vmx.pz.diff
>> >
>> > Questions / comments are welcome.
>> > Especially from people who worked on iflib.
>> >
>> > > - https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=243392
>> > > - the problem described above
>> > > - a couple of issues that we already fixed or worked around
>> > >
>> > > We are contemplating locally reverting to the pre-iflib vmxnet3 and
>> > > we are wondering if the conversion was really worth it in general.
>> >
>> > --
>> > Andriy Gapon
>>
>> I'd like to follow this up just to make it 100% clear. The problem is a
>> ~4x regression in RX performance. It affects stock FreeBSD, including
>> 12.1-RELEASE.
>>
>> In my 40Gbps-connected lab, single-thread iperf receive went from 9 Gbps
>> to 2.5 Gbps.
>>
>> If this can't be fixed or looked at, I'd strongly suggest reverting the
>> iflib conversion in stock FreeBSD.
>>
> Consider these datapoints I collected this evening:
>
> Hypervisor: ESXi 6.7.0 Build 8169922
> Hardware: Xeon E5-1650 v3 @ 3.50GHz (6 physical cores, HT disabled)
>
> iperf3 client: a VM on the same vswitch as the VM under test, running
> Ubuntu 18.04.3 LTS with 2 vCPUs, 4 GB RAM, and a VMXNET3 interface used
> only for traffic to the VM under test (this VMXNET3 has checksum offload,
> TSO/GSO, and LRO/GRO enabled).
>
> iperf3 server: running on the VM under test, either a 12.0-RELEASE VM
> (before the vmx iflib conversion) or a 12.1-RELEASE VM (after the vmx
> iflib conversion) with r356703 applied (the recent TSO bug fix). Both VMs
> have 3 vCPUs, but the vmx interface only uses 1 tx and 1 rx queue, as
> hw.pci.honor_msi_blacklist is at its default of 1, so MSI is used.
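>
> (Side note: the single queue pair is a consequence of honoring the PCI
> MSI-X blacklist, which covers VMware's virtual PCIe bridges. To repeat
> these tests with MSI-X and multiple queue pairs, overriding the blacklist
> from the loader should suffice, though that path is untested in this
> setup:
>
>   # /boot/loader.conf -- ignore the PCI MSI-X blacklist (reboot required);
>   # with MSI-X available, vmx can use multiple queue pairs, scaling with
>   # the number of vCPUs subject to the virtual hardware's limits
>   hw.pci.honor_msi_blacklist="0"
> )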
>
> Test 1: 12.0-RELEASE, single TCP stream receive, standard mtu, TSO
> enabled, LRO disabled
> ======
> vmx0: flags=8843 metric 0 mtu 1500
>         options=60039b
>
> $ iperf3 -c <12.0 VM IP> -p 1234
> Connecting to host <12.0 VM IP>, port 1234
> [ 4] local port 44664 connected to <12.0 VM IP> port 1234
> [ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
> [ 4]   0.00-1.00   sec  1.11 GBytes  9.52 Gbits/sec  1144    529 KBytes
> [ 4]   1.00-2.00   sec  1.09 GBytes  9.40 Gbits/sec  1272    369 KBytes
> [ 4]   2.00-3.00   sec  1.11 GBytes  9.51 Gbits/sec  1249    344 KBytes
> [ 4]   3.00-4.00   sec  1.06 GBytes  9.12 Gbits/sec  1973    369 KBytes
> [ 4]   4.00-5.00   sec  1.11 GBytes  9.50 Gbits/sec  1860    370 KBytes
> [ 4]   5.00-6.00   sec  1.08 GBytes  9.28 Gbits/sec  1342    396 KBytes
> [ 4]   6.00-7.00   sec  1.09 GBytes  9.38 Gbits/sec  1278    563 KBytes
> [ 4]   7.00-8.00   sec  1.05 GBytes  8.99 Gbits/sec  1226    372 KBytes
> [ 4]   8.00-9.00   sec  1.03 GBytes  8.87 Gbits/sec  1145    400 KBytes
> [ 4]   9.00-10.00  sec  1.08 GBytes  9.28 Gbits/sec  1317    354 KBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bandwidth       Retr
> [ 4]   0.00-10.00  sec  10.8 GBytes  9.28 Gbits/sec  13806         sender
> [ 4]   0.00-10.00  sec  10.8 GBytes  9.28 Gbits/sec              receiver
>
>
> Test 2: 12.0-RELEASE, single TCP stream receive, standard mtu, TSO
> enabled, LRO enabled
> ======
> vmx0: flags=8843 metric 0 mtu 1500
>         options=60079b
>
> $ iperf3 -c <12.0 VM IP> -p 1234
> Connecting to host <12.0 VM IP>, port 1234
> [ 4] local port 44714 connected to <12.0 VM IP> port 1234
> [ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
> [ 4]   0.00-1.00   sec  3.48 GBytes  29.9 Gbits/sec    0     887 KBytes
> [ 4]   1.00-2.00   sec  1.93 GBytes  16.6 Gbits/sec    0     994 KBytes
> [ 4]   2.00-3.00   sec  2.03 GBytes  17.5 Gbits/sec    0    1.10 MBytes
> [ 4]   3.00-4.00   sec  1.99 GBytes  17.1 Gbits/sec    0    1.10 MBytes
> [ 4]   4.00-5.00   sec  2.00 GBytes  17.1 Gbits/sec    0    1.10 MBytes
> [ 4]   5.00-6.00   sec  1.93 GBytes  16.6 Gbits/sec    0    1.10 MBytes
> [ 4]   6.00-7.00   sec  2.04 GBytes  17.5 Gbits/sec    0    1.10 MBytes
> [ 4]   7.00-8.00   sec  2.01 GBytes  17.3 Gbits/sec    0    1.10 MBytes
> [ 4]   8.00-9.00   sec  1.97 GBytes  16.9 Gbits/sec    0    1.10 MBytes
> [ 4]   9.00-10.00  sec  1.98 GBytes  17.0 Gbits/sec    0    1.10 MBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bandwidth       Retr
> [ 4]   0.00-10.00  sec  21.4 GBytes  18.3 Gbits/sec    0          sender
> [ 4]   0.00-10.00  sec  21.4 GBytes  18.3 Gbits/sec             receiver
>
>
> Test 3: 12.0-RELEASE, single TCP stream receive, standard mtu, TSO
> enabled, LRO disabled (LRO disabled and test run after Test 2 above)
> ======
> vmx0: flags=8843 metric 0 mtu 1500
>         options=60039b
>
> $ iperf3 -c <12.0 VM IP> -p 1234
> Connecting to host <12.0 VM IP>, port 1234
> [ 4] local port 44718 connected to <12.0 VM IP> port 1234
> [ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
> [ 4]   0.00-1.00   sec  1.14 GBytes  9.76 Gbits/sec  1871    338 KBytes
> [ 4]   1.00-2.00   sec   483 MBytes  4.05 Gbits/sec  1307   1.41 KBytes
> [ 4]   2.00-3.00   sec  0.00 Bytes   0.00 bits/sec      1   1.41 KBytes
> [ 4]   3.00-4.00   sec  0.00 Bytes   0.00 bits/sec      0   1.41 KBytes
> [ 4]   4.00-5.00   sec  0.00 Bytes   0.00 bits/sec      1   1.41 KBytes
> [ 4]   5.00-6.00   sec  0.00 Bytes   0.00 bits/sec      0   1.41 KBytes
> [ 4]   6.00-7.00   sec  0.00 Bytes   0.00 bits/sec      0   1.41 KBytes
> [ 4]   7.00-8.00   sec  0.00 Bytes   0.00 bits/sec      1   1.41 KBytes
> [ 4]   8.00-9.00   sec  0.00 Bytes   0.00 bits/sec      0   1.41 KBytes
> [ 4]   9.00-10.00  sec  0.00 Bytes   0.00 bits/sec      0   1.41 KBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bandwidth       Retr
> [ 4]   0.00-10.00  sec  1.61 GBytes  1.38 Gbits/sec   3181        sender
> [ 4]   0.00-10.00  sec  1.60 GBytes  1.38 Gbits/sec             receiver
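>
> (For reference, the LRO state was toggled between runs with ifconfig on
> the VM under test, along these lines -- interface name as shown above:
>
>   # ifconfig vmx0 -lro    # disable LRO (Tests 1 and 3)
>   # ifconfig vmx0 lro     # enable LRO (Test 2)
>
> TSO can be toggled the same way with tso / -tso.)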
>
>
> Test 4: 12.0-RELEASE, single TCP stream transmit, standard mtu, TSO
> enabled, LRO enabled
> ======
> vmx0: flags=8843 metric 0 mtu 1500
>         options=60079b
>
> $ iperf3 -R -c <12.0 VM IP> -p 1234
> Connecting to host <12.0 VM IP>, port 1234
> Reverse mode, remote host <12.0 VM IP> is sending
> [ 4] local port 44726 connected to <12.0 VM IP> port 1234
> [ ID] Interval           Transfer     Bandwidth
> [ 4]   0.00-1.00   sec  4.28 GBytes  36.8 Gbits/sec
> [ 4]   1.00-2.00   sec  3.31 GBytes  28.4 Gbits/sec
> [ 4]   2.00-3.00   sec  3.85 GBytes  33.1 Gbits/sec
> [ 4]   3.00-4.00   sec  4.24 GBytes  36.5 Gbits/sec
> [ 4]   4.00-5.00   sec  3.16 GBytes  27.1 Gbits/sec
> [ 4]   5.00-6.00   sec  3.54 GBytes  30.4 Gbits/sec
> [ 4]   6.00-7.00   sec  4.03 GBytes  34.6 Gbits/sec
> [ 4]   7.00-8.00   sec  2.93 GBytes  25.1 Gbits/sec
> [ 4]   8.00-9.00   sec  3.42 GBytes  29.4 Gbits/sec
> [ 4]   9.00-10.00  sec  3.93 GBytes  33.8 Gbits/sec
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bandwidth       Retr
> [ 4]   0.00-10.00  sec  36.7 GBytes  31.5 Gbits/sec   280         sender
> [ 4]   0.00-10.00  sec  36.7 GBytes  31.5 Gbits/sec             receiver
>
>
> Test 5: 12.1-RELEASE with r356703 applied, single stream receive,
> standard mtu, TSO enabled, LRO disabled
> ======
> vmx0: flags=8843 metric 0 mtu 1500
>         options=e403bb
>
> $ iperf3 -c <12.1 VM IP> -p 1234
> Connecting to host <12.1 VM IP>, port 1234
> [ 4] local port 48392 connected to <12.1 VM IP> port 1234
> [ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
> [ 4]   0.00-1.00   sec   828 MBytes  6.95 Gbits/sec  1247    335 KBytes
> [ 4]   1.00-2.00   sec   901 MBytes  7.56 Gbits/sec  1841    345 KBytes
> [ 4]   2.00-3.00   sec   909 MBytes  7.62 Gbits/sec  1805    356 KBytes
> [ 4]   3.00-4.00   sec   909 MBytes  7.62 Gbits/sec  2337    322 KBytes
> [ 4]   4.00-5.00   sec   907 MBytes  7.61 Gbits/sec  1834    354 KBytes
> [ 4]   5.00-6.00   sec   907 MBytes  7.61 Gbits/sec  1984    352 KBytes
> [ 4]   6.00-7.00   sec   909 MBytes  7.62 Gbits/sec  2189    329 KBytes
> [ 4]   7.00-8.00   sec   908 MBytes  7.62 Gbits/sec  2000    338 KBytes
> [ 4]   8.00-9.00   sec   907 MBytes  7.61 Gbits/sec  2006    315 KBytes
> [ 4]   9.00-10.00  sec   908 MBytes  7.61 Gbits/sec  1764    332 KBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bandwidth       Retr
> [ 4]   0.00-10.00  sec  8.78 GBytes  7.54 Gbits/sec  19007        sender
> [ 4]   0.00-10.00  sec  8.78 GBytes  7.54 Gbits/sec             receiver
>
>
> Test 6: 12.1-RELEASE with r356703 applied, single stream receive,
> standard mtu, TSO enabled, LRO disabled, sysctl
> dev.vmx.0.iflib.tx_abdicate=1
> ======
> vmx0: flags=8843 metric 0 mtu 1500
>         options=e403bb
>
> $ iperf3 -c <12.1 VM IP> -p 1234
> Connecting to host <12.1 VM IP>, port 1234
> [ 4] local port 48416 connected to <12.1 VM IP> port 1234
> [ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
> [ 4]   0.00-1.00   sec  1.29 GBytes  11.1 Gbits/sec  3016    290 KBytes
> [ 4]   1.00-2.00   sec  1.33 GBytes  11.4 Gbits/sec  4133    322 KBytes
> [ 4]   2.00-3.00   sec  1.34 GBytes  11.5 Gbits/sec  5409    335 KBytes
> [ 4]   3.00-4.00   sec  1.35 GBytes  11.6 Gbits/sec  3899    376 KBytes
> [ 4]   4.00-5.00   sec  1.35 GBytes  11.6 Gbits/sec  4609    300 KBytes
> [ 4]   5.00-6.00   sec  1.35 GBytes  11.6 Gbits/sec  4603    303 KBytes
> [ 4]   6.00-7.00   sec  1.36 GBytes  11.7 Gbits/sec  4417    293 KBytes
> [ 4]   7.00-8.00   sec  1.34 GBytes  11.5 Gbits/sec  5680    290 KBytes
> [ 4]   8.00-9.00   sec  1.33 GBytes  11.5 Gbits/sec  5461    359 KBytes
> [ 4]   9.00-10.00  sec  1.03 GBytes  8.86 Gbits/sec  5060    329 KBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bandwidth       Retr
> [ 4]   0.00-10.00  sec  13.1 GBytes  11.2 Gbits/sec  46287        sender
> [ 4]   0.00-10.00  sec  13.1 GBytes  11.2 Gbits/sec             receiver
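>
> (The tx_abdicate knob used in Test 6 is a per-interface iflib sysctl and
> can be flipped at runtime; to carry it across reboots, something like the
> following should work:
>
>   # sysctl dev.vmx.0.iflib.tx_abdicate=1                       # runtime
>   # echo 'dev.vmx.0.iflib.tx_abdicate=1' >> /etc/sysctl.conf   # persist
> )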
>
>
> Test 7: 12.1-RELEASE with r356703 applied, single stream receive,
> standard mtu, TSO enabled, LRO enabled
> ======
> vmx0: flags=8843 metric 0 mtu 1500
>         options=e407bb
>
> $ iperf3 -c <12.1 VM IP> -p 1234
> Connecting to host <12.1 VM IP>, port 1234
> [ 4] local port 48396 connected to <12.1 VM IP> port 1234
> [ ID] Interval           Transfer      Bandwidth       Retr  Cwnd
> [ 4]   0.00-1.00   sec  98.5 MBytes    826 Mbits/sec   129   2.83 KBytes
> [ 4]   1.00-2.00   sec  63.6 KBytes    521 Kbits/sec    25   2.83 KBytes
> [ 4]   2.00-3.00   sec  0.00 Bytes    0.00 bits/sec     25   2.83 KBytes
> [ 4]   3.00-4.00   sec  0.00 Bytes    0.00 bits/sec     16   2.83 KBytes
> [ 4]   4.00-5.00   sec  0.00 Bytes    0.00 bits/sec     15   2.83 KBytes
> [ 4]   5.00-6.00   sec  63.6 KBytes    521 Kbits/sec    15   2.83 KBytes
> [ 4]   6.00-7.00   sec  0.00 Bytes    0.00 bits/sec     15   2.83 KBytes
> [ 4]   7.00-8.00   sec  0.00 Bytes    0.00 bits/sec     12   2.83 KBytes
> [ 4]   8.00-9.00   sec  0.00 Bytes    0.00 bits/sec     15   2.83 KBytes
> [ 4]   9.00-10.00  sec  0.00 Bytes    0.00 bits/sec     11   1.41 KBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer      Bandwidth       Retr
> [ 4]   0.00-10.00  sec  98.7 MBytes   82.8 Mbits/sec   278         sender
> [ 4]   0.00-10.00  sec  97.8 MBytes   82.0 Mbits/sec             receiver
>
>
> Test 8: 12.1-RELEASE with r356703 applied, single stream transmit,
> standard mtu, TSO enabled, LRO disabled
> ======
> vmx0: flags=8843 metric 0 mtu 1500
>         options=e403bb
>
> $ iperf3 -R -c <12.1 VM IP> -p 1234
> Connecting to host <12.1 VM IP>, port 1234
> Reverse mode, remote host <12.1 VM IP> is sending
> [ 4] local port 48400 connected to <12.1 VM IP> port 1234
> [ ID] Interval           Transfer     Bandwidth
> [ 4]   0.00-1.00   sec  4.25 GBytes  36.5 Gbits/sec
> [ 4]   1.00-2.00   sec  3.29 GBytes  28.3 Gbits/sec
> [ 4]   2.00-3.00   sec  3.61 GBytes  31.0 Gbits/sec
> [ 4]   3.00-4.00   sec  3.93 GBytes  33.8 Gbits/sec
> [ 4]   4.00-5.00   sec  4.17 GBytes  35.8 Gbits/sec
> [ 4]   5.00-6.00   sec  3.53 GBytes  30.3 Gbits/sec
> [ 4]   6.00-7.00   sec  3.22 GBytes  27.7 Gbits/sec
> [ 4]   7.00-8.00   sec  3.90 GBytes  33.5 Gbits/sec
> [ 4]   8.00-9.00   sec  2.80 GBytes  24.1 Gbits/sec
> [ 4]   9.00-10.00  sec  2.78 GBytes  23.9 Gbits/sec
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bandwidth       Retr
> [ 4]   0.00-10.00  sec  35.5 GBytes  30.5 Gbits/sec   571         sender
> [ 4]   0.00-10.00  sec  35.5 GBytes  30.5 Gbits/sec             receiver
>
>
> Based on the above, it looks like:
>
> (1) The non-LRO single-stream TCP receive performance of the iflib vmx
> driver in 12.1-RELEASE lags that of the pre-iflib vmx driver in 12.0 by
> about 20% (7.54 Gbps [Test 5] vs 9.28 Gbps [Test 1]), unless tx_abdicate
> is enabled, in which case the iflib driver performs about 20% better
> (11.2 Gbps [Test 6] vs 9.28 Gbps [Test 1]).
>
> (2) The TSO-enabled single-stream TCP send performance of the iflib vmx
> driver in 12.1-RELEASE (with the TSO bug patch applied) is at parity with
> the pre-iflib vmx driver in 12.0 (30.5 Gbps [Test 8] vs 31.5 Gbps
> [Test 4]).
>
> (3) There are LRO-related bugs in both the pre-iflib vmx driver in 12.0
> (see Test 3) and the iflib vmx driver in 12.1 (see Test 7); they just
> surface differently.
>
> The categories of root causes for bugs and performance issues are: bugs
> in the vmx driver, bugs in iflib, and behavioral variations across the
> many fielded versions of the VMXNET3 virtual device. Indeed, all of these
> categories have been encountered in the past year. Also, there is a rich
> set of driver configuration and operating environment parameters, which
> makes advancing the overall robustness of the driver (instead of just
> shifting issues into or out of one's own operating parameter space) an
> arduous task.
>
> I think the right way to approach this is to continue to fill out the
> test matrix and to root-cause and resolve all of the issues encountered,
> rather than argue for reverting to the old driver out of frustration
> based on a narrow set of (so far, rather poorly characterized)
> circumstances. I'm in a position to do this, from the standpoint of
> substantial knowledge of the vmx driver and virtual device, as well as of
> iflib internals, and I will be doing this as non-work cycles become
> available.

I spent a bit of time poking at this, and I believe I have root-caused all
of the reported issues and developed patches (to both iflib and the vmx
driver) that solve them. My test system running 12.1 with these patches
applied (as well as the TSO patch) operates correctly with and without TSO
and/or LRO enabled, and with large MTU values. It exhibits throughput
parity or better compared to the pre-iflib driver in the single-core /
single-stream tests that I am currently using to assess correctness.

The primary issue (which resulted in the reported free-list-related
assertion failures, use-after-free panics, trouble with jumbo frames, and
trouble with LRO) was that both the vmx driver and iflib needed fixes to
correctly handle the case where the vmx virtual device skips receive
descriptors. It's not known why the virtual device sometimes skips
descriptors, but this seems to occur frequently, at least under ESXi, when
packets span multiple descriptors.

A secondary issue was also fixed (secondary in that it impacts performance
but not correctness): the vmx driver only ever used cluster-sized receive
buffers regardless of the MTU, instead of switching to page-sized buffers
when the MTU is sufficiently large.

There remains an open question as to whether the vmx virtual device
consumes a buffer descriptor when the completion descriptor indicates zero
length. So far I haven't been able to cause zero-length completions to
occur.

There also remains a conceptual flaw in iflib concerning the refill of
receive descriptor rings, which can be worked around, to a point, with a
sysctl, but which at some point needs to be fixed properly. iflib limits
the number of received packets it will process during a receive interrupt
according to a budget value, and then it also limits the number of receive
descriptors it will refill to that same budget value (plus a magic
constant). Since packets can span multiple descriptors, limiting the
refill to essentially the number of packets processed fails to account for
this multiplicity, resulting in severe performance degradation when
multi-segment packets are in heavy use (e.g., with LRO or large MTUs).
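To make the multiplicity concrete: with a 9000-byte MTU and 2 KB receive
buffers, each frame can span up to five descriptors, so processing a budget
of N packets may consume roughly 5N descriptors while the refill path
replaces only about N, steadily draining the free list. For illustration --
assuming the knob in question is iflib's per-interface rx_budget sysctl,
and with an arbitrarily chosen value -- the workaround looks like:

  # raise the per-interrupt rx budget so more descriptors are refilled
  # per pass (assumed knob; 512 is only an example value)
  sysctl dev.vmx.0.iflib.rx_budget=512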
It will take a bit more time to write up all of the associated details,
post the patches for review, and update the bugs. I think avg@ will
recognize in those details the completion of a number of thoughts he had
while trying to debug this. I also think the TSO patch, as well as the
correctness fixes noted above, should at some point wind up in an errata
release for 12.1.

-Patrick