From: Doug Rabson <dfr@rabson.org>
Date: Sun, 15 Sep 2024 18:01:07 +0100
Subject: Re: Performance issues with vnet jails + epair + bridge
To: Sad Clouds
Cc: Zhenlei Huang, Mark Saad, FreeBSD Net
List-Id: Networking and TCP/IP with FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-net
I just did a throughput test with an iperf3 client on a FreeBSD 14.1 host
with an Intel 10Gb NIC, connecting to an iperf3 server running in a vnet
jail on a TrueNAS host (13.something), also with an Intel 10Gb NIC, and I
get full 10Gb/s throughput in this setup. In the past, I had to disable
LRO on the TrueNAS host for this to work properly.

Doug.

On Sat, 14 Sept 2024 at 11:25, Sad Clouds wrote:
> On Sat, 14 Sep 2024 10:45:03 +0800
> Zhenlei Huang wrote:
>
> > The overhead of a vnet jail should be negligible compared to a legacy
> > jail or no jail. Bear in mind that when the VIMAGE option is enabled,
> > there is a default vnet 0. It is not visible via jls and cannot be
> > destroyed. So when you see bottlenecks, as in this case, they are
> > mostly caused by other components such as if_epair, not by the vnet
> > jail itself.
>
> Perhaps this needs a correction - the vnet itself may be OK, but due to
> a single physical NIC on this appliance, I cannot use vnet jails
> without virtualised devices like if_epair(4) and if_bridge(4). I think
> there may be other scalability bottlenecks.
>
> I have a similar setup on Solaris.
>
> Here devel is a Solaris zone with exclusive IP configuration, which I
> think may be similar to a FreeBSD vnet. It has a virtual NIC devel/net0
> which operates over the physical NIC, also called net0, in the global
> zone:
>
> $ dladm
> LINK          CLASS    MTU    STATE    OVER
> net0          phys     1500   up       --
> net1          phys     1500   up       --
> net2          phys     1500   up       --
> net3          phys     1500   up       --
> pkgsrc/net0   vnic     1500   up       net0
> devel/net0    vnic     1500   up       net0
>
> If I run a TCP bulk data benchmark with 64 concurrent threads - 32
> threads for the server process in the global zone and 32 threads for
> the client process in the devel zone - then the system evenly spreads
> the load across all CPU cores and none of them sit idle:
>
> $ mpstat -A core 1
> COR minf mjf xcal  intr ithr  csw icsw migr  smtx srw  syscl usr sys st idl sze
>   0    0   0 2262  2561    4 4744 2085  209  7271   0 747842 272 528  0   0   8
>   1    0   0 3187  4209    2 9102 3768  514 10605   0 597012 221 579  0   0   8
>   2    0   0 2091  3251    7 6768 2884  307  9557   0 658124 244 556  0   0   8
>   3    0   0 1745  1786   16 3494 1520  176  8847   0 746373 273 527  0   0   8
>   4    0   0 2797  2767    3 5908 2414  371  7849   0 692873 253 547  0   0   8
>   5    0   0 2782  2359    5 4857 2012  324  9431   0 684840 251 549  0   0   8
>   6    0   0 4324  4133    0 9138 3592  538 12525   0 516342 191 609  0   0   8
>   7    0   0 2180  3249    0 6960 2926  321  8825   0 697861 257 543  0   0   8
>
> With FreeBSD I tried "options RSS" and increasing "net.isr.maxthreads",
> but this resulted in some really flaky kernel behaviour. So I'm
> thinking that if_epair(4) may be OK for some low-bandwidth use cases,
> i.e. testing firewall rules, etc., but not suitable for things like
> file/object storage servers.
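P.S. For anyone wanting to reproduce the test at the top of this message,
it boils down to a couple of commands. A rough sketch - the interface name
ix0 and the jail address 192.0.2.10 are illustrative placeholders, so
adjust them for your hardware and addressing:

```shell
# On the TrueNAS host: turn off LRO on the physical NIC. With LRO
# enabled, segments coalesced by the NIC can be mishandled when they
# are forwarded over if_bridge into the vnet jail.
ifconfig ix0 -lro

# Inside the vnet jail: start the iperf3 server as a daemon.
iperf3 -s -D

# On the FreeBSD 14.1 client host: run the throughput test for 30
# seconds with 4 parallel streams.
iperf3 -c 192.0.2.10 -t 30 -P 4
```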