Subject: Re: Performance issues with vnet jails + epair + bridge
From: Zhenlei Huang
Date: Tue, 17 Sep 2024 11:25:07 +0800
To: Aleksandr Fedorov
Cc: Sad Clouds, Mark Saad, FreeBSD Net
Message-Id: <7E65BDDA-7105-4C07-9243-4FAF2B4D5515@FreeBSD.org>
In-Reply-To: <214411726497902@mail.yandex.ru>
List-Archive: https://lists.freebsd.org/archives/freebsd-net

> On Sep 16, 2024, at 10:47 PM, Aleksandr Fedorov wrote:
>
> If we are talking about local traffic between jails and/or host, then
> in terms of TCP throughput we have room to improve, for example:

Without the RSS option enabled, if_epair will only use one thread to move
packets between the pair of interfaces. I reviewed the code and I think it
can be improved even without RSS.

> 1. Stop calculating checksums for packets between VNET jails and host.

I have a local WIP for this, inspired by the introduction of IFCAP_VLAN_MTU.
The improvement should be larger on low-frequency CPUs.

> 2. Use large packets (TSO) up to 64k in size.
>
> Just for example, a simple patch increases the throughput of
> if_epair(4) between two ends from 10 Gbps to 30 Gbps.

That is impressive!

> diff --git a/sys/net/if_epair.c b/sys/net/if_epair.c
> index aeed993249f5..79c2dfcfc445 100644
> --- a/sys/net/if_epair.c
> +++ b/sys/net/if_epair.c
> @@ -164,6 +164,10 @@ epair_tx_start_deferred(void *arg, int pending)
>  	while (m != NULL) {
>  		n = STAILQ_NEXT(m, m_stailqpkt);
>  		m->m_nextpkt = NULL;
> +
> +		m->m_pkthdr.csum_flags = CSUM_IP_CHECKED | CSUM_IP_VALID | CSUM_DATA_VALID | CSUM_PSEUDO_HDR;
> +		m->m_pkthdr.csum_data = 0xFFFF;
> +
>  		if_input(ifp, m);
>  		m = n;
>  	}
> @@ -538,8 +542,9 @@ epair_setup_ifp(struct epair_softc *sc, char *name, int unit)
>  	ifp->if_dunit = unit;
>  	ifp->if_flags = IFF_BROADCAST | IFF_SIMPLEX | IFF_MULTICAST;
>  	ifp->if_flags |= IFF_KNOWSEPOCH;
> -	ifp->if_capabilities = IFCAP_VLAN_MTU;
> -	ifp->if_capenable = IFCAP_VLAN_MTU;
> +	ifp->if_capabilities = IFCAP_VLAN_MTU | IFCAP_HWCSUM | IFCAP_HWCSUM_IPV6 | IFCAP_TSO;
> +	ifp->if_capenable = ifp->if_capabilities;
> +	ifp->if_hwassist = (CSUM_IP | CSUM_TCP | CSUM_UDP | CSUM_IP_TSO);

I have not tried TSO on if_epair yet. TSO gets special treatment, so I guess
the above is not sufficient.
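For readers wondering why the quoted patch sets csum_data to 0xFFFF: as far as I understand the receive path, when CSUM_DATA_VALID and CSUM_PSEUDO_HDR are both set, the stack takes csum_data as the already-computed sum and only checks that it is all ones, instead of walking the payload. Below is a minimal userland sketch of that decision, not the kernel code. The SK_* constants, struct sk_pkt and both functions are hypothetical stand-ins I made up for illustration, the flag values are arbitrary, and the pseudo-header handling is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's CSUM_* bits (values are arbitrary). */
#define SK_CSUM_DATA_VALID	0x1u
#define SK_CSUM_PSEUDO_HDR	0x2u

/* Hypothetical, heavily reduced model of the checksum fields in a packet header. */
struct sk_pkt {
	uint32_t	 csum_flags;
	uint16_t	 csum_data;
	const uint8_t	*data;	/* bytes covered by the checksum, checksum field included */
	size_t		 len;
};

/* RFC 1071 one's-complement sum over buf, folded to 16 bits, not inverted. */
static uint16_t
sk_ones_sum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;	/* pad odd trailing byte */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);	/* fold carries back in */
	return ((uint16_t)sum);
}

/*
 * Returns 1 if the packet would be accepted, 0 if it would be dropped.
 * *walked is set to 1 when the payload had to be summed in software.
 */
static int
sk_rx_checksum_ok(const struct sk_pkt *p, int *walked)
{
	uint16_t sum;

	assert(p != NULL && walked != NULL);
	if ((p->csum_flags & SK_CSUM_DATA_VALID) != 0 &&
	    (p->csum_flags & SK_CSUM_PSEUDO_HDR) != 0) {
		/* Trust the sender: 0xFFFF ^ 0xFFFF == 0 means "valid". */
		*walked = 0;
		sum = p->csum_data ^ 0xffff;
	} else {
		/* Software path: touch every byte of the payload. */
		*walked = 1;
		sum = sk_ones_sum(p->data, p->len) ^ 0xffff;
	}
	return (sum == 0);
}
```

With both flags set and csum_data at 0xFFFF the sketch accepts the packet without touching the data, which is the saving the patch is after. It also shows the trade-off: on that path a corrupted payload is not detected, which is presumably acceptable only because the packet never leaves host memory.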
>  	ifp->if_transmit = epair_transmit;
>  	ifp->if_qflush = epair_qflush;
>  	ifp->if_start = epair_start;
>
> 14.09.2024, 05:45, "Zhenlei Huang":
>
> On Sep 13, 2024, at 10:54 PM, Sad Clouds wrote:
>
> On Fri, 13 Sep 2024 08:08:02 -0400
> Mark Saad wrote:
>
> Sad
> Can you go back a bit? You mentioned there is an RPi in the mix. Some of
> the Raspberries have their NIC USB-attached under the covers, which will
> kill the total speed of things.
>
> Can you cobble together a diagram of what you have on either end?
>
> Hello, I'm not sending data across the network, only between the host
> and the jails. I'm trying to evaluate how FreeBSD handles TCP data
> locally within a single host.
>
> When you take vnet into account, **local** traffic should stay within
> a single vnet jail. If you want traffic across vnet jails, if_epair or netgraph
> hooks should be employed, and that will of course introduce some overhead.
>
> I understand that vnet jails will have more overhead, compared to a
> shared TCP/IP stack via localhost. So I'm trying to measure it and see
> where the bottlenecks are.
>
> The overhead of a vnet jail should be negligible compared to a legacy jail
> or no jail. Bear in mind that when the VIMAGE option is enabled, there is a default
> vnet 0. It is not visible via jls and cannot be destroyed. So when you see
> bottlenecks, as in this case, they are mostly caused by other components
> such as if_epair, not by the vnet jail itself.
>
> The Raspberry Pi 4 host has a single vnet jail, exchanging data with
> the host via epair(4) and if_bridge(4) interfaces. I don't really know
> what topology FreeBSD is using to represent all this so can't draw any
> diagrams, but I think all data flows through the kernel internally and
> never leaves via the physical network interface.
>
> For vnet jails, when you try to describe the network topology, you can
> treat them as VMs / physical boxes.
>
> I have one box with dozens of vnet jails. Each of them has a single
> responsibility, e.g. DHCP, LDAP, pf firewall, OOB access. The topology looks quite
> clear and it is easy to maintain. The only overhead is too many
> hops between the vnet jail instances. For my use case the performance
> is not critical and it has worked great for years.
>
> Best regards,
> Zhenlei