Date:      Tue, 29 Aug 2023 07:07:39 +0000
From:      Wei Hu <weh@microsoft.com>
To:        Mark Millard <marklmi@yahoo.com>, FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject:   RE: Very slow scp performance comparing to Linux
Message-ID:  <SI2P153MB0441D7E1C0178139C687A340BBE7A@SI2P153MB0441.APCP153.PROD.OUTLOOK.COM>
In-Reply-To: <07C2C9E3-7317-43AF-A60C-393ADF90079D@yahoo.com>
References:  <948CAEBD-EB60-46B9-96EE-FE41CA6C64A1@yahoo.com> <07C2C9E3-7317-43AF-A60C-393ADF90079D@yahoo.com>

Hi Mark,

Sorry for the top posting, but I don't want to make it look too messy. Here is the
information that I missed in my original email.

All VMs are running on Intel(R) Xeon(R) Platinum 8473C (2100.00-MHz K8-class CPU).

FreeBSD VMs are 16 vcpu with 128 GB memory, in a non-debug build:
14.0-ALPHA1 FreeBSD 14.0-ALPHA1 amd64 1400094 #7 nodbg-n264692-59e706ffee52-dirty... /usr/obj/usr/src/main/amd64.amd64/sys/GENERIC-NODEBUG amd64

Ubuntu VMs are 4 vcpu with 32 GB memory, kernel version:
6.2.0-1009-azure #9~22.04.3-Ubuntu SMP Tue Aug  1 20:51:07 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

I did a couple more tests as suggested by others in this thread. To recap:

Scp to localhost, FreeBSD (ufs) vs Ubuntu (ext4): 70 MB/s vs 550 MB/s
Scp to localhost, FreeBSD (tmpfs) vs Ubuntu (tmpfs): 630 MB/s vs 660 MB/s

Iperf3 single stream to localhost: FreeBSD vs Ubuntu: 30.9 Gb/s vs 48.8 Gb/s
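For anyone wanting to reproduce the single-stream loopback TCP number without iperf3, a small socket benchmark gives a rough lower bound. This is only a sketch of the measurement idea, not what was run in the tests above; the function name and buffer sizes are illustrative:

```python
import socket
import threading
import time

def measure_loopback_throughput(total_mb=64):
    """Send total_mb MiB of zeroes over one loopback TCP connection
    and return the observed throughput in MB/s."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))       # let the kernel pick a free port
    srv.listen(1)
    port = srv.getsockname()[1]

    def sink():
        # Accept one connection and discard everything it sends.
        conn, _ = srv.accept()
        while conn.recv(1 << 20):
            pass
        conn.close()

    t = threading.Thread(target=sink)
    t.start()

    cli = socket.create_connection(("127.0.0.1", port))
    chunk = b"\0" * (1 << 20)        # 1 MiB per send
    start = time.monotonic()
    for _ in range(total_mb):
        cli.sendall(chunk)
    cli.close()                      # EOF lets the sink finish draining
    t.join()
    elapsed = time.monotonic() - start
    srv.close()
    return total_mb / elapsed
```

Comparing its output on the FreeBSD and Ubuntu VMs would indicate whether the gap sits in the TCP stack itself rather than in scp's ssh-layer overhead.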

Would these numbers suggest that
1. ext4 caches a lot more than ufs?
2. there is a TCP performance gap in the network stack between FreeBSD and Ubuntu?
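On question 1, one way to separate write-back caching from the network path is to time local writes with and without an fsync(2) before stopping the clock: a large cached-vs-synced gap on ext4 but not on ufs would support the caching explanation. A sketch only; the helper name and sizes below are mine, not anything from the thread:

```python
import os
import tempfile
import time

def write_throughput(dir_path, total_mb=64, sync=False):
    """Write total_mb MiB of zeroes into a temp file under dir_path and
    return MB/s. With sync=True, fsync(2) runs before the clock stops,
    so the filesystem's write-back cache is excluded from the result."""
    chunk = b"\0" * (1 << 20)        # 1 MiB writes
    fd, path = tempfile.mkstemp(dir=dir_path)
    start = time.monotonic()
    try:
        for _ in range(total_mb):
            os.write(fd, chunk)
        if sync:
            os.fsync(fd)             # force dirty pages to stable storage
        elapsed = time.monotonic() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return total_mb / elapsed
```

Running it against a ufs directory and an ext4 directory, with sync=False and sync=True, shows how much of the apparent throughput is just the page cache absorbing writes.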

Would you also try running scp on ufs on your bare-metal arm host? I am curious
to know how different ufs and zfs are.

Thanks,
Wei


> -----Original Message-----
> From: Mark Millard <marklmi@yahoo.com>
> Sent: Tuesday, August 29, 2023 12:16 AM
> To: Wei Hu <weh@microsoft.com>; FreeBSD Hackers <freebsd-
> hackers@freebsd.org>
> Subject: Re: Very slow scp performance comparing to Linux
>
> On Aug 28, 2023, at 08:43, Mark Millard <marklmi@yahoo.com> wrote:
>
> > Wei Hu <weh_at_microsoft.com> wrote on
> > Date: Mon, 28 Aug 2023 07:32:35 UTC :
> >
> >> When I was testing a new NIC, I found the single stream scp performance
> was almost 8 times slower than Linux on the RX side. Initially I thought it might
> be something with the NIC. But when I switched to sending the file on
> localhost, the numbers stay the same.
> >>
> >> Here I was sending a 2GB file from sender to receiver using scp. FreeBSD is a
> recent NON-DEBUG build from CURRENT. The Ubuntu Linux kernel is 6.2.0.
> Both run in HyperV VMs on the same type of hardware. The FreeBSD VM has
> 16 vcpus, while the Ubuntu VM has 4 vcpus.
> >>
> >> Sender    Receiver    Throughput
> >> Linux     FreeBSD      70 MB/s
> >> Linux     Linux       550 MB/s
> >> FreeBSD   FreeBSD      70 MB/s
> >> FreeBSD   Linux       350 MB/s
> >> FreeBSD   localhost    70 MB/s
> >> Linux     localhost   550 MB/s
> >>
> >> From these tests, it seems I can rule out an issue with the NIC and its driver.
> Looks like the FreeBSD kernel network stack is much slower than Linux on single
> stream TCP, or there is some problem with scp?
> >>
> >> I also tried turning on the following kernel parameters on the FreeBSD kernel.
> But it makes no difference, and neither do other TCP congestion control algorithms
> such as htcp and newreno.
> >>
> >> net.inet.tcp.soreceive_stream="1"
> >> net.isr.maxthreads="-1"
> >> net.isr.bindthreads="1"
> >>
> >> net.inet.ip.intr_queue_maxlen=2048
> >> net.inet.tcp.recvbuf_max=16777216
> >> net.inet.tcp.recvspace=419430
> >> net.inet.tcp.sendbuf_max=16777216
> >> net.inet.tcp.sendspace=209715
> >> kern.ipc.maxsockbuf=16777216
> >>
> >> Any ideas?
> >
> >
> > You do not give explicit commands to try. Nor do you specify your
> > hardware context that is involved, just that HyperV is involved.
> >
> > So, on a HoneyComb (16 cortex-A72's) with Optane boot media in its
> > PCIe slot, I tried (no HyperV or VM involved):
>
> I should have listed the non-debug build in use:
>
> # uname -apKU
> FreeBSD CA72-16Gp-ZFS 15.0-CURRENT FreeBSD 15.0-CURRENT aarch64
> 1500000 #110 main-n265027-2f06449d6429-dirty: Fri Aug 25 09:19:53 PDT
> 2023     root@CA72-16Gp-ZFS:/usr/obj/BUILDs/main-CA72-nodbg-
> clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72 arm64
> aarch64 1500000 1500000
>
> > # scp FreeBSD-14.0-ALPHA2-arm-armv7-GENERICSD-20230818-77013f29d048-264841.img root@localhost:FreeBSD-14-TEST.img
> > . . .
> > FreeBSD-14.0-ALPHA2-arm-armv7-GENERICSD-20230818-77013f29d048-264841.img   100% 5120MB 120.2MB/s   00:42
> >
> > It is not a high performance system. 64 GiBytes of RAM.
> >
> > So instead trying a ThreadRipper 1950X that also has Optane in a PCIe
> > slot for its boot media, no HyperV or VM involved,
>
> I should have listed the non-debug build in use:
>
> # uname -apKU
> FreeBSD amd64-ZFS 15.0-CURRENT FreeBSD 15.0-CURRENT amd64 1500000
> #116 main-n265027-2f06449d6429-dirty: Fri Aug 25 09:19:20 PDT 2023
> root@amd64-ZFS:/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-
> src/amd64.amd64/sys/GENERIC-NODBG amd64 amd64 1500000 1500000
>
> (Same source tree content.)
>
> > # scp FreeBSD-14.0-ALPHA2-arm-armv7-GENERICSD-20230818-77013f29d048-264841.img root@localhost:FreeBSD-14-TEST.img
> > . . .
> > FreeBSD-14.0-ALPHA2-arm-armv7-GENERICSD-20230818-77013f29d048-264841.img   100% 5120MB 299.7MB/s   00:17
> >
> > (These systems do not run with any tmpfs areas, not even /tmp . So I'm
> > not providing that kind of example, at least for now.)
> >
> > 128 GiBytes of RAM.
> >
> > Both systems are ZFS based but with a simple single partition.
> > (Used for bectl BEs, not for other reasons to use ZFS.
> > I could boot UFS variants of the boot media and test that kind of
> > context.)
> >
> > So both show rates between your FreeBSD figure and the Linux figure.
> > I've no means of checking how reasonable the figures are relative to
> > your test context. I just know the results are better than you report
> > for localhost use.
>
> Adding a Windows Dev Kit 2023 booting via USB3 (but via a
> U.2 adapter to Optane media), again ZFS, again no VM involved:
>=20
> # uname -apKU
> FreeBSD CA78C-WDK23-ZFS 15.0-CURRENT FreeBSD 15.0-CURRENT aarch64
> 1500000 #13 main-n265027-2f06449d6429-dirty: Fri Aug 25 09:20:31 PDT
> 2023     root@CA78C-WDK23-ZFS:/usr/obj/BUILDs/main-CA78C-nodbg-
> clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA78C arm64
> aarch64 1500000 1500000
>
> # scp FreeBSD-14.0-ALPHA2-arm-armv7-GENERICSD-20230818-77013f29d048-264841.img root@localhost:FreeBSD-14-TEST.img
> . . .
> FreeBSD-14.0-ALPHA2-arm-armv7-GENERICSD-20230818-77013f29d048-264841.img   100% 5120MB 168.7MB/s   00:30
>
> Note: the cortex-a72 and cortex-a78c/x1c builds were optimized via -mcpu= use.
> The ThreadRipper build was not.
>
> Note: I've not controlled for whether the reads of the input *.img data were gotten
> from memory caching of prior activity or not. I could do so if you want: reboot
> before the scp command.
>
> ===
> Mark Millard
> marklmi at yahoo.com



