Message-Id: <202111250738.1AP7cuCu042555@nuc.oldach.net>
Subject: Re: git: 32a2fed6e71f - stable/13 - openssl: Fix detection of ARMv7 and ARM64 CPU features
In-Reply-To: <8664c9a1-f000-d07f-5f48-3b18d3e5f629@freebsd.org> from Allan Jude at "24 Nov 2021 13:02:47"
To: allanjude@freebsd.org (Allan Jude)
Date: Thu, 25 Nov 2021 08:38:56 +0100 (CET)
Cc: manu@bidouilliste.com, src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-branches@FreeBSD.org
From: freebsd@oldach.net (Helge Oldach)
List-Id: Commits to the stable branches of the FreeBSD src repository
List-Archive: https://lists.freebsd.org/archives/dev-commits-src-branches

Hi,

Allan Jude wrote on Wed, 24 Nov 2021 19:02:47 +0100 (CET):
> On 11/24/2021 3:30 AM, Emmanuel Vadot wrote:
> > On Tue, 23 Nov 2021 20:36:40 +0100 (CET)
> > freebsd@oldach.net (Helge Oldach) wrote:
> >
> >> Allan Jude wrote on Tue, 23 Nov 2021 20:14:53 +0100 (CET):
> >>> On 11/23/2021 5:00 AM, Helge Oldach wrote:
> >>>> Allan Jude wrote on Mon, 22 Nov 2021 19:14:13 +0100 (CET):
> >>>> Hmmm.
> >>>> On a RPi4/8G:
> >>>>
> >>>> Before (FreeBSD 13.0-STABLE (GENERIC) #366 stable/13-n248173-d16fbc488e6):
> >>>> | type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
> >>>> | aes-256-gcm      35791.98k    38533.57k    39986.77k    41397.59k    39840.43k    39638.36k
> >>>>
> >>>> After (FreeBSD 13.0-STABLE (GENERIC) #367 stable/13-n248176-f085bb0e621)
> >>>>
> >>>> | type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
> >>>> | aes-256-gcm      21277.62k    23226.64k    23613.90k    23687.51k    23892.93k    23947.95k
> >>>>
> >>>> It seems that AES throughput is actually cut by almost half?
> >>>
> >>> Do you know which of the CPU optimizations your RPi4 supports?
> >>
> >> Is this what you need?
> >>
> >> Instruction Set Attributes 0 =
> >
> > So there is no AES+PMULL instruction set on RPI4, I guess that openssl
> > uses them for aes-gcm.
> >
> > I wonder what it uses before that make it have this boost.
> >
> > On my rockpro64 I do see the improvement btw :
> > root@generic:~ # cpuset -l 4,5 openssl speed -evp aes-256-gcm
> > ...
> > aes-256-gcm     122861.59k   337938.39k   565408.44k   661223.09k   709175.19k   712327.25k
> > root@generic:~ # cpuset -l 4,5 env OPENSSL_armcap=0 openssl speed -evp aes-256-gcm
> > ...
> > aes-256-gcm      34068.11k    38068.62k    39435.24k    39818.75k    39905.34k    39922.35k
> >
> > Running on the big cores at max freq.
> >
> >> Instruction Set Attributes 1 = <>
> >> Processor Features 0 =
> >> Processor Features 1 = <>
> >> Memory Model Features 0 =
> >> Memory Model Features 1 = <8bit VMID>
> >> Memory Model Features 2 = <32bit CCIDX,48bit VA>
> >> Debug Features 0 =
> >> Debug Features 1 = <>
> >> Auxiliary Features 0 = <>
> >> Auxiliary Features 1 = <>
> >> AArch32 Instruction Set Attributes 5 =
> >> AArch32 Media and VFP Features 0 =
> >> AArch32 Media and VFP Features 1 =
> >>
> >>> You can set the environment variable OPENSSL_armcap to override
> >>> OpenSSL's detection.
> >>>
> >>> Try: env OPENSSL_armcap=0 openssl speed -evp aes-256-gcm
> >>
> >> On FreeBSD 13.0-STABLE (GENERIC) #367 stable/13-n248176-f085bb0e621 again (i.e. after this commit):
> >>
> >> hmo@p48 ~ $ env OPENSSL_armcap=0 openssl speed -evp aes-256-gcm
> >> Doing aes-256-gcm for 3s on 16 size blocks: 6445704 aes-256-gcm's in 3.08s
> >> Doing aes-256-gcm for 3s on 64 size blocks: 1861149 aes-256-gcm's in 3.00s
> >> Doing aes-256-gcm for 3s on 256 size blocks: 479664 aes-256-gcm's in 3.01s
> >> Doing aes-256-gcm for 3s on 1024 size blocks: 122853 aes-256-gcm's in 3.04s
> >> Doing aes-256-gcm for 3s on 8192 size blocks: 15181 aes-256-gcm's in 3.00s
> >> Doing aes-256-gcm for 3s on 16384 size blocks: 7796 aes-256-gcm's in 3.07s
> >> OpenSSL 1.1.1l-freebsd 24 Aug 2021
> >> built on: reproducible build, date unspecified
> >> options:bn(64,64) rc4(int) des(int) aes(partial) idea(int) blowfish(ptr)
> >> compiler: clang
> >> The 'numbers' are in 1000s of bytes per second processed.
> >> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
> >> aes-256-gcm      33504.57k    39704.51k    40825.01k    41394.83k    41454.25k    41601.52k
> >> hmo@p48 ~ $ openssl speed -evp aes-256-gcm
> >> Doing aes-256-gcm for 3s on 16 size blocks: 4066201 aes-256-gcm's in 3.00s
> >> Doing aes-256-gcm for 3s on 64 size blocks: 1087387 aes-256-gcm's in 3.00s
> >> Doing aes-256-gcm for 3s on 256 size blocks: 280110 aes-256-gcm's in 3.03s
> >> Doing aes-256-gcm for 3s on 1024 size blocks: 70412 aes-256-gcm's in 3.04s
> >> Doing aes-256-gcm for 3s on 8192 size blocks: 8762 aes-256-gcm's in 3.00s
> >> Doing aes-256-gcm for 3s on 16384 size blocks: 4402 aes-256-gcm's in 3.02s
> >> OpenSSL 1.1.1l-freebsd 24 Aug 2021
> >> built on: reproducible build, date unspecified
> >> options:bn(64,64) rc4(int) des(int) aes(partial) idea(int) blowfish(ptr)
> >> compiler: clang
> >> The 'numbers' are in 1000s of bytes per second processed.
> >> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
> >> aes-256-gcm      21686.41k    23197.59k    23656.30k    23725.04k    23926.10k    23916.23k
> >> hmo@p48 ~ $
> >>
> >> Kind regards,
> >> Helge
> >
> >
>
> So based on results from Manu, and Mark Millard, it seems almost every
> ARM platform is faster when it takes advantage of the CPU features,
> except the RPi4(B).
>
> As Manu pointed out, it doesn't appear to have the AES+PMULL feature,
> which means it must be something else that is slowing it down.
>
> What might help, is to try each feature in turn, and figure out which
> one is causing slower results.
>
> #define HWCAP_FP                0x00000001
> #define HWCAP_ASIMD             0x00000002
> #define HWCAP_EVTSTRM           0x00000004
> #define HWCAP_AES               0x00000008
> #define HWCAP_PMULL             0x00000010
> #define HWCAP_SHA1              0x00000020
> #define HWCAP_SHA2              0x00000040
> #define HWCAP_CRC32             0x00000080
>
> So try:
> env OPENSSL_armcap=1 openssl speed -evp aes-256-gcm
> as well as with armcap=2, 3 (both FP and ASIMD), 8 (just AES) etc.

hmo@p48 ~ $ for f in 0 1 2 3 8 16 32 64 128 ; do echo -n $f:; env OPENSSL_armcap=$f openssl speed -evp aes-256-gcm 2>&1 | tail -1 | cut -wf7; done
0:42295.15k
1:23891.19k
2:42208.57k
3:23970.56k
8:42354.98k
16:42199.06k
32:size
Illegal instruction (core dumped)
64:42322.42k
128:42275.00k
hmo@p48 ~ $

So I guess HWCAP_FP is the culprit? Maybe related to hard/soft floating
point math which indeed is kind of special on the Pi?

> For ones where the CPU lacks the feature, it will crash with 'Illegal
> instruction'
>
> Separately, it might also be interesting to see the results of `openssl
> speed -evp sha256` before/after/with the different OPENSSL_armcap values

Please let me know in case you still require this.

Kind regards
Helge