Date: Thu, 25 Nov 2021 08:38:56 +0100 (CET)
From: freebsd@oldach.net (Helge Oldach)
To: allanjude@freebsd.org (Allan Jude)
Cc: manu@bidouilliste.com, src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-branches@FreeBSD.org
Subject: Re: git: 32a2fed6e71f - stable/13 - openssl: Fix detection of ARMv7 and ARM64 CPU features
Message-ID: <202111250738.1AP7cuCu042555@nuc.oldach.net>
In-Reply-To: <8664c9a1-f000-d07f-5f48-3b18d3e5f629@freebsd.org> from Allan Jude at "24 Nov 2021 13:02:47"
Hi,

Allan Jude wrote on Wed, 24 Nov 2021 19:02:47 +0100 (CET):
> On 11/24/2021 3:30 AM, Emmanuel Vadot wrote:
> > On Tue, 23 Nov 2021 20:36:40 +0100 (CET)
> > freebsd@oldach.net (Helge Oldach) wrote:
> >
> >> Allan Jude wrote on Tue, 23 Nov 2021 20:14:53 +0100 (CET):
> >>> On 11/23/2021 5:00 AM, Helge Oldach wrote:
> >>>> Allan Jude wrote on Mon, 22 Nov 2021 19:14:13 +0100 (CET):
> >>>> Hmmm. On a RPi4/8G:
> >>>>
> >>>> Before (FreeBSD 13.0-STABLE (GENERIC) #366 stable/13-n248173-d16fbc488e6):
> >>>> | type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
> >>>> | aes-256-gcm      35791.98k    38533.57k    39986.77k    41397.59k    39840.43k    39638.36k
> >>>>
> >>>> After (FreeBSD 13.0-STABLE (GENERIC) #367 stable/13-n248176-f085bb0e621)
> >>>>
> >>>> | type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
> >>>> | aes-256-gcm      21277.62k    23226.64k    23613.90k    23687.51k    23892.93k    23947.95k
> >>>>
> >>>> It seems that AES throughput is actually cut by almost half?
> >>>
> >>> Do you know which of the CPU optimizations your RPi4 supports?
> >>
> >> Is this what you need?
> >>
> >> Instruction Set Attributes 0 = <CRC32>
> >
> > So there is no AES+PMULL instruction set on RPI4, I guess that openssl
> > uses them for aes-gcm.
> >
> > I wonder what it uses before that make it have this boost.
> >
> > On my rockpro64 I do see the improvement btw :
> > root@generic:~ # cpuset -l 4,5 openssl speed -evp aes-256-gcm
> > ...
> > aes-256-gcm     122861.59k   337938.39k   565408.44k   661223.09k   709175.19k   712327.25k
> > root@generic:~ # cpuset -l 4,5 env OPENSSL_armcap=0 openssl speed -evp aes-256-gcm
> > ...
> > aes-256-gcm      34068.11k    38068.62k    39435.24k    39818.75k    39905.34k    39922.35k
> >
> > Running on the big cores at max freq.
> >
> >> Instruction Set Attributes 1 = <>
> >> Processor Features 0 = <AdvSIMD,FP,EL3 32,EL2 32,EL1 32,EL0 32>
> >> Processor Features 1 = <>
> >> Memory Model Features 0 = <TGran4,TGran64,SNSMem,BigEnd,16bit ASID,16TB PA>
> >> Memory Model Features 1 = <8bit VMID>
> >> Memory Model Features 2 = <32bit CCIDX,48bit VA>
> >> Debug Features 0 = <DoubleLock,2 CTX BKPTs,4 Watchpoints,6 Breakpoints,PMUv3,Debugv8>
> >> Debug Features 1 = <>
> >> Auxiliary Features 0 = <>
> >> Auxiliary Features 1 = <>
> >> AArch32 Instruction Set Attributes 5 = <CRC32,SEVL>
> >> AArch32 Media and VFP Features 0 = <FPRound,FPSqrt,FPDivide,DP VFPv3+v4,SP VFPv3+v4,AdvSIMD>
> >> AArch32 Media and VFP Features 1 = <SIMDFMAC,FPHP DP Conv,SIMDHP SP Conv,SIMDSP,SIMDInt,SIMDLS,FPDNaN,FPFtZ>
> >>
> >>> You can set the environment variable OPENSSL_armcap to override
> >>> OpenSSL's detection.
> >>>
> >>> Try: env OPENSSL_armcap=0 openssl speed -evp aes-256-gcm
> >>
> >> On FreeBSD 13.0-STABLE (GENERIC) #367 stable/13-n248176-f085bb0e621 again (i.e. after this commit):
> >>
> >> hmo@p48 ~ $ env OPENSSL_armcap=0 openssl speed -evp aes-256-gcm
> >> Doing aes-256-gcm for 3s on 16 size blocks: 6445704 aes-256-gcm's in 3.08s
> >> Doing aes-256-gcm for 3s on 64 size blocks: 1861149 aes-256-gcm's in 3.00s
> >> Doing aes-256-gcm for 3s on 256 size blocks: 479664 aes-256-gcm's in 3.01s
> >> Doing aes-256-gcm for 3s on 1024 size blocks: 122853 aes-256-gcm's in 3.04s
> >> Doing aes-256-gcm for 3s on 8192 size blocks: 15181 aes-256-gcm's in 3.00s
> >> Doing aes-256-gcm for 3s on 16384 size blocks: 7796 aes-256-gcm's in 3.07s
> >> OpenSSL 1.1.1l-freebsd 24 Aug 2021
> >> built on: reproducible build, date unspecified
> >> options:bn(64,64) rc4(int) des(int) aes(partial) idea(int) blowfish(ptr)
> >> compiler: clang
> >> The 'numbers' are in 1000s of bytes per second processed.
> >> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
> >> aes-256-gcm      33504.57k    39704.51k    40825.01k    41394.83k    41454.25k    41601.52k
> >> hmo@p48 ~ $ openssl speed -evp aes-256-gcm
> >> Doing aes-256-gcm for 3s on 16 size blocks: 4066201 aes-256-gcm's in 3.00s
> >> Doing aes-256-gcm for 3s on 64 size blocks: 1087387 aes-256-gcm's in 3.00s
> >> Doing aes-256-gcm for 3s on 256 size blocks: 280110 aes-256-gcm's in 3.03s
> >> Doing aes-256-gcm for 3s on 1024 size blocks: 70412 aes-256-gcm's in 3.04s
> >> Doing aes-256-gcm for 3s on 8192 size blocks: 8762 aes-256-gcm's in 3.00s
> >> Doing aes-256-gcm for 3s on 16384 size blocks: 4402 aes-256-gcm's in 3.02s
> >> OpenSSL 1.1.1l-freebsd 24 Aug 2021
> >> built on: reproducible build, date unspecified
> >> options:bn(64,64) rc4(int) des(int) aes(partial) idea(int) blowfish(ptr)
> >> compiler: clang
> >> The 'numbers' are in 1000s of bytes per second processed.
> >> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
> >> aes-256-gcm      21686.41k    23197.59k    23656.30k    23725.04k    23926.10k    23916.23k
> >> hmo@p48 ~ $
> >>
> >> Kind regards,
> >> Helge
> >
> >
>
> So based on results from Manu, and Mark Millard, it seems almost every
> ARM platform is faster when it takes advantage of the CPU features,
> except the RPi4(B).
>
> As Manu pointed out, it doesn't appear to have the AES+PMULL feature,
> which means it must be something else that is slowing it down.
>
> What might help, is to try each feature in turn, and figure out which
> one is causing slower results.
>
> #define HWCAP_FP        0x00000001
> #define HWCAP_ASIMD     0x00000002
> #define HWCAP_EVTSTRM   0x00000004
> #define HWCAP_AES       0x00000008
> #define HWCAP_PMULL     0x00000010
> #define HWCAP_SHA1      0x00000020
> #define HWCAP_SHA2      0x00000040
> #define HWCAP_CRC32     0x00000080
>
> So try:
> env OPENSSL_armcap=1 openssl speed -evp aes-256-gcm
> as well as with armcap=2, 3 (both FP and ASIMD), 8 (just AES) etc.

hmo@p48 ~ $ for f in 0 1 2 3 8 16 32 64 128 ; do echo -n $f:; env OPENSSL_armcap=$f openssl speed -evp aes-256-gcm 2>&1 | tail -1 | cut -wf7; done
0:42295.15k
1:23891.19k
2:42208.57k
3:23970.56k
8:42354.98k
16:42199.06k
32:size
Illegal instruction (core dumped)
64:42322.42k
128:42275.00k
hmo@p48 ~ $

So I guess HWCAP_FP is the culprit? Maybe related to hard/soft floating
point math which indeed is kind of special on the Pi?

> For ones where the CPU lacks the feature, it will crash with 'Illegal
> instruction'
>
> Separately, it might also be interesting to see the results of `openssl
> speed -evp sha256` before/after/with the different OPENSSL_armcap values

Please let me know in case you still require this.

Kind regards
Helge
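
To make the pattern in the OPENSSL_armcap sweep from this message easier to see, here is a minimal sketch that compares each run against the armcap=0 baseline. The figures are copied from the loop output in this message. The `summarize` function name and the 80% cutoff are illustrative assumptions, not anything taken from the thread or from OpenSSL.

```sh
#!/bin/sh
# Throughput at 16384-byte blocks (in k), copied from the sweep in this message.
# armcap=32 is left out because that run ended in "Illegal instruction".
results='0 42295.15
1 23891.19
2 42208.57
3 23970.56
8 42354.98
16 42199.06
64 42322.42
128 42275.00'

# summarize: compare each run with the armcap=0 baseline (the first row).
# The 80% cutoff is an arbitrary illustrative threshold, not from the thread.
summarize() {
    printf '%s\n' "$results" | awk '
        NR == 1 { base = $2 }
        {
            pct = 100 * $2 / base
            printf "armcap=%-3d %9.2fk %5.1f%% %s\n", $1, $2, pct, (pct < 80 ? "SLOW" : "ok")
        }'
}

summarize
```

Only the armcap=1 and armcap=3 rows are flagged, the two values with bit 0x1 set, which is consistent with the guess that this single bit accounts for the slowdown. Whether OPENSSL_armcap uses the same bit numbering as the HWCAP_* constants quoted in this message is not confirmed in the thread, so the name attached to bit 0x1 should be treated as tentative.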
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?202111250738.1AP7cuCu042555>