Date:      Thu, 25 Nov 2021 08:38:56 +0100 (CET)
From:      freebsd@oldach.net (Helge Oldach)
To:        allanjude@freebsd.org (Allan Jude)
Cc:        manu@bidouilliste.com, src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-branches@FreeBSD.org
Subject:   Re: git: 32a2fed6e71f - stable/13 - openssl: Fix detection of ARMv7 and ARM64 CPU features
Message-ID:  <202111250738.1AP7cuCu042555@nuc.oldach.net>
In-Reply-To: <8664c9a1-f000-d07f-5f48-3b18d3e5f629@freebsd.org> from Allan Jude at "24 Nov 2021 13:02:47"

Hi,

Allan Jude wrote on Wed, 24 Nov 2021 19:02:47 +0100 (CET):
> On 11/24/2021 3:30 AM, Emmanuel Vadot wrote:
> > On Tue, 23 Nov 2021 20:36:40 +0100 (CET)
> > freebsd@oldach.net (Helge Oldach) wrote:
> > 
> >> Allan Jude wrote on Tue, 23 Nov 2021 20:14:53 +0100 (CET):
> >>> On 11/23/2021 5:00 AM, Helge Oldach wrote:
> >>>> Allan Jude wrote on Mon, 22 Nov 2021 19:14:13 +0100 (CET):
> >>>> Hmmm. On a RPi4/8G:
> >>>>
> >>>> Before (FreeBSD 13.0-STABLE (GENERIC) #366 stable/13-n248173-d16fbc488e6):
> >>>> | type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
> >>>> | aes-256-gcm      35791.98k    38533.57k    39986.77k    41397.59k    39840.43k    39638.36k
> >>>>
> >>>> After (FreeBSD 13.0-STABLE (GENERIC) #367 stable/13-n248176-f085bb0e621)
> >>>>
> >>>> | type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
> >>>> | aes-256-gcm      21277.62k    23226.64k    23613.90k    23687.51k    23892.93k    23947.95k
> >>>>
> >>>> It seems that AES throughput is actually cut by almost half?
> >>>
> >>> Do you know which of the CPU optimizations your RPi4 supports?
> >>
> >> Is this what you need?
> >>
> >>   Instruction Set Attributes 0 = <CRC32>
> > 
> >   So there is no AES+PMULL instruction set on RPI4, I guess that openssl
> > uses them for aes-gcm.
> > 
> >   I wonder what it used before that gave it this boost.
> > 
> >   On my rockpro64 I do see the improvement btw :
> > root@generic:~ # cpuset -l 4,5 openssl speed -evp aes-256-gcm
> > ...
> > aes-256-gcm     122861.59k   337938.39k   565408.44k   661223.09k   709175.19k   712327.25k
> > root@generic:~ # cpuset -l 4,5 env OPENSSL_armcap=0 openssl speed -evp aes-256-gcm
> > ...
> > aes-256-gcm      34068.11k    38068.62k    39435.24k    39818.75k    39905.34k    39922.35k
> > 
> >   Running on the big cores at max freq.
> > 
> >>   Instruction Set Attributes 1 = <>
> >>           Processor Features 0 = <AdvSIMD,FP,EL3 32,EL2 32,EL1 32,EL0 32>
> >>           Processor Features 1 = <>
> >>        Memory Model Features 0 = <TGran4,TGran64,SNSMem,BigEnd,16bit ASID,16TB PA>
> >>        Memory Model Features 1 = <8bit VMID>
> >>        Memory Model Features 2 = <32bit CCIDX,48bit VA>
> >>               Debug Features 0 = <DoubleLock,2 CTX BKPTs,4 Watchpoints,6 Breakpoints,PMUv3,Debugv8>
> >>               Debug Features 1 = <>
> >>           Auxiliary Features 0 = <>
> >>           Auxiliary Features 1 = <>
> >> AArch32 Instruction Set Attributes 5 = <CRC32,SEVL>
> >> AArch32 Media and VFP Features 0 = <FPRound,FPSqrt,FPDivide,DP VFPv3+v4,SP VFPv3+v4,AdvSIMD>
> >> AArch32 Media and VFP Features 1 = <SIMDFMAC,FPHP DP Conv,SIMDHP SP Conv,SIMDSP,SIMDInt,SIMDLS,FPDNaN,FPFtZ>
> >>
> >>> You can set the environment variable OPENSSL_armcap to override
> >>> OpenSSL's detection.
> >>>
> >>> Try: env OPENSSL_armcap=0 openssl speed -evp aes-256-gcm
> >>
> >> On FreeBSD 13.0-STABLE (GENERIC) #367 stable/13-n248176-f085bb0e621 again (i.e. after this commit):
> >>
> >> hmo@p48 ~ $ env OPENSSL_armcap=0 openssl speed -evp aes-256-gcm
> >> Doing aes-256-gcm for 3s on 16 size blocks: 6445704 aes-256-gcm's in 3.08s
> >> Doing aes-256-gcm for 3s on 64 size blocks: 1861149 aes-256-gcm's in 3.00s
> >> Doing aes-256-gcm for 3s on 256 size blocks: 479664 aes-256-gcm's in 3.01s
> >> Doing aes-256-gcm for 3s on 1024 size blocks: 122853 aes-256-gcm's in 3.04s
> >> Doing aes-256-gcm for 3s on 8192 size blocks: 15181 aes-256-gcm's in 3.00s
> >> Doing aes-256-gcm for 3s on 16384 size blocks: 7796 aes-256-gcm's in 3.07s
> >> OpenSSL 1.1.1l-freebsd  24 Aug 2021
> >> built on: reproducible build, date unspecified
> >> options:bn(64,64) rc4(int) des(int) aes(partial) idea(int) blowfish(ptr)
> >> compiler: clang
> >> The 'numbers' are in 1000s of bytes per second processed.
> >> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
> >> aes-256-gcm      33504.57k    39704.51k    40825.01k    41394.83k    41454.25k    41601.52k
> >> hmo@p48 ~ $ openssl speed -evp aes-256-gcm
> >> Doing aes-256-gcm for 3s on 16 size blocks: 4066201 aes-256-gcm's in 3.00s
> >> Doing aes-256-gcm for 3s on 64 size blocks: 1087387 aes-256-gcm's in 3.00s
> >> Doing aes-256-gcm for 3s on 256 size blocks: 280110 aes-256-gcm's in 3.03s
> >> Doing aes-256-gcm for 3s on 1024 size blocks: 70412 aes-256-gcm's in 3.04s
> >> Doing aes-256-gcm for 3s on 8192 size blocks: 8762 aes-256-gcm's in 3.00s
> >> Doing aes-256-gcm for 3s on 16384 size blocks: 4402 aes-256-gcm's in 3.02s
> >> OpenSSL 1.1.1l-freebsd  24 Aug 2021
> >> built on: reproducible build, date unspecified
> >> options:bn(64,64) rc4(int) des(int) aes(partial) idea(int) blowfish(ptr)
> >> compiler: clang
> >> The 'numbers' are in 1000s of bytes per second processed.
> >> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
> >> aes-256-gcm      21686.41k    23197.59k    23656.30k    23725.04k    23926.10k    23916.23k
> >> hmo@p48 ~ $
> >>
> >> Kind regards,
> >> Helge
> > 
> > 
> 
> So based on results from Manu, and Mark Millard, it seems almost every 
> ARM platform is faster when it takes advantage of the CPU features, 
> except the RPi4(B).
> 
> As Manu pointed out, it doesn't appear to have the AES+PMULL feature, 
> which means it must be something else that is slowing it down.
> 
> What might help is to try each feature in turn and figure out which 
> one is causing the slower results.
> 
> #define HWCAP_FP                0x00000001
> #define HWCAP_ASIMD             0x00000002
> #define HWCAP_EVTSTRM           0x00000004
> #define HWCAP_AES               0x00000008
> #define HWCAP_PMULL             0x00000010
> #define HWCAP_SHA1              0x00000020
> #define HWCAP_SHA2              0x00000040
> #define HWCAP_CRC32             0x00000080
> 
> So try:
> env OPENSSL_armcap=1 openssl speed -evp aes-256-gcm
> as well as with armcap=2, 3 (both FP and ASIMD), 8 (just AES) etc.

hmo@p48 ~ $ for f in 0 1 2 3 8 16 32 64 128 ; do echo -n $f:; env OPENSSL_armcap=$f openssl speed -evp aes-256-gcm 2>&1 | tail -1 | cut -wf7; done
0:42295.15k
1:23891.19k
2:42208.57k
3:23970.56k
8:42354.98k
16:42199.06k
32:size
Illegal instruction (core dumped)
64:42322.42k
128:42275.00k
hmo@p48 ~ $

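So HWCAP_FP (bit 0) appears to be the culprit: both values with that bit
set (1 and 3) drop to about 24000k, while 0, 2, 8, 16, 64 and 128 all stay
around 42000k, and 32 crashes with an illegal instruction. Could this be
related to hard vs. soft floating point math, which is indeed somewhat
special on the Pi?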
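
For anyone repeating the sweep, here is a minimal sketch of the same loop
with each value decoded into feature names. It assumes that OPENSSL_armcap
uses the HWCAP_* bit values quoted above; that is the working assumption in
this thread, not something I have checked against OpenSSL's own headers.
The function names decode_armcap and sweep_armcap are hypothetical helpers,
and awk replaces the FreeBSD-specific `cut -w`.

```shell
#!/bin/sh
# Decode a numeric mask into the HWCAP_* names quoted in this thread.
# ASSUMPTION: OPENSSL_armcap follows this bit layout; OpenSSL may differ.
decode_armcap() {
    mask=$1
    names=""
    for pair in 1:FP 2:ASIMD 4:EVTSTRM 8:AES 16:PMULL 32:SHA1 64:SHA2 128:CRC32; do
        bit=${pair%%:*}     # numeric bit value before the colon
        name=${pair#*:}     # feature name after the colon
        if [ $(( mask & bit )) -ne 0 ]; then
            names="${names:+$names }$name"
        fi
    done
    echo "${names:-none}"
}

# Hypothetical helper: run the benchmark once per mask and print the
# 16384-byte column, or "failed" if no result line was produced
# (for example when the run dies with "Illegal instruction").
sweep_armcap() {
    for f in "$@"; do
        out=$(env OPENSSL_armcap="$f" openssl speed -evp aes-256-gcm 2>/dev/null | tail -1)
        rate=$(echo "$out" | awk '$1 == "aes-256-gcm" { print $NF }')
        printf '%s (%s): %s\n' "$f" "$(decode_armcap "$f")" "${rate:-failed}"
    done
}
```

Usage would be something like `sweep_armcap 0 1 2 3 4 8 16 32 64 128`, which
also covers bit 4 that my loop above skipped.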

> For ones where the CPU lacks the feature, it will crash with 'Illegal 
> instruction'
> 
> Separately, it might also be interesting to see the results of `openssl 
> speed -evp sha256` before/after/with the different OPENSSL_armcap values

Please let me know if you still need this.

Kind regards
Helge


