Date: Wed, 24 Nov 2021 01:51:52 -0800
From: Mark Millard via arm <arm@freebsd.org>
To: allanjude@freebsd.org, "freebsd-arm@freebsd.org" <arm@freebsd.org>
Subject: Re: git: 32a2fed6e71f - stable/13 - openssl: Fix detection of ARMv7 and ARM64 CPU features
Message-ID: <0CEA37B8-CE7F-4BAE-92B7-E71C5FD1BC22@yahoo.com>
References: <0CEA37B8-CE7F-4BAE-92B7-E71C5FD1BC22.ref@yahoo.com>
[Actually, the main [so: 14] equivalent.]

All Cortex-A72 based . . .

First, older system versions (before that update) then after the update:


RPi4B 8 GiByte (older FreeBSD first, otherwise new), Cortex-A72's:

# openssl speed -evp aes-256-gcm
. . .
type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-gcm         51925.92k    58449.46k    60430.32k    61050.13k    61180.98k    61482.75k

type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-gcm         28880.07k    30837.33k    31630.29k    31855.62k    31921.54k    32034.53k

So: slowed down, unlike the other examples below.

# env OPENSSL_armcap=0 openssl speed -evp aes-256-gcm
. . .
type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-gcm         51894.33k    58540.45k    60815.22k    61534.47k    61906.84k    62042.10k

So: back to the prior speed.

But all these are based on config.txt containing:

over_voltage=6
arm_freq=2000
sdram_freq_min=3200
force_turbo=1

(The RPi4B has a heat-sink and a fan.)

Note: See later about the RPi4B CPU features.


MACCHIATObin Double Shot (older first), Cortex-A72's:

# openssl speed -evp aes-256-gcm
. . .
type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-gcm         50808.49k    58466.08k    60769.11k    61444.92k    61767.94k    61707.61k

type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-gcm        163579.14k   456319.27k   786544.01k   940234.41k  1003230.55k  1005671.31k


HoneyComb (older first), Cortex-A72's:

# openssl speed -evp aes-256-gcm
. . .
type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-gcm         57659.60k    64599.05k    67719.81k    68373.74k    68724.24k    68793.80k

type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-gcm        177925.57k   502311.65k   866287.95k  1036500.35k  1106598.06k  1106721.91k


Rock64 (older first), Cortex-A53's:

# openssl speed -evp aes-256-gcm
. . .
type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-gcm         18378.23k    23401.45k    24834.99k    25206.10k    25337.86k    25258.19k

type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-gcm         52711.29k   163586.49k   318738.69k   420277.93k   461373.44k   463192.06k


OPi+2E (older first), Cortex-A7's (so armv7):

# openssl speed -evp aes-256-gcm
. . .
type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-gcm          9343.10k    11156.39k    11827.64k    11995.30k    12025.86k    12031.32k

type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-gcm         11013.41k    13598.44k    14034.26k    15045.97k    15262.90k    15302.66k


For reference:

For the RPi4B examples (2 notes added):

CPU 0: ARM Cortex-A72 r0p3 affinity: 0
  Cache Type = <64 byte D-cacheline,64 byte I-cacheline,PIPT ICache,64 byte ERG,64 byte CWG>
  Instruction Set Attributes 0 = <CRC32>
*** NOTE the lack of ",SHA2,SHA1,AES+PMULL" above ***
  Instruction Set Attributes 1 = <>
  Processor Features 0 = <AdvSIMD,FP,EL3 32,EL2 32,EL1 32,EL0 32>
  Processor Features 1 = <>
  Memory Model Features 0 = <TGran4,TGran64,SNSMem,BigEnd,16bit ASID,16TB PA>
  Memory Model Features 1 = <8bit VMID>
  Memory Model Features 2 = <32bit CCIDX,48bit VA>
  Debug Features 0 = <DoubleLock,2 CTX BKPTs,4 Watchpoints,6 Breakpoints,PMUv3,Debugv8>
  Debug Features 1 = <>
  Auxiliary Features 0 = <>
  Auxiliary Features 1 = <>
  AArch32 Instruction Set Attributes 5 = <CRC32,SEVL>
*** NOTE the lack of ",SHA2,SHA1,AES+VMULL" above ***
  AArch32 Media and VFP Features 0 = <FPRound,FPSqrt,FPDivide,DP VFPv3+v4,SP VFPv3+v4,AdvSIMD>
  AArch32 Media and VFP Features 1 = <SIMDFMAC,FPHP DP Conv,SIMDHP SP Conv,SIMDSP,SIMDInt,SIMDLS,FPDNaN,FPFtZ>

For the MACCHIATObin Double Shot examples:

CPU 0: ARM Cortex-A72 r0p1 affinity: 0 0
  Cache Type = <64 byte D-cacheline,64 byte I-cacheline,PIPT ICache,64 byte ERG,64 byte CWG>
  Instruction Set Attributes 0 = <CRC32,SHA2,SHA1,AES+PMULL>
  Instruction Set Attributes 1 = <>
  Processor Features 0 = <AdvSIMD,FP,EL3 32,EL2 32,EL1 32,EL0 32>
  Processor Features 1 = <>
  Memory Model Features 0 = <TGran4,TGran64,SNSMem,BigEnd,16bit ASID,16TB PA>
  Memory Model Features 1 = <8bit VMID>
  Memory Model Features 2 = <32bit CCIDX,48bit VA>
  Debug Features 0 = <DoubleLock,2 CTX BKPTs,4 Watchpoints,6 Breakpoints,PMUv3,Debugv8>
  Debug Features 1 = <>
  Auxiliary Features 0 = <>
  Auxiliary Features 1 = <>
  AArch32 Instruction Set Attributes 5 = <CRC32,SHA2,SHA1,AES+VMULL,SEVL>
  AArch32 Media and VFP Features 0 = <FPRound,FPSqrt,FPDivide,DP VFPv3+v4,SP VFPv3+v4,AdvSIMD>
  AArch32 Media and VFP Features 1 = <SIMDFMAC,FPHP DP Conv,SIMDHP SP Conv,SIMDSP,SIMDInt,SIMDLS,FPDNaN,FPFtZ>

For the HoneyComb examples:

CPU 0: ARM Cortex-A72 r0p3 affinity: 0 0
  Cache Type = <64 byte D-cacheline,64 byte I-cacheline,PIPT ICache,64 byte ERG,64 byte CWG>
  Instruction Set Attributes 0 = <CRC32,SHA2,SHA1,AES+PMULL>
  Instruction Set Attributes 1 = <>
  Processor Features 0 = <GIC,AdvSIMD,FP,EL3 32,EL2 32,EL1 32,EL0 32>
  Processor Features 1 = <>
  Memory Model Features 0 = <TGran4,TGran64,SNSMem,BigEnd,16bit ASID,16TB PA>
  Memory Model Features 1 = <8bit VMID>
  Memory Model Features 2 = <32bit CCIDX,48bit VA>
  Debug Features 0 = <DoubleLock,2 CTX BKPTs,4 Watchpoints,6 Breakpoints,PMUv3,Debugv8>
  Debug Features 1 = <>
  Auxiliary Features 0 = <>
  Auxiliary Features 1 = <>
  AArch32 Instruction Set Attributes 5 = <CRC32,SHA2,SHA1,AES+VMULL,SEVL>
  AArch32 Media and VFP Features 0 = <FPRound,FPSqrt,FPDivide,DP VFPv3+v4,SP VFPv3+v4,AdvSIMD>
  AArch32 Media and VFP Features 1 = <SIMDFMAC,FPHP DP Conv,SIMDHP SP Conv,SIMDSP,SIMDInt,SIMDLS,FPDNaN,FPFtZ>

For the Rock64 examples:

CPU 0: ARM Cortex-A53 r0p4 affinity: 0
  Cache Type = <64 byte D-cacheline,64 byte I-cacheline,VIPT ICache,64 byte ERG,64 byte CWG>
  Instruction Set Attributes 0 = <CRC32,SHA2,SHA1,AES+PMULL>
  Instruction Set Attributes 1 = <>
  Processor Features 0 = <AdvSIMD,FP,EL3 32,EL2 32,EL1 32,EL0 32>
  Processor Features 1 = <>
  Memory Model Features 0 = <TGran4,TGran64,SNSMem,BigEnd,16bit ASID,1TB PA>
  Memory Model Features 1 = <8bit VMID>
  Memory Model Features 2 = <32bit CCIDX,48bit VA>
  Debug Features 0 = <DoubleLock,2 CTX BKPTs,4 Watchpoints,6 Breakpoints,PMUv3,Debugv8>
  Debug Features 1 = <>
  Auxiliary Features 0 = <>
  Auxiliary Features 1 = <>
  AArch32 Instruction Set Attributes 5 = <CRC32,SHA2,SHA1,AES+VMULL,SEVL>
  AArch32 Media and VFP Features 0 = <FPRound,FPSqrt,FPDivide,DP VFPv3+v4,SP VFPv3+v4,AdvSIMD>
  AArch32 Media and VFP Features 1 = <SIMDFMAC,FPHP DP Conv,SIMDHP SP Conv,SIMDSP,SIMDInt,SIMDLS,FPDNaN,FPFtZ>

For the OPi+2E examples:

CPU: ARM Cortex-A7 r0p5 (ECO: 0x00000000)
CPU Features:
  Multiprocessing, Thumb2, Security, Virtualization, Generic Timer, VMSAv7,
  PXN, LPAE, Coherent Walk
Optional instructions:
  SDIV/UDIV, UMULL, SMULL, SIMD(ext)
LoUU:2 LoC:3 LoUIS:2
Cache level 1:
 32KB/64B 4-way data cache WB Read-Alloc Write-Alloc
 32KB/32B 2-way instruction cache Read-Alloc
Cache level 2:
 512KB/64B 8-way unified cache WB Read-Alloc Write-Alloc

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
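
For a quick side-by-side view, the 16384-byte columns reported above can be turned into after/before ratios. The following is only a minimal sketch in Python: the figures are copied from the openssl speed outputs in this message, while the names (results, speedup, ratios) are made up for illustration and are not part of any FreeBSD or OpenSSL tooling.

```python
# Throughput for 16384-byte blocks, as printed by
# "openssl speed -evp aes-256-gcm" in this message.
# Each entry is (before the update, after the update); the "k" suffix
# in the openssl output means 1000s of bytes per second.
results = {
    "RPi4B": (61482.75, 32034.53),
    "MACCHIATObin Double Shot": (61707.61, 1005671.31),
    "HoneyComb": (68793.80, 1106721.91),
    "Rock64": (25258.19, 463192.06),
    "OPi+2E": (12031.32, 15302.66),
}


def speedup(before, after):
    """Return the after/before throughput ratio (> 1.0 means faster)."""
    return after / before


# Hypothetical helper table: board name -> ratio.
ratios = {board: speedup(before, after)
          for board, (before, after) in results.items()}

for board, ratio in ratios.items():
    trend = "faster" if ratio > 1.0 else "slower"
    print(f"{board:26s} {ratio:6.2f}x ({trend})")
```

A ratio below 1.0 marks a board that got slower after the update; with the figures above that is only the RPi4B, whose CPU feature listing lacks SHA2, SHA1 and AES+PMULL.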