Date: Fri, 05 Mar 2021 15:45:51 +0000
From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 254040] AMD 5950X hyperthreading strange performance swings
Message-ID: <bug-254040-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=254040

            Bug ID: 254040
           Summary: AMD 5950X hyperthreading strange performance swings
           Product: Base System
           Version: 12.2-RELEASE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: dennis.noordsij@alumni.helsinki.fi

I plan to upgrade our server to a Ryzen 9 5950X system, 16 cores, 3400MHz base
frequency, 128GB RAM, and ran into an issue while testing.

For reference I will use a very simple command:

dd if=/dev/zero bs=1M count=1000 | bzip2 - | wc

which, using a Linux rescue system (i.e. nothing else running) consistently and
repeatedly completes in about:

1048576000 bytes (1.0 GB, 1000 MiB) copied, 4.67561 s, 224 MB/s

Now for FreeBSD 12.2-RELEASE, on a basic boot not running anything except ssh:

if I _disable_ hyperthreading using machdep.hyperthreading_allowed=0, I get the
following approximate result consistently and repeatedly:

dd if=/dev/zero bs=1M count=1000 | bzip2 - | wc
1048576000 bytes transferred in 4.874335 secs (215121876 bytes/sec)

Slightly slower than Linux, not sure if this is in how bzip2 is compiled etc,
but nothing that worries me.

However, if I _enable_ hyperthreading, i.e. the default I started with, then I
will get:

dd if=/dev/zero bs=1M count=1000 | bzip2 - | wc
1048576000 bytes transferred in 4.887522 secs (214541450 bytes/sec)
1048576000 bytes transferred in 7.507138 secs (139677190 bytes/sec)
1048576000 bytes transferred in 6.227179 secs (168386989 bytes/sec)
1048576000 bytes transferred in 7.590263 secs (138147516 bytes/sec)
1048576000 bytes transferred in 7.421037 secs (141297776 bytes/sec)
1048576000 bytes transferred in 4.922986 secs (212995935 bytes/sec)
1048576000 bytes transferred in 4.945138 secs (212041827 bytes/sec)
1048576000 bytes transferred in 7.671600 secs (136682828 bytes/sec)
1048576000 bytes transferred in 7.673428 secs (136650273 bytes/sec)

i.e. very consistently varying results with relatively large differences in
commands executed immediately after one another (and no other load whatsoever).

I'm curious why this is happening. I am not running powerd and have not touched
any of the CPU settings.

Booting _without_ hyperthreading:

CPU: AMD Ryzen 9 5950X 16-Core Processor (3393.70-MHz K8-class CPU)
  Origin="AuthenticAMD" Id=0xa20f10 Family=0x19 Model=0x21 Stepping=0
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x7ed8320b<SSE3,PCLMULQDQ,MON,SSSE3,FMA,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
  AMD Features=0x2e500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM>
  AMD Features2=0x75c237ff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS,SKINIT,WDT,TCE,Topology,PCXC,PNXC,DBE,PL2I,MWAITX,<b30>>
  Structured Extended Features=0x219c97a9<FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,PQM,PQE,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,SHA>
  Structured Extended Features2=0x40068c<UMIP,PKU,VAES,VPCLMULQDQ,RDPID>
  Structured Extended Features3=0x10
  XSAVE Features=0xf<XSAVEOPT,XSAVEC,XINUSE,XSAVES>
  AMD Extended Feature Extensions ID EBX=0x111ef657<CLZERO,IRPerf,XSaveErPtr>
  SVM: NP,NRIP,VClean,AFlush,DAssist,NAsids=32768
  TSC: P-state invariant, performance statistics
real memory = 137434759168 (131068 MB)
avail memory = 133793423360 (127595 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <ALASKA A M I >
FreeBSD/SMP: Multiprocessor System Detected: 16 CPUs
FreeBSD/SMP: 1 package(s) x 2 cache groups x 8 core(s) x 2 hardware threads
FreeBSD/SMP Online: 1 package(s) x 2 cache groups x 8 core(s)

# sysctl dev.cpu.0
dev.cpu.0.cx_method: C1/hlt C2/io
dev.cpu.0.cx_usage_counters: 11922 0
dev.cpu.0.cx_usage: 100.00% 0.00% last 43430us
dev.cpu.0.cx_lowest: C1
dev.cpu.0.cx_supported: C1/1/1 C2/2/18
dev.cpu.0.freq_levels: 3400/3740 2800/2800 2200/1980
dev.cpu.0.freq: 3400
dev.cpu.0.%parent: acpi0
dev.cpu.0.%pnpinfo: _HID=ACPI0007 _UID=0
dev.cpu.0.%location: handle=\_SB_.PLTF.C000
dev.cpu.0.%driver: cpu
dev.cpu.0.%desc: ACPI CPU

Booting _with_ hyperthreading:

CPU: AMD Ryzen 9 5950X 16-Core Processor (3393.69-MHz K8-class CPU)
  Origin="AuthenticAMD" Id=0xa20f10 Family=0x19 Model=0x21 Stepping=0
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x7ed8320b<SSE3,PCLMULQDQ,MON,SSSE3,FMA,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
  AMD Features=0x2e500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM>
  AMD Features2=0x75c237ff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS,SKINIT,WDT,TCE,Topology,PCXC,PNXC,DBE,PL2I,MWAITX,<b30>>
  Structured Extended Features=0x219c97a9<FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,PQM,PQE,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,SHA>
  Structured Extended Features2=0x40068c<UMIP,PKU,VAES,VPCLMULQDQ,RDPID>
  Structured Extended Features3=0x10
  XSAVE Features=0xf<XSAVEOPT,XSAVEC,XINUSE,XSAVES>
  AMD Extended Feature Extensions ID EBX=0x111ef657<CLZERO,IRPerf,XSaveErPtr>
  SVM: NP,NRIP,VClean,AFlush,DAssist,NAsids=32768
  TSC: P-state invariant, performance statistics
real memory = 137434759168 (131068 MB)
avail memory = 133793423360 (127595 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <ALASKA A M I >
FreeBSD/SMP: Multiprocessor System Detected: 32 CPUs
FreeBSD/SMP: 1 package(s) x 2 cache groups x 8 core(s) x 2 hardware threads

# sysctl dev.cpu.0
dev.cpu.0.cx_method: C1/hlt C2/io
dev.cpu.0.cx_usage_counters: 3232 0
dev.cpu.0.cx_usage: 100.00% 0.00% last 65311us
dev.cpu.0.cx_lowest: C1
dev.cpu.0.cx_supported: C1/1/1 C2/2/18
dev.cpu.0.freq_levels: 3400/3740 2800/2800 2200/1980
dev.cpu.0.freq: 3400
dev.cpu.0.%parent: acpi0
dev.cpu.0.%pnpinfo: _HID=ACPI0007 _UID=0
dev.cpu.0.%location: handle=\_SB_.PLTF.C000
dev.cpu.0.%driver: cpu
dev.cpu.0.%desc: ACPI CPU

zenstates.py reports:

# ./zenstates.py -l
P0 - Enabled - FID = 88 - DID = 8 - VID = 48 - IDD = 22( / 1 ) - Ratio = 34.00 - vCore = 1.10000
P1 - Enabled - FID = 8C - DID = A - VID = 58 - IDD = 1C( / 1 ) - Ratio = 28.00 - vCore = 1.00000
P2 - Enabled - FID = 84 - DID = C - VID = 68 - IDD = 16( / 1 ) - Ratio = 22.00 - vCore = 0.90000
P3 - Disabled
P4 - Disabled
P5 - Disabled
P6 - Disabled
P7 - Disabled
Core Performance Boost - Enabled
C6 State - Package - Disabled
C6 State - Core - Enabled

FWIW if I disable core performance boost the varying execution times shift from
~4.8 and ~7.8 to ~6.5 and ~10.8 seconds respectively, i.e. same behaviour just
slower.

I was hoping someone could explain why this is happening (note it doesn't
happen on Linux), if it is expected, and/or how it can be worked around or
fixed, or where the problem would be (p-states?).

Happy to test anything.

PS - I tried a 13-BETA2 rescue boot which has HT enabled and it behaves exactly
the same.

-- 
You are receiving this mail because:
You are the assignee for the bug.
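
For anyone comparing runs, the hyperthreading timings quoted above fall into two groups rather than spreading evenly. A minimal Python sketch of that grouping follows; the timings are copied from the report, while the 10% cut-off and the names `SLOW_FACTOR` and `split_runs` are assumptions made for illustration, not anything FreeBSD or the reporter defines.

```python
# Timings (seconds) copied from the report; nothing here is newly measured.
BASELINE_NO_HT = 4.874335  # single run quoted with hyperthreading disabled
HT_RUNS = [4.887522, 7.507138, 6.227179, 7.590263, 7.421037,
           4.922986, 4.945138, 7.671600, 7.673428]

# Assumption: a run counts as "slow" if it takes more than 10% longer than
# the no-HT baseline. The 10% cut-off is arbitrary and only illustrative.
SLOW_FACTOR = 1.10


def split_runs(runs, baseline, factor=SLOW_FACTOR):
    """Partition runs into (fast, slow) lists relative to the baseline."""
    limit = baseline * factor
    fast = [t for t in runs if t <= limit]
    slow = [t for t in runs if t > limit]
    return fast, slow


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


fast, slow = split_runs(HT_RUNS, BASELINE_NO_HT)

print(f"fast runs: {len(fast)}, mean {mean(fast):.2f} s")
print(f"slow runs: {len(slow)}, mean {mean(slow):.2f} s")
print(f"slow/fast ratio: {mean(slow) / mean(fast):.2f}")
```

With the nine quoted runs this puts three in the fast group and six in the slow group. Nine samples are too few to say more than that the distribution looks bimodal; a longer series would be needed before reading anything into the proportions.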
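
Since p-states are raised as a possible culprit, the FID/DID/VID values in the zenstates.py listing can be cross-checked against dev.cpu.0.freq_levels. The sketch below assumes the usual Zen-family decoding (multiplier = 2 * FID / DID on a 100 MHz reference, voltage = 1.55 V minus 6.25 mV per VID step); those formulas and the helper names `core_mhz` and `vcore` are assumptions for illustration, and the hex values are copied from the listing.

```python
# P-state fields copied from the zenstates.py listing in the report (hex).
PSTATES = {
    "P0": {"fid": 0x88, "did": 0x8, "vid": 0x48},
    "P1": {"fid": 0x8C, "did": 0xA, "vid": 0x58},
    "P2": {"fid": 0x84, "did": 0xC, "vid": 0x68},
}


def core_mhz(fid, did):
    """Core frequency in MHz, assuming multiplier 2 * FID / DID at 100 MHz."""
    return 200.0 * fid / did


def vcore(vid):
    """Core voltage in volts, assuming 1.55 V minus 6.25 mV per VID step."""
    return 1.55 - 0.00625 * vid


# Decode every enabled P-state into (MHz, volts).
decoded = {
    name: (core_mhz(p["fid"], p["did"]), vcore(p["vid"]))
    for name, p in PSTATES.items()
}

for name, (mhz, volts) in decoded.items():
    print(f"{name}: {mhz:.0f} MHz at {volts:.3f} V")
```

Under those assumptions the three enabled P-states decode to 3400, 2800 and 2200 MHz, which agrees with both the Ratio column and the freq_levels sysctl, so the P-state table itself looks consistent between the two tools.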
