Date: Wed, 22 Dec 2021 14:42:48 +0200
From: Andriy Gapon <avg@FreeBSD.org>
To: FreeBSD Current <freebsd-current@FreeBSD.org>
Subject: observations on Ryzen 5xxx (Zen 3) processors
Message-ID: <cc1dd541-81fc-b56a-81ca-da76d20a095b@FreeBSD.org>
There have been some reports of strange or unexpected behavior with Ryzen 5xxx processors. I think I have seen the 5950X, 5900X and 5800X mentioned; I am not sure about others. Since I have a 5800X myself, I looked into a couple of issues that have straightforward demonstrators. I would like to share my findings and observations on those issues.

Issue 1. High wake-up latency for CPU idle states.

This seems to be related to the so-called CC6 idle state. The official information on it is very sparse. The state is not explicitly exposed to the OS, at least not through the ACPI interfaces that FreeBSD currently supports.

In my tests I see that if all logical processors enter an idle state, then an external interrupt can be delayed by 500+ us. Specifically, I observed this with an MSI-X interrupt from a discrete network chip. Interrupts from internal components seem to be affected as well, but to a lesser degree.

The deep state in question can be entered regardless of whether C2 (via I/O) is enabled; C1 (via hlt) is sufficient. In fact, with machdep.idle=hlt it works the same.
The state is not entered if at least one logical CPU is not idle.
The state is not entered if machdep.idle=mwait is used. Apparently, the processors do not attempt to automatically enter idle modes as deep with mwait as they do with hlt.
Finally, the state is not entered if the zenstates.py utility is used to disable the C6 / CC6 state via a publicly undocumented MSR.

For me personally that state does not cause any annoyances, but anyone who experiences problems related to "stuttering", "jitter" or latency might want to look into this.

Issue 2. Uneven performance of CPU intensive tasks, especially with SCHED_ULE, when SMT is enabled.

I found out that, at least on my hardware, all even numbered logical CPUs can perform much better than odd numbered logical CPUs. It seems that hardware threads within a core are not equal. Maybe this is related to the ability to use boosted frequencies, or maybe to something else; I am not sure.

From a brief look at the ULE code, it looks like the selection of a hw thread within a core is intentionally random when all other things are equal.

I suspect that the hardware + firmware may actually describe that performance disparity via ACPI CPPC (_CPC object, etc.), but right now we do not support querying that or making use of it.

It would be interesting to see if other owners of similar processors can confirm my observations or provide counter-examples.

Simple tests for issue 1:
- ping a host attached to the same switch (so, with very low expected latency)
- ping 127.0.0.1

For issue 2: take some CPU intensive single-threaded task and bind it (with cpuset -l) to different logical CPUs. Multiple such tasks can be run concurrently on different logical CPUs.

References:
- https://forums.freebsd.org/threads/variable-ping-latency-on-ryzen-setup.82791/
- https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=256594
- https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=254040
- https://github.com/r4m0n/ZenStates-Linux
- https://github.com/meowthink/ZenStates-FreeBSD -- has a bug
- https://github.com/avg-I/ZenStates-FreeBSD -- has a fix
- https://www.kernel.org/doc/html/latest/admin-guide/acpi/cppc_sysfs.html
- https://static.linaro.org/connect/lvc21/presentations/lvc21-219.pdf
- https://uefi.org/specs/ACPI/6.4/14_Platform_Communications_Channel/Platform_Comm_Channel.html

--
Andriy Gapon
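The issue 2 test described above could be scripted roughly as follows. This is only a sketch under stated assumptions, not part of the original observations: the workload (a child Python interpreter summing squares), the helper names (cpuset_command, time_on_cpu, group_mean, report) and the use of wall-clock time as the metric are all illustrative choices. The only thing taken from the message is binding a single-threaded task with cpuset -l.

```python
import subprocess
import sys
import time
from statistics import mean

# Hypothetical CPU-bound, single-threaded workload: a child interpreter
# summing squares. Any CPU intensive single-threaded command would do.
WORKLOAD = [sys.executable, "-c", "sum(i * i for i in range(20_000_000))"]


def cpuset_command(cpu, argv):
    """Build a FreeBSD cpuset(1) command line binding argv to one logical CPU."""
    return ["cpuset", "-l", str(cpu)] + list(argv)


def time_on_cpu(cpu, argv=WORKLOAD):
    """Run argv pinned to the given logical CPU; return wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(cpuset_command(cpu, argv), check=True)
    return time.perf_counter() - start


def group_mean(timings, parity):
    """Mean time over CPUs whose ID has the given parity (0 = even, 1 = odd).

    Returns None if no CPU with that parity was measured.
    """
    values = [t for cpu, t in timings.items() if cpu % 2 == parity]
    return mean(values) if values else None


def report(cpus):
    """Time the workload on each listed logical CPU, one at a time, and print results."""
    timings = {cpu: time_on_cpu(cpu) for cpu in cpus}
    for cpu, seconds in sorted(timings.items()):
        print(f"cpu {cpu}: {seconds:.2f} s")
    print("even mean:", group_mean(timings, 0))
    print("odd mean: ", group_mean(timings, 1))
    return timings
```

For example, calling report(range(4)) on the machine under test would time the workload on logical CPUs 0 through 3 in turn. A consistently lower mean for the even numbered CPUs would match the disparity described above; similar means would be a counter-example. The measured time includes interpreter start-up, so a longer workload gives a cleaner comparison.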