Date: Tue, 23 Jul 2019 13:45:40 +0200
From: Dennis Noordsij <dennis.noordsij@alumni.helsinki.fi>
To: freebsd-virtualization@freebsd.org
Subject: Timecounter problem as a guest
Message-ID: <641a92ec-81ad-2e26-a3b8-3a7238736757@iletsel.nl>
Hi,

Recently I moved some FreeBSD systems to VPS provider TransIP, who use Linux KVM-based virtualization. Initial performance was surprisingly bad, and the CPU graphs were very spiky, with as much system time being spent as user time.

Via PostgreSQL I ended up trying out pg_test_timing, which reported the following for the default timecounter (HPET). Note that the choices are:

kern.timecounter.choice: i8254(0) ACPI-fast(900) HPET(950) TSC-low(-100) dummy(-1000000)

Testing timing overhead for 3 seconds.
Per loop time including overhead: 6481.08 ns
Histogram of timing durations:
  < us   % of total      count
     1      0.00000          0
     2      0.00000          0
     4      0.00000          0
     8     88.79165     411005
    16      9.53451      44134
    32      1.03848       4807
    64      0.49796       2305
   128      0.10370        480
   256      0.02981        138
   512      0.00259         12
  1024      0.00086          4
  2048      0.00022          1
  4096      0.00000          0
  8192      0.00000          0
 16384      0.00022          1

With the other timecounter choices, i8254 and ACPI-fast, the results look like the above: no results under 4 us. Only with TSC-low does it look like this:

Testing timing overhead for 3 seconds.
Per loop time including overhead: 41.22 ns
Histogram of timing durations:
  < us   % of total      count
     1     95.97088   69846421
     2      4.02214    2927264
     4      0.00136        988
     8      0.00288       2096
    16      0.00132        958
    32      0.00074        542
    64      0.00047        345
   128      0.00016        114
   256      0.00004         29
   512      0.00000          3
  1024      0.00000          2
  2048      0.00000          3

And indeed, after switching to TSC-low the CPU graphs cleaned up completely, with much lower CPU averages and no excessive system CPU time. Webserver and database response times dropped as well (at least according to their own reporting).

To rule out this being just a symptom of timekeeping: the provider's own CPU graphs (so measured from outside the VPS as a whole) also show this VPS consuming roughly half the CPU with TSC-low compared to the other options, and you can tell the difference right away when changing the kern.timecounter.hardware sysctl.

The main problem, however, is that the system clock now keeps time atrociously badly.
Chrony with the most aggressive settings barely manages to keep the time, and the CPU graphs now show regular gaps where the system time jumped because of a correction. It looks very sloppy to the users if the recorded times of their actions/files are not correct.

This is all on a 6-core system with lots of threads and churn and short-lived apps coming and going. A 4-core database system, with a stable number of threads and processes, running in the same virtualization environment, doesn't really have either of these problems. That is, CPU usage wasn't that spiky or system CPU usage that high even with HPET, and the time doesn't drift as much with TSC-low either.

I figured this is a virtualization question, as these kinds of symptoms are probably generic. What is the host doing?

Additional information from within the guest:

hw.machine: amd64
hw.model: Westmere E56xx/L56xx/X56xx (Nehalem-C)
hw.ncpu: 6
hw.hv_vendor: KVMKVMKVM
hw.clockrate: 2593
(has 24GB memory)

(They do perform live migrations, so I don't know what the real underlying hardware is, but it is probably similar. It's pretty stale at this point.)

I wonder if anyone could talk a bit about what might be going on.

Thank you,
Dennis
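
For anyone who wants to reproduce the shape of the histograms above without PostgreSQL installed, here is a minimal Python sketch of the same kind of measurement: read the clock in a tight loop and bucket the gaps between consecutive readings into power-of-two microsecond bins. The names timing_histogram and print_report are made up for this illustration, and this is not pg_test_timing's actual code. The sketch assumes that time.perf_counter_ns ends up reading the kernel's monotonic clock, which should be backed by the timecounter selected in kern.timecounter.hardware. Interpreter overhead makes the absolute per-loop figure much larger than what the C tool reports, so only the difference between timecounters is meaningful.

```python
import time


def timing_histogram(duration_s=3.0, clock=time.perf_counter_ns, max_bins=32):
    """Read clock() in a tight loop and bucket the gaps between readings.

    clock must return nanoseconds. Bin i counts gaps whose whole-microsecond
    value needs i bits, i.e. gaps below 2**i us (bin 0 holds gaps under 1 us).
    Returns (average nanoseconds per loop, list of bin counts).
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    counts = [0] * max_bins
    loops = 0
    start = prev = clock()
    end = start + int(duration_s * 1e9)
    while prev < end:
        now = clock()
        # Clamp to zero in case a substituted clock ever steps backwards.
        delta_us = max(now - prev, 0) // 1000
        counts[min(delta_us.bit_length(), max_bins - 1)] += 1
        loops += 1
        prev = now
    return (prev - start) / loops, counts


def print_report(per_loop_ns, counts):
    """Print the bins in a layout similar to the tables quoted above."""
    total = sum(counts)
    last = max(i for i, c in enumerate(counts) if c)
    print(f"Per loop time including overhead: {per_loop_ns:.2f} ns")
    print(f"{'< us':>6} {'% of total':>11} {'count':>10}")
    for i in range(last + 1):
        print(f"{2 ** i:>6} {100 * counts[i] / total:>11.5f} {counts[i]:>10}")
```

Calling print_report(*timing_histogram(3.0)) once per timecounter setting should show whether the bulk of the readings moves between the low and high bins, as it does in the two tables above.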