Date: Fri, 5 Apr 2019 15:38:02 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Mark Millard <marklmi@yahoo.com>
Cc: Bruce Evans <brde@optusnet.com.au>, Konstantin Belousov <kostikbel@gmail.com>, freebsd-hackers Hackers <freebsd-hackers@freebsd.org>, FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Subject: Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale * tc_delta(th) overflows unsigned 64 bits sometimes [patched failed]
Message-ID: <20190405150236.A959@besplex.bde.org>
In-Reply-To: <F22CCA2C-08BB-452E-B00C-A36CD4611540@yahoo.com>
References: <20190303161635.GJ68879@kib.kiev.ua> <20190304043416.V5640@besplex.bde.org> <20190304114150.GM68879@kib.kiev.ua> <20190305031010.I4610@besplex.bde.org> <20190306172003.GD2492@kib.kiev.ua> <20190308001005.M2756@besplex.bde.org> <20190307222220.GK2492@kib.kiev.ua> <20190309144844.K1166@besplex.bde.org> <20190324110138.GR1923@kib.kiev.ua> <E0785613-2B6E-4BB3-95CD-03DD96902CD8@fh-muenster.de> <20190403070045.GW1923@kib.kiev.ua> <20190404011802.E2390@besplex.bde.org> <F22CCA2C-08BB-452E-B00C-A36CD4611540@yahoo.com>

On Thu, 4 Apr 2019, Mark Millard wrote:

> On 2019-Apr-3, at 08:47, Bruce Evans <brde at optusnet.com.au> wrote:
>> . . .
>>
>> I noticed (or better realized) a general problem with multiple
>> timehands. ntpd can slew the clock at up to 500 ppm, and at least an
>> old version of it uses a rate of 50 ppm to fix up fairly small drifts
>> in the milliseconds range. 500 ppm is enormous in CPU cycles -- it is
>> 500 thousand nsec or 2 million cycles at 4GHz. Winding up the timecounter
>> every 1 msec reduces this to only 2000 cycles.
>>
>> More details of ordering and timing for 1 thread:
>> ...
> Thanks for the description of an example way that sbinuptime and
> the like might not give weakly increasing results.
>
> Unfortunately, all the multi-socket contexts that I sometimes have
> access to are old PowerMacs. And, currently, the only such context
> is the G5 with 2 sockets, 2 cores per socket (powerpc64). So I've
> not been able to set up other types of examples to see if problems
> repeat.
>
> I do not have access to a single-socket powerpc64 for contrast in
> that direction.

Testing 1 socket is time-consuming enough. Do these old systems use the
equivalent of an x86 TSC for the timecounter? With multiple sockets, it
isn't clear how even a hardware timer independent of the CPUs can be
distributed so as to appear to be monotonic on all cores.

> One oddity is that the eventtimer's decrementer and timecounter
> may change (nearly) together: both change at 33,333,333 Hz, as if
> they are tied to the same clock (at least on one socket).

I think this is from a normal hardware implementation. On all of my x86
systems with a TSC, the TSC frequency is an exact fractional multiple of
the i8254, the ACPI timer (if present) and the HPET (if present). Only the
RTC has an independent frequency. The fraction is changed by changing the
nominal TSC frequency in the BIOS, but is not changed by temperature
variations.
This must be because most clocks are derived from a common clock using a
PLL. I use this to calibrate all clocks (except the RTC) by calibrating
only 1.

> In case it helps with knowing how analogous your investigations
> are to the original problem context, I report the following.
>
> If you do not care for such information, stop reading here.
>
> # grep ntpd /etc/rc.conf
> ntpd_enable="YES"
> ntpd_sync_on_start="YES"
>
> # sysctl kern.eventtimer
> kern.eventtimer.periodic: 0
> kern.eventtimer.timer: decrementer
> kern.eventtimer.idletick: 0
> kern.eventtimer.singlemul: 2
> kern.eventtimer.choice: decrementer(1000)
> kern.eventtimer.et.decrementer.quality: 1000
> kern.eventtimer.et.decrementer.frequency: 33333333
> kern.eventtimer.et.decrementer.flags: 7
>
> # vmstat -ai | grep decrementer
> cpu0:decrementer    4451007    35
> cpu3:decrementer    1466010    11
> cpu2:decrementer    1481722    12
> cpu1:decrementer    1478618    12

Powerpc seems to have a PLL in software too. Event timers don't need to
be very precise or accurate.

> (That last is from a basically-idle timeframe.)
>
> # sysctl -a | grep hz
> kern.clockrate: { hz = 1000, tick = 1000, profhz = 8128, stathz = 127 }
> kern.hz: 1000

x86 is similar. I think synchronization from using PLLs still gives unfair
scheduling, but with multiple CPUs and often more cycles than can be used,
no one cares about accidental synchronization or bothers to steal cycles
using intentional synchronization.

> # sysctl kern.timecounter
> kern.timecounter.fast_gettime: 1
> kern.timecounter.tick: 1
> kern.timecounter.choice: timebase(0) dummy(-1000000)
> kern.timecounter.hardware: timebase
> kern.timecounter.alloweddeviation: 5
> kern.timecounter.stepwarnings: 0
> kern.timecounter.tc.timebase.quality: 0
> kern.timecounter.tc.timebase.frequency: 33333333
> kern.timecounter.tc.timebase.counter: 1144662532
> kern.timecounter.tc.timebase.mask: 4294967295
>
> (The actual Time Base Register (tbr) is 64 bits
> and freebsd truncates it down.)
>
> # sysctl -a | grep 'cpu.*freq'
> device cpufreq
> debug.cpufreq.verbose: 0
> debug.cpufreq.lowest: 0
> dev.cpufreq.0.%parent: cpu3
> dev.cpufreq.0.%pnpinfo:
> dev.cpufreq.0.%location:
> dev.cpufreq.0.%driver: cpufreq
> dev.cpufreq.0.%desc:
> dev.cpufreq.%parent:
> dev.cpu.3.freq_levels: 2500/-1 1250/-1
> dev.cpu.3.freq: 2500
>
> So 2500 MHz / 33333333 Hz is very near 75 clock periods per
> timebase counter value.

Looks like it is exactly 75. Fractions are especially easy to guess and
verify when they are integral.

> I do sometimes have access to a Ryzen Threadripper 1950X based system:
>
> FreeBSD/SMP: Multiprocessor System Detected: 32 CPUs
> FreeBSD/SMP: 1 package(s) x 2 groups x 2 cache groups x 4 core(s) x 2 hardware threads
>
> but it is single-socket. It has . . .
> ...

Bruce
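
The figures in this message can be checked with a short Python sketch.
It is only an illustration: the function names are made up here, and the
"nominal" timebase of 100/3 MHz is an assumption (the message only gives
the integer 33333333 Hz reported by sysctl). Under that assumption the
CPU-to-timebase ratio is exactly 75, and a 500 ppm slew works out to the
cycle counts quoted above.

```python
from fractions import Fraction

# Values taken from the quoted sysctl output.
CPU_HZ = 2_500_000_000              # dev.cpu.3.freq: 2500 (MHz)
TIMEBASE_HZ_REPORTED = 33_333_333   # kern.timecounter.tc.timebase.frequency

# ASSUMPTION (not stated in the message): the real timebase runs at
# 100/3 MHz and sysctl shows that value truncated to an integer.
TIMEBASE_HZ_NOMINAL = Fraction(100_000_000, 3)


def cycles_per_tick(cpu_hz, timebase_hz):
    """CPU clock periods per timebase counter increment, as an exact ratio."""
    return Fraction(cpu_hz) / Fraction(timebase_hz)


def slew_ns(ppm, interval_ns):
    """Clock adjustment in ns accumulated by a slew of `ppm` over `interval_ns`."""
    return Fraction(ppm, 1_000_000) * interval_ns


def ns_to_cycles(ns, cpu_hz):
    """Convert a duration in ns to CPU cycles at `cpu_hz`."""
    return ns * Fraction(cpu_hz, 1_000_000_000)


if __name__ == "__main__":
    # Ratio from the reported integer frequency: very near 75, not exact.
    print(float(cycles_per_tick(CPU_HZ, TIMEBASE_HZ_REPORTED)))
    # Ratio from the assumed nominal frequency: exactly 75.
    print(cycles_per_tick(CPU_HZ, TIMEBASE_HZ_NOMINAL))

    four_ghz = 4_000_000_000
    for interval_ns in (1_000_000_000, 1_000_000):   # 1 s and 1 ms
        ns = slew_ns(500, interval_ns)
        print(interval_ns, ns, ns_to_cycles(ns, four_ghz))
```

Exact rational arithmetic (fractions.Fraction) is used instead of floats
so that "exactly 75" can be tested with == rather than a tolerance.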