Date: Fri, 5 Apr 2019 01:40:07 -0700
From: Mark Millard <marklmi@yahoo.com>
To: Bruce Evans <brde@optusnet.com.au>
Cc: Konstantin Belousov <kostikbel@gmail.com>, freebsd-hackers Hackers <freebsd-hackers@freebsd.org>, FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Subject: Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale * tc_delta(th) overflows unsigned 64 bits sometimes [patched failed]
Message-ID: <F3BBD355-5198-41A0-A461-8E5E12984BE7@yahoo.com>
In-Reply-To: <20190405150236.A959@besplex.bde.org>
References: <20190303161635.GJ68879@kib.kiev.ua> <20190304043416.V5640@besplex.bde.org> <20190304114150.GM68879@kib.kiev.ua> <20190305031010.I4610@besplex.bde.org> <20190306172003.GD2492@kib.kiev.ua> <20190308001005.M2756@besplex.bde.org> <20190307222220.GK2492@kib.kiev.ua> <20190309144844.K1166@besplex.bde.org> <20190324110138.GR1923@kib.kiev.ua> <E0785613-2B6E-4BB3-95CD-03DD96902CD8@fh-muenster.de> <20190403070045.GW1923@kib.kiev.ua> <20190404011802.E2390@besplex.bde.org> <F22CCA2C-08BB-452E-B00C-A36CD4611540@yahoo.com> <20190405150236.A959@besplex.bde.org>

On 2019-Apr-4, at 21:38, Bruce Evans <brde at optusnet.com.au> wrote:

> On Thu, 4 Apr 2019, Mark Millard wrote:
>
>> On 2019-Apr-3, at 08:47, Bruce Evans <brde at optusnet.com.au> wrote:
>>> . . .
>>>
>>> I noticed (or better realized) a general problem with multiple
>>> timehands. ntpd can slew the clock at up to 500 ppm, and at least an
>>> old version of it uses a rate of 50 ppm to fix up fairly small drifts
>>> in the milliseconds range. 500 ppm is enormous in CPU cycles -- it is
>>> 500 thousand nsec or 2 million cycles at 4GHz. Winding up the timecounter
>>> every 1 msec reduces this to only 2000 cycles.
>>>
>>> More details of ordering and timing for 1 thread:
>>> ...
>> Thanks for the description of an example way that sbinuptime and
>> the like might not give weakly increasing results.
>>
>> Unfortunately, all the multi-socket contexts that I sometimes have
>> access to are old PowerMacs. And, currently, the only such context
>> is the G5 with 2 sockets, 2 cores per socket (powerpc64). So I've
>> not been able to set up other types of examples to see if problems
>> repeat.
>>
>> I do not have access to a single-socket powerpc64 for contrast in
>> that direction.
>
> Testing 1 socket is time-consuming enough. Do these old systems
> use the equivalent of an x86 TSC for the timecounter? With multiple
> sockets, it isn't clear how even a hardware timer independent of the
> CPUs can be distributed so as to appear to be monotonic on all cores.

"The DEC frequency is based on the same implementation-dependent
frequency that drives the time base."

The frequency may well be fixed by the PowerMac G5 model
implementation but is not fixed by the powerpc64 architecture.

The Time Base Register (TBR) in a powerpc64 core (cpu in FreeBSD
terms) increments at 33,333,333 Hz (nominal) and is 64 bits wide.
Its value can be set (mttb instruction) and the boot sequence in
FreeBSD does attempt to adjust it as each FreeBSD CPU is
brought up/started.
mftb is used to read the 64-bit value. FreeBSD masks it down to
32 bits to contribute to the time-counter.

(Is that description sufficient for what you were after? I've
never seen documentation of how the 33,333,333 Hz is produced.)

As FreeBSD supports multi-socket, what are its criteria for a
context sufficient to support sbinuptime and the like? Is FreeBSD
then supposed to make it appear that sbinuptime and the like are
weakly increasing, even as threads migrate between CPUs (cores,
hw-threads)?

>> One oddity is that the eventtimer's decrementer and timecounter
>> may change (nearly) together: both change at 33,333,333 Hz, as if
>> they are tied to the same clock (at least on one socket).
>
> I think this is from a normal hardware implementation. On all of
> my x86 systems with a TSC, the TSC frequency is an exact fractional
> multiple of the i8254, the ACPI timer (if present) and the HPET (if
> present). Only the RTC has an independent frequency. The fraction is
> changed by changing the nominal TSC frequency in the BIOS, but is not
> changed by temperature variations. This must be because most clocks are
> derived from a common clock using a PLL. I use this to calibrate all
> clocks (except the RTC) by calibrating only 1.

I'm not aware of the OpenFirmware having any control over the
rate at which the TBR changes. I've no evidence about any
variability based on temperature.

>> In case it helps with knowing how analogous your investigations
>> are to the original problem context, I report the following.
>>
>> If you do not care for such information, stop reading here.
>>
>> # grep ntpd /etc/rc.conf
>> ntpd_enable="YES"
>> ntpd_sync_on_start="YES"
>>
>> # sysctl kern.eventtimer
>> kern.eventtimer.periodic: 0
>> kern.eventtimer.timer: decrementer
>> kern.eventtimer.idletick: 0
>> kern.eventtimer.singlemul: 2
>> kern.eventtimer.choice: decrementer(1000)
>> kern.eventtimer.et.decrementer.quality: 1000
>> kern.eventtimer.et.decrementer.frequency: 33333333
>> kern.eventtimer.et.decrementer.flags: 7
>>
>> # vmstat -ai | grep decrementer
>> cpu0:decrementer 4451007 35
>> cpu3:decrementer 1466010 11
>> cpu2:decrementer 1481722 12
>> cpu1:decrementer 1478618 12
>
> Powerpc seems to have a PLL in software too. Event timers don't need to
> be very precise or accurate.

powerpc64 sets the value to count down from (in the 32-bit DEC
register) via the mtdec instruction. I'm not aware of being
able to change the frequency that the countdown occurs at on
the old PowerMac G5's. (mtdec is a form of mtspr SPRNUM,rs .)

[Accessing the DEC's value is via a privileged instruction on
powerpc64 vs. a non-privileged one on power: different
instruction encodings but the same mnemonic, unless mfspr is
used directly as the mnemonic. The difference is just one bit
position in the SPRNUM in the encoding.]

>> (That last is from a basically-idle timeframe.)
>>
>> # sysctl -a | grep hz
>> kern.clockrate: { hz = 1000, tick = 1000, profhz = 8128, stathz = 127 }
>> kern.hz: 1000
>
> x86 is similar. I think synchronization from using PLLs still gives
> unfair scheduling, but with multiple CPUs and often more cycles than can
> be used, no one cares about accidental synchronization or bothers to steal
> cycles using intentional synchronization.

Good to know.

>> # sysctl kern.timecounter
>> kern.timecounter.fast_gettime: 1
>> kern.timecounter.tick: 1
>> kern.timecounter.choice: timebase(0) dummy(-1000000)
>> kern.timecounter.hardware: timebase
>> kern.timecounter.alloweddeviation: 5
>> kern.timecounter.stepwarnings: 0
>> kern.timecounter.tc.timebase.quality: 0
>> kern.timecounter.tc.timebase.frequency: 33333333
>> kern.timecounter.tc.timebase.counter: 1144662532
>> kern.timecounter.tc.timebase.mask: 4294967295
>>
>> (The actual Time Base Register (tbr) is 64 bits
>> and freebsd truncates it down.)
>>
>> # sysctl -a | grep 'cpu.*freq'
>> device cpufreq
>> debug.cpufreq.verbose: 0
>> debug.cpufreq.lowest: 0
>> dev.cpufreq.0.%parent: cpu3
>> dev.cpufreq.0.%pnpinfo:
>> dev.cpufreq.0.%location:
>> dev.cpufreq.0.%driver: cpufreq
>> dev.cpufreq.0.%desc:
>> dev.cpufreq.%parent:
>> dev.cpu.3.freq_levels: 2500/-1 1250/-1
>> dev.cpu.3.freq: 2500
>>
>> So 2500 MHz / 33333333 Hz is very near 75 clock periods per
>> timebase counter value.
>
> Looks like it is exactly 75. Fractions are especially easy to guess and
> verify when they are integral.

I'm not sure what happens to the DEC and TBR change frequencies
if the 2500 MHz cpu frequency is dropped down to 1250 MHz. But
as I understand my context, 1250 MHz is not in use at all, just
2500 MHz.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
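
For anyone wanting to check the counter arithmetic quoted above, here is
a minimal sketch. It only uses the figures from the sysctl output in this
message (33333333 Hz timebase, 32-bit mask, 2500 MHz cpu). The function
names are hypothetical, and the masked subtraction is the general
technique for differencing a truncated counter, not a copy of the kernel
code.

```python
# Figures taken from the sysctl output quoted in this message.
TIMEBASE_HZ = 33_333_333        # kern.timecounter.tc.timebase.frequency
COUNTER_MASK = 4_294_967_295    # kern.timecounter.tc.timebase.mask (2**32 - 1)
CPU_HZ = 2_500_000_000          # dev.cpu.3.freq: 2500 (MHz)


def cycles_per_timebase_tick(cpu_hz: int = CPU_HZ, tb_hz: int = TIMEBASE_HZ) -> float:
    """CPU clock periods per Time Base increment (hypothetical helper)."""
    return cpu_hz / tb_hz


def wrap_seconds(mask: int = COUNTER_MASK, tb_hz: int = TIMEBASE_HZ) -> float:
    """Seconds until the masked counter wraps back to zero."""
    return (mask + 1) / tb_hz


def masked_delta(now: int, then: int, mask: int = COUNTER_MASK) -> int:
    """Ticks between two masked readings, tolerating a single wrap."""
    return (now - then) & mask


if __name__ == "__main__":
    print(f"cycles per timebase tick: {cycles_per_timebase_tick():.8f}")
    print(f"32-bit wrap period: {wrap_seconds():.2f} s")
    # A reading taken just after a wrap still gives a small positive delta.
    print(f"delta across a wrap: {masked_delta(5, COUNTER_MASK - 4)}")
```

With a nominal 33333333 Hz the ratio comes out a hair above 75 rather
than exactly 75, and the truncated counter wraps in a little over two
minutes, so any delta has to be taken well within that interval for the
masked subtraction to be unambiguous.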