Date:      Fri, 5 Apr 2019 15:38:02 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Mark Millard <marklmi@yahoo.com>
Cc:        Bruce Evans <brde@optusnet.com.au>,  Konstantin Belousov <kostikbel@gmail.com>,  freebsd-hackers Hackers <freebsd-hackers@freebsd.org>,  FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Subject:   Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale * tc_delta(th) overflows unsigned 64 bits sometimes [patched failed]
Message-ID:  <20190405150236.A959@besplex.bde.org>
In-Reply-To: <F22CCA2C-08BB-452E-B00C-A36CD4611540@yahoo.com>
References:  <20190303161635.GJ68879@kib.kiev.ua> <20190304043416.V5640@besplex.bde.org> <20190304114150.GM68879@kib.kiev.ua> <20190305031010.I4610@besplex.bde.org> <20190306172003.GD2492@kib.kiev.ua> <20190308001005.M2756@besplex.bde.org> <20190307222220.GK2492@kib.kiev.ua> <20190309144844.K1166@besplex.bde.org> <20190324110138.GR1923@kib.kiev.ua> <E0785613-2B6E-4BB3-95CD-03DD96902CD8@fh-muenster.de> <20190403070045.GW1923@kib.kiev.ua> <20190404011802.E2390@besplex.bde.org> <F22CCA2C-08BB-452E-B00C-A36CD4611540@yahoo.com>

On Thu, 4 Apr 2019, Mark Millard wrote:

> On 2019-Apr-3, at 08:47, Bruce Evans <brde at optusnet.com.au> wrote:
>> . . .
>>
>> I noticed (or better realized) a general problem with multiple
>> timehands.  ntpd can slew the clock at up to 500 ppm, and at least an
>> old version of it uses a rate of 50 ppm to fix up fairly small drifts
>> in the milliseconds range.  500 ppm is enormous in CPU cycles -- it is
>> 500 thousand nsec per second, or 2 million cycles at 4GHz.  Winding up the
>> timecounter every 1 msec reduces this to only 500 nsec, or 2000 cycles.
>>
>> More details of ordering and timing for 1 thread:
>> ...
> Thanks for the description of an example way that sbinuptime and
> the like might not give weakly increasing results.
>
> Unfortunately, all the multi-socket contexts that I sometimes have
> access to are old PowerMacs. And, currently, the only such context
> is the G5 with 2 sockets, 2 cores per socket (powerpc64). So I've
> not been able to set up other types of examples to see if problems
> repeat.
>
> I do not have access to a single-socket powerpc64 for contrast in
> that direction.
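
The slew figures quoted above are easy to check.  The following is only a
sketch of that arithmetic; the function names are made up for illustration
and the 4GHz clock is the one assumed in the quoted text.

```python
# Sketch only: slew_ns and ns_to_cycles are illustrative names, not kernel APIs.

def slew_ns(ppm, interval_ns):
    """Nanoseconds of adjustment accumulated over interval_ns at a slew of ppm."""
    return ppm * interval_ns // 1_000_000


def ns_to_cycles(ns, cpu_hz):
    """CPU cycles that elapse in ns nanoseconds at cpu_hz."""
    return ns * cpu_hz // 1_000_000_000


CPU_HZ = 4_000_000_000  # 4GHz, as assumed in the quoted text

per_second = slew_ns(500, 1_000_000_000)  # one full second between windups
per_msec = slew_ns(500, 1_000_000)        # windup every 1 msec

print(per_second, "nsec =", ns_to_cycles(per_second, CPU_HZ), "cycles")
print(per_msec, "nsec =", ns_to_cycles(per_msec, CPU_HZ), "cycles")
```

Shortening the windup interval by a factor of 1000 shrinks the accumulated
slew by the same factor, which is the whole point of the comparison.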

Testing 1 socket is time-consuming enough.  Do these old systems
use the equivalent of an x86 TSC for the timecounter?  With multiple
sockets, it isn't clear how even a hardware timer independent of the
CPUs can be distributed so as to appear to be monotonic on all cores.

> One oddity is that the eventtimer's decrementer and timecounter
> may change (nearly) together: both change at 33,333,333 Hz, as if
> they are tied to the same clock (at least on one socket).

I think this is from a normal hardware implementation.  On all of
my x86 systems with a TSC, the TSC frequency is an exact fractional
multiple of the i8254, the ACPI timer (if present) and the HPET (if
present).  Only the RTC has an independent frequency.  The fraction is
changed by changing the nominal TSC frequency in the BIOS, but is not
changed by temperature variations.  This must be because most clocks are
derived from a common clock using a PLL.  I use this to calibrate all
clocks (except the RTC) by calibrating only 1.
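
A minimal sketch of the idea, not my actual calibration code: if every clock
is an exact divider of one common reference, then measuring any one of them
fixes the rest.  The divider table below uses the classic PC values (HPET at
the 14.31818 MHz reference, ACPI timer at 1/4 of it, i8254 at 1/12) purely as
assumptions; real hardware may differ, and the measured frequency is
hypothetical.

```python
from fractions import Fraction

# Assumed dividers from a common reference clock (classic PC values).
DIVIDER = {"HPET": 1, "ACPI": 4, "i8254": 12}


def derive_all(calibrated_name, calibrated_hz):
    """Derive every clock's frequency from one calibrated clock.

    Fractions keep the ratios exact, so no rounding error is introduced
    beyond whatever error the single measurement already has.
    """
    reference_hz = Fraction(calibrated_hz) * DIVIDER[calibrated_name]
    return {name: reference_hz / div for name, div in DIVIDER.items()}


# Hypothetical measurement of the i8254, slightly off its nominal rate.
freqs = derive_all("i8254", 1_193_200)
for name, hz in freqs.items():
    print(name, float(hz))
```

A TSC entry would work the same way, with a machine-specific fraction in
place of the small integer dividers.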

> In case it helps with knowing how analogous your investigations
> are to the original problem context, I report the following.
>
> If you do not care for such information, stop reading here.
>
> # grep ntpd /etc/rc.conf
> ntpd_enable="YES"
> ntpd_sync_on_start="YES"
>
> # sysctl kern.eventtimer
> kern.eventtimer.periodic: 0
> kern.eventtimer.timer: decrementer
> kern.eventtimer.idletick: 0
> kern.eventtimer.singlemul: 2
> kern.eventtimer.choice: decrementer(1000)
> kern.eventtimer.et.decrementer.quality: 1000
> kern.eventtimer.et.decrementer.frequency: 33333333
> kern.eventtimer.et.decrementer.flags: 7
>
> # vmstat -ai | grep decrementer
> cpu0:decrementer                 4451007         35
> cpu3:decrementer                 1466010         11
> cpu2:decrementer                 1481722         12
> cpu1:decrementer                 1478618         12

Powerpc seems to have a PLL in software too.  Event timers don't need to
be very precise or accurate.

> (That last is from a basically-idle timeframe.)
>
> # sysctl -a | grep hz
> kern.clockrate: { hz = 1000, tick = 1000, profhz = 8128, stathz = 127 }
> kern.hz: 1000

x86 is similar.  I think synchronization from using PLLs still gives
unfair scheduling, but with multiple CPUs and often more cycles than can
be used, no one cares about accidental synchronization or bothers to steal
cycles using intentional synchronization.

> # sysctl kern.timecounter
> kern.timecounter.fast_gettime: 1
> kern.timecounter.tick: 1
> kern.timecounter.choice: timebase(0) dummy(-1000000)
> kern.timecounter.hardware: timebase
> kern.timecounter.alloweddeviation: 5
> kern.timecounter.stepwarnings: 0
> kern.timecounter.tc.timebase.quality: 0
> kern.timecounter.tc.timebase.frequency: 33333333
> kern.timecounter.tc.timebase.counter: 1144662532
> kern.timecounter.tc.timebase.mask: 4294967295
>
> (The actual Time Base Register (tbr) is 64 bits
> and FreeBSD truncates it down.)
>
> # sysctl -a | grep 'cpu.*freq'
> device	cpufreq
> debug.cpufreq.verbose: 0
> debug.cpufreq.lowest: 0
> dev.cpufreq.0.%parent: cpu3
> dev.cpufreq.0.%pnpinfo:
> dev.cpufreq.0.%location:
> dev.cpufreq.0.%driver: cpufreq
> dev.cpufreq.0.%desc:
> dev.cpufreq.%parent:
> dev.cpu.3.freq_levels: 2500/-1 1250/-1
> dev.cpu.3.freq: 2500
>
> So 2500 MHz / 33333333 Hz is very near 75 clock periods per
> timebase counter value.

Looks like it is exactly 75.  Fractions are especially easy to guess and
verify when they are integral.
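
The numbers in the sysctl output above can be checked with a few lines.
This is only a sketch.  It assumes that the reported timebase frequency is
a truncated integer, and that th_scale is roughly 2^64 divided by the
counter frequency (the ntpd adjustment term is ignored); scaled_u64 is an
illustrative name, not a kernel function.

```python
from fractions import Fraction

TIMEBASE_HZ = 33_333_333      # kern.timecounter.tc.timebase.frequency
CPU_HZ = 2_500_000_000        # dev.cpu.3.freq (2500 MHz)
COUNTER_MASK = 4_294_967_295  # kern.timecounter.tc.timebase.mask

# If the true timebase rate is CPU_HZ / 75, truncating it to an integer
# should give exactly the reported frequency.
exact_timebase = Fraction(CPU_HZ, 75)
reported_matches = int(exact_timebase) == TIMEBASE_HZ

# Seconds until the 32-bit truncated counter wraps.
wrap_seconds = (COUNTER_MASK + 1) / TIMEBASE_HZ

# Assumption: th_scale is about 2**64 / frequency.
th_scale = 2**64 // TIMEBASE_HZ

# Largest delta for which th_scale * delta still fits in 64 bits.
max_safe_delta = (2**64 - 1) // th_scale


def scaled_u64(delta):
    """th_scale * delta as a C uint64_t would hold it (wraps modulo 2**64)."""
    return (th_scale * delta) % 2**64


print("ratio is exactly 75:", reported_matches)
print("32-bit counter wraps after about", round(wrap_seconds, 1), "seconds")
print("largest safe delta:", max_safe_delta, "ticks")
print("wrapped one tick later:",
      scaled_u64(max_safe_delta + 1) != th_scale * (max_safe_delta + 1))
```

Under these assumptions the product stops fitting in 64 bits once the delta
exceeds about one second's worth of ticks, long before the 32-bit counter
itself wraps.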

> I do sometimes have access to a Ryzen Threadripper 1950X based system:
>
> FreeBSD/SMP: Multiprocessor System Detected: 32 CPUs
> FreeBSD/SMP: 1 package(s) x 2 groups x 2 cache groups x 4 core(s) x 2 hardware threads
>
> but it is single-socket. It has . . .
> ...

Bruce


