Date: Thu, 4 Apr 2019 13:15:58 -0700 From: Mark Millard <marklmi@yahoo.com> To: Bruce Evans <brde@optusnet.com.au> Cc: Konstantin Belousov <kostikbel@gmail.com>, freebsd-hackers Hackers <freebsd-hackers@freebsd.org>, FreeBSD PowerPC ML <freebsd-ppc@freebsd.org> Subject: Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale * tc_delta(th) overflows unsigned 64 bits sometimes [patched failed] Message-ID: <F22CCA2C-08BB-452E-B00C-A36CD4611540@yahoo.com> In-Reply-To: <20190404011802.E2390@besplex.bde.org> References: <20190303161635.GJ68879@kib.kiev.ua> <20190304043416.V5640@besplex.bde.org> <20190304114150.GM68879@kib.kiev.ua> <20190305031010.I4610@besplex.bde.org> <20190306172003.GD2492@kib.kiev.ua> <20190308001005.M2756@besplex.bde.org> <20190307222220.GK2492@kib.kiev.ua> <20190309144844.K1166@besplex.bde.org> <20190324110138.GR1923@kib.kiev.ua> <E0785613-2B6E-4BB3-95CD-03DD96902CD8@fh-muenster.de> <20190403070045.GW1923@kib.kiev.ua> <20190404011802.E2390@besplex.bde.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On 2019-Apr-3, at 08:47, Bruce Evans <brde at optusnet.com.au> wrote: > . . . >=20 > I noticed (or better realized) a general problem with multiple > timehands. ntpd can slew the clock at up to 500 ppm, and at least an > old version of it uses a rate of 50 ppm to fix up fairly small drifts > in the milliseconds range. 500 ppm is enormous in CPU cycles -- it is > 500 thousand nsec or 2 million cycles at 4GHz. Winding up the = timecounter > every 1 msec reduces this to only 2000 cycles. >=20 > More details of ordering and timing for 1 thread: > - thread N calls binuptime() and it loads timehands > - another or even the same thread runs tc_windup(). This modifies = timehands. > - thread N is preempted for a long time, but less than the time for > <number of timehands> updates > - thread N checks the generation count. Since this is for the = timehands > contents and not for the timehands pointer, it hasn't changed, so the > old timehands is used > - and instant later, the same thread calls binuptime again() and uses = the > new timehands - now suppose only 2 timehands (as in -current) the = worst (?) case of a > slew of 500 ppm for the old timehands and -500 ppm for the new = timehands > and almost the worst case of 10 msec for the oldness of the old = timehands > relative to the new timehands, with the new timehands about to be = updated > after 10 msec (assume perfectly periodiodic updates every 10 msec). = The > calculated times are: >=20 > old bintime =3D old_base + (20 msec) * (1 + 500e-6) > new base =3D old_base + 10 msec * (1 + 500e-6) # calc by = tc_windup() > new bintime =3D new_base + (10 msec) * (1 - 500e-6) + epsilon >=20 > error =3D epsilon - (20 msec) * 500e-6 =3D epsilon - 10000 nsec >=20 > Errors in the negative direction are most harmful. ntpd normally = doesn't > change the skew as much as that in one step, but it is easy for = adjtime(2) > to change the skew like that and there are no reasonable = microadjustments > that would accidentally work around this kernel bug (it seems = unreasonable > to limit the skew to 1 ppm and that still gives an error of epsilon + = 20 > nsec. >=20 > phk didn't want to slow down timecounters using extra code to make > them them monotonic and coherent (getbinuptime() is incoherent with > binuptime() since it former lags the latter by the update interval), > but this wouldn't be very slow within a thread. >=20 > Monotonicity across threads is a larger problem and not helped by = using > a faked forced monotonic time within threads. >=20 > So it seems best to fix the above problem by moving the generation = count > from the timehands contents to the timehands pointer, and maybe also > reduce the number of timehands to 1. With 2 timehands, this gives a > shorter race: >=20 > - thread N loads timehands > - tc_windup() > - thread N preempted > - thread N uses old timehands > - case tc_windup() completes first: no problem -- thread N checks the > generation count on the pointer and loops > - case binuptime() completes first: lost a race -- binuptime() is off > by approx <tc_windup() execution time> * <difference in skews>. >=20 > The main point of having multiple timehands (after introducing the = per- > timehands generation count) is to avoid blocking thread N during the > update, but this doesn't actually work, even for only 2 timehands and = a global generation count. >=20 Thanks for the description of an example way that sbinuptime and the like might not give weakly increasing results. Unfortunately, all the multi-socket contexts that I sometimes have access to are old PowerMacs. And, currently, the only such context is the G5 with 2 sockets, 2 cores per socket (powerpc64). So I've not been able to set up other types of examples to see if problems repeat. I do not have access to a single-socket powerpc64 for contrast in that direction. One oddity is that the eventtimer's decrementer and timecounter may change (nearly) together: both change at 33,333,333 Hz, as if they are tied to the same clock (at least on one socket). In case it helps with knowing how analogous your investigations are to the original problem context, I report the following. If you do not care for such information, stop reading here. # grep ntpd /etc/rc.conf ntpd_enable=3D"YES" ntpd_sync_on_start=3D"YES" # sysctl kern.eventtimer kern.eventtimer.periodic: 0 kern.eventtimer.timer: decrementer kern.eventtimer.idletick: 0 kern.eventtimer.singlemul: 2 kern.eventtimer.choice: decrementer(1000) kern.eventtimer.et.decrementer.quality: 1000 kern.eventtimer.et.decrementer.frequency: 33333333 kern.eventtimer.et.decrementer.flags: 7 # vmstat -ai | grep decrementer cpu0:decrementer 4451007 35 cpu3:decrementer 1466010 11 cpu2:decrementer 1481722 12 cpu1:decrementer 1478618 12 (That last is from a basically-idle timeframe.) # sysctl -a | grep hz kern.clockrate: { hz =3D 1000, tick =3D 1000, profhz =3D 8128, stathz =3D = 127 } kern.hz: 1000 # sysctl kern.timecounter kern.timecounter.fast_gettime: 1 kern.timecounter.tick: 1 kern.timecounter.choice: timebase(0) dummy(-1000000) kern.timecounter.hardware: timebase kern.timecounter.alloweddeviation: 5 kern.timecounter.stepwarnings: 0 kern.timecounter.tc.timebase.quality: 0 kern.timecounter.tc.timebase.frequency: 33333333 kern.timecounter.tc.timebase.counter: 1144662532 kern.timecounter.tc.timebase.mask: 4294967295 (The actual Time Base Register (tbr) i s 64 bits and freebsd truncates it down.) # sysctl -a | grep 'cpu.*freq' device cpufreq debug.cpufreq.verbose: 0 debug.cpufreq.lowest: 0 dev.cpufreq.0.%parent: cpu3 dev.cpufreq.0.%pnpinfo:=20 dev.cpufreq.0.%location:=20 dev.cpufreq.0.%driver: cpufreq dev.cpufreq.0.%desc:=20 dev.cpufreq.%parent:=20 dev.cpu.3.freq_levels: 2500/-1 1250/-1 dev.cpu.3.freq: 2500 So 2500 MHz / 33333333 Hz is very near 75 clock periods per timebase counter value. I do sometimes have access to a Ryzen Threadripper 1950X based system: FreeBSD/SMP: Multiprocessor System Detected: 32 CPUs FreeBSD/SMP: 1 package(s) x 2 groups x 2 cache groups x 4 core(s) x 2 = hardware threads but it is single=3Dsocket. It has . . . [It turns out that I've accidentally been running it without ntpd being enabled in /etc.rc.conf . Just fixed for next boot.] # sysctl kern.eventtimer = = kern.eventtimer.periodic: = 0 kern.eventtimer.timer: LAPIC kern.eventtimer.idletick: 0 kern.eventtimer.singlemul: 2 kern.eventtimer.choice: LAPIC(600) HPET(350) HPET1(350) HPET2(350) = i8254(100) RTC(0) kern.eventtimer.et.RTC.quality: 0 kern.eventtimer.et.RTC.frequency: 32768 kern.eventtimer.et.RTC.flags: 17 kern.eventtimer.et.i8254.quality: 100 kern.eventtimer.et.i8254.frequency: 1193182 kern.eventtimer.et.i8254.flags: 1 kern.eventtimer.et.HPET2.quality: 350 kern.eventtimer.et.HPET2.frequency: 14318180 kern.eventtimer.et.HPET2.flags: 3 kern.eventtimer.et.HPET1.quality: 350 kern.eventtimer.et.HPET1.frequency: 14318180 kern.eventtimer.et.HPET1.flags: 3 kern.eventtimer.et.HPET.quality: 350 kern.eventtimer.et.HPET.frequency: 14318180 kern.eventtimer.et.HPET.flags: 3 kern.eventtimer.et.LAPIC.quality: 600 kern.eventtimer.et.LAPIC.frequency: 49907650 kern.eventtimer.et.LAPIC.flags: 7 # vmstat -ai | grep timer irq0: attimer0 0 0 cpu0:timer 609974 14 cpu1:timer 160546 4 cpu2:timer 99803 2 cpu3:timer 158073 4 cpu4:timer 102870 2 cpu5:timer 168433 4 cpu6:timer 163004 4 cpu7:timer 162947 4 cpu8:timer 163501 4 cpu9:timer 160595 4 cpu10:timer 169100 4 cpu11:timer 163888 4 cpu12:timer 144278 3 cpu13:timer 162706 4 cpu14:timer 164161 4 cpu15:timer 167101 4 cpu16:timer 187732 4 cpu17:timer 189228 4 cpu18:timer 179423 4 cpu19:timer 182701 4 cpu20:timer 139658 3 cpu21:timer 186046 4 cpu22:timer 201248 5 cpu23:timer 184823 4 cpu24:timer 186839 4 cpu25:timer 186008 4 cpu26:timer 191473 4 cpu27:timer 182573 4 cpu28:timer 181213 4 cpu29:timer 197659 5 cpu30:timer 188190 4 cpu31:timer 189092 4 (Again: from a basically-idle timeframe.) # sysctl -a | grep hz kern.clockrate: { hz =3D 1000, tick =3D 1000, profhz =3D 8128, stathz =3D = 127 } kern.hz: 1000 debug.psm.hz: 20 # sysctl -a | grep 'cpu.*freq' device cpufreq debug.cpufreq.verbose: 0 debug.cpufreq.lowest: 0 dev.cpufreq.0.%parent: cpu0 dev.cpufreq.0.%pnpinfo:=20 dev.cpufreq.0.%location:=20 dev.cpufreq.0.%driver: cpufreq dev.cpufreq.0.%desc:=20 dev.cpufreq.%parent:=20 dev.cpu.0.freq_levels: 3400/-1 2800/-1 2200/-1 dev.cpu.0.freq: 3400 # sysctl kern.timecounter kern.timecounter.tsc_shift: 1 kern.timecounter.smp_tsc_adjust: 0 kern.timecounter.smp_tsc: 1 kern.timecounter.invariant_tsc: 1 kern.timecounter.fast_gettime: 1 kern.timecounter.tick: 1 kern.timecounter.choice: ACPI-fast(900) i8254(0) HPET(950) TSC-low(1000) = dummy(-1000000) kern.timecounter.hardware: TSC-low kern.timecounter.alloweddeviation: 5 kern.timecounter.stepwarnings: 0 kern.timecounter.tc.ACPI-fast.quality: 900 kern.timecounter.tc.ACPI-fast.frequency: 3579545 kern.timecounter.tc.ACPI-fast.counter: 3941257163 kern.timecounter.tc.ACPI-fast.mask: 4294967295 kern.timecounter.tc.i8254.quality: 0 kern.timecounter.tc.i8254.frequency: 1193182 kern.timecounter.tc.i8254.counter: 54249 kern.timecounter.tc.i8254.mask: 65535 kern.timecounter.tc.HPET.quality: 950 kern.timecounter.tc.HPET.frequency: 14318180 kern.timecounter.tc.HPET.counter: 796737466 kern.timecounter.tc.HPET.mask: 4294967295 kern.timecounter.tc.TSC-low.quality: 1000 kern.timecounter.tc.TSC-low.frequency: 1696860321 kern.timecounter.tc.TSC-low.counter: 3613142695 kern.timecounter.tc.TSC-low.mask: 4294967295 =3D=3D=3D Mark Millard marklmi at yahoo.com ( dsl-only.net went away in early 2018-Mar)
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?F22CCA2C-08BB-452E-B00C-A36CD4611540>