From owner-freebsd-hackers@freebsd.org Thu Apr 4 20:16:05 2019 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C7EDF1556353 for ; Thu, 4 Apr 2019 20:16:04 +0000 (UTC) (envelope-from marklmi@yahoo.com) Received: from sonic310-21.consmr.mail.gq1.yahoo.com (sonic310-21.consmr.mail.gq1.yahoo.com [98.137.69.147]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 8AA168AC24 for ; Thu, 4 Apr 2019 20:16:03 +0000 (UTC) (envelope-from marklmi@yahoo.com) X-YMail-OSG: tUl2yRAVM1mWa9EZN.Spbbd98seZjY_PZeD2.x5vKu1m3eempaq_3JfWWx7FY_s fvBY3fOt9Fhg5ve0M.CrRi1sXpQrxrXHOcSce9A4IxZMcNbIpicsSM5DTSaj7wd1maxyMuvOMtgU 0AnY1xYulmHaMv4219_1R0SWjhqtUcP9.31K3zc60yJKeeyFhFMwdNGyjGxo5YGC2Uy9tvr2E7eM itLgBgk11F2J3JqdeOEbk9TT0SdMK52qqfEWO1MUachqz8kTISq5FRHXCSpIPzWJpWF2xXynI1Vw 39j_bGs5luP4YX9bQ2Z0RFd45gt5h7QhPBWGXfso2G3s0c_NwAMZGxPlF59GUuCCKSGmyYUPXnRa blQozlJvxTi9vZ3GZkGkERWYUCfsT88o1P7x2N3m8kOANxfumKc2ESY5WZJjB03GVkhoHNmG4Af4 YaZkZaL.aLp_q.IeV3qeEFbyhk.ZbI2.4ZffDpsj4v4sup_JEAU58mPc94gZsUaX3k9zXPqKfuiR M7ThWGybRThlbxPSUv96VZARbP1MafRTWYu6tolvGMCOZQo314oqK6eAak_VpJMkXqE29NyoYawm LE9mcj42jxE2pmTBG2z6F9.Q8NeNLt8JiXOsPZ.qNb1e2csLotagwgjo1JOQqYLfI.nc5I0xxjKd Fu7LTwjvfIz70zOGH6wzHUP914MoEIL02S.boLroVAXg4tDJHav1J_Kce9HtrKBroioGaNI36jSJ vu9J4G80CyOfvwa.2qvO9P4OL9xvKnDBhiOQ3Ya4D2GMzcgLfQT6BIA0J5I6iJYwRtiNKbLI4RbY P_AntPHcrfIIoFsjahVbSZGRp5k_EIyfyCX_A1pob4ateBe3SBrCzM8ZhcTnBKfX1XVu97vdbE7p Y2puXXEYJ8u2JyzNJh_lSyqCSgRdsyMLCNmNsygAPCiFv1fp6BATXbQV0wYOPMbMrV5RKudE2dXh 6ZusHZ1n6eevT7.bHTQ5qUGqJEZ6bkgeecvlXjtWwWTmXGzhLNQug1Ur8yE_2IzMZd1.v2DYNS6V bjyQH7NX7N2JMb4eJu1cH0rYt9UoVxTYVyBBJrCuoF90sy54_EIRMlZFbGAZB11ZUm14ayC5gxoS CZw-- Received: from sonic.gate.mail.ne1.yahoo.com by sonic310.consmr.mail.gq1.yahoo.com with HTTP; Thu, 4 Apr 2019 20:16:01 +0000 Received: from c-67-170-167-181.hsd1.or.comcast.net (EHLO [192.168.1.113]) ([67.170.167.181]) by smtp420.mail.gq1.yahoo.com (Oath Hermes SMTP Server) with ESMTPA ID 6a5cfc0e336cb9a55c16cba217ec176c; Thu, 04 Apr 2019 20:15:59 +0000 (UTC) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 12.4 \(3445.104.8\)) Subject: Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale * tc_delta(th) overflows unsigned 64 bits sometimes [patched failed] From: Mark Millard In-Reply-To: <20190404011802.E2390@besplex.bde.org> Date: Thu, 4 Apr 2019 13:15:58 -0700 Cc: Konstantin Belousov , freebsd-hackers Hackers , FreeBSD PowerPC ML Content-Transfer-Encoding: quoted-printable Message-Id: References: <20190303161635.GJ68879@kib.kiev.ua> <20190304043416.V5640@besplex.bde.org> <20190304114150.GM68879@kib.kiev.ua> <20190305031010.I4610@besplex.bde.org> <20190306172003.GD2492@kib.kiev.ua> <20190308001005.M2756@besplex.bde.org> <20190307222220.GK2492@kib.kiev.ua> <20190309144844.K1166@besplex.bde.org> <20190324110138.GR1923@kib.kiev.ua> <20190403070045.GW1923@kib.kiev.ua> <20190404011802.E2390@besplex.bde.org> To: Bruce Evans X-Mailer: Apple Mail (2.3445.104.8) X-Rspamd-Queue-Id: 8AA168AC24 X-Spamd-Bar: / X-Spamd-Result: default: False [0.97 / 15.00]; RCVD_VIA_SMTP_AUTH(0.00)[]; R_SPF_ALLOW(-0.20)[+ptr:yahoo.com]; MV_CASE(0.50)[]; FREEMAIL_FROM(0.00)[yahoo.com]; RCVD_COUNT_THREE(0.00)[3]; TO_DN_ALL(0.00)[]; DKIM_TRACE(0.00)[yahoo.com:+]; MX_GOOD(-0.01)[cached: mta6.am0.yahoodns.net]; DMARC_POLICY_ALLOW(-0.50)[yahoo.com,reject]; FREEMAIL_TO(0.00)[optusnet.com.au]; FROM_EQ_ENVFROM(0.00)[]; RCVD_TLS_LAST(0.00)[]; MIME_TRACE(0.00)[0:+]; FREEMAIL_ENVFROM(0.00)[yahoo.com]; ASN(0.00)[asn:36647, ipnet:98.137.64.0/21, country:US]; MID_RHS_MATCH_FROM(0.00)[]; DWL_DNSWL_NONE(0.00)[yahoo.com.dwl.dnswl.org : 127.0.5.0]; ARC_NA(0.00)[]; NEURAL_HAM_MEDIUM(-0.21)[-0.212,0]; R_DKIM_ALLOW(-0.20)[yahoo.com:s=s2048]; FROM_HAS_DN(0.00)[]; RCPT_COUNT_THREE(0.00)[4]; NEURAL_SPAM_SHORT(0.70)[0.698,0]; NEURAL_HAM_LONG(-0.10)[-0.102,0]; MIME_GOOD(-0.10)[text/plain]; IP_SCORE(1.09)[ip: (3.81), ipnet: 98.137.64.0/21(0.96), asn: 36647(0.76), country: US(-0.06)]; TO_MATCH_ENVRCPT_SOME(0.00)[]; RCVD_IN_DNSWL_NONE(0.00)[147.69.137.98.list.dnswl.org : 127.0.5.0]; FREEMAIL_CC(0.00)[gmail.com] X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Apr 2019 20:16:05 -0000 On 2019-Apr-3, at 08:47, Bruce Evans wrote: > . . . >=20 > I noticed (or better realized) a general problem with multiple > timehands. ntpd can slew the clock at up to 500 ppm, and at least an > old version of it uses a rate of 50 ppm to fix up fairly small drifts > in the milliseconds range. 500 ppm is enormous in CPU cycles -- it is > 500 thousand nsec or 2 million cycles at 4GHz. Winding up the = timecounter > every 1 msec reduces this to only 2000 cycles. >=20 > More details of ordering and timing for 1 thread: > - thread N calls binuptime() and it loads timehands > - another or even the same thread runs tc_windup(). This modifies = timehands. > - thread N is preempted for a long time, but less than the time for > updates > - thread N checks the generation count. Since this is for the = timehands > contents and not for the timehands pointer, it hasn't changed, so the > old timehands is used > - and instant later, the same thread calls binuptime again() and uses = the > new timehands - now suppose only 2 timehands (as in -current) the = worst (?) case of a > slew of 500 ppm for the old timehands and -500 ppm for the new = timehands > and almost the worst case of 10 msec for the oldness of the old = timehands > relative to the new timehands, with the new timehands about to be = updated > after 10 msec (assume perfectly periodiodic updates every 10 msec). = The > calculated times are: >=20 > old bintime =3D old_base + (20 msec) * (1 + 500e-6) > new base =3D old_base + 10 msec * (1 + 500e-6) # calc by = tc_windup() > new bintime =3D new_base + (10 msec) * (1 - 500e-6) + epsilon >=20 > error =3D epsilon - (20 msec) * 500e-6 =3D epsilon - 10000 nsec >=20 > Errors in the negative direction are most harmful. ntpd normally = doesn't > change the skew as much as that in one step, but it is easy for = adjtime(2) > to change the skew like that and there are no reasonable = microadjustments > that would accidentally work around this kernel bug (it seems = unreasonable > to limit the skew to 1 ppm and that still gives an error of epsilon + = 20 > nsec. >=20 > phk didn't want to slow down timecounters using extra code to make > them them monotonic and coherent (getbinuptime() is incoherent with > binuptime() since it former lags the latter by the update interval), > but this wouldn't be very slow within a thread. >=20 > Monotonicity across threads is a larger problem and not helped by = using > a faked forced monotonic time within threads. >=20 > So it seems best to fix the above problem by moving the generation = count > from the timehands contents to the timehands pointer, and maybe also > reduce the number of timehands to 1. With 2 timehands, this gives a > shorter race: >=20 > - thread N loads timehands > - tc_windup() > - thread N preempted > - thread N uses old timehands > - case tc_windup() completes first: no problem -- thread N checks the > generation count on the pointer and loops > - case binuptime() completes first: lost a race -- binuptime() is off > by approx * . >=20 > The main point of having multiple timehands (after introducing the = per- > timehands generation count) is to avoid blocking thread N during the > update, but this doesn't actually work, even for only 2 timehands and = a global generation count. >=20 Thanks for the description of an example way that sbinuptime and the like might not give weakly increasing results. Unfortunately, all the multi-socket contexts that I sometimes have access to are old PowerMacs. And, currently, the only such context is the G5 with 2 sockets, 2 cores per socket (powerpc64). So I've not been able to set up other types of examples to see if problems repeat. I do not have access to a single-socket powerpc64 for contrast in that direction. One oddity is that the eventtimer's decrementer and timecounter may change (nearly) together: both change at 33,333,333 Hz, as if they are tied to the same clock (at least on one socket). In case it helps with knowing how analogous your investigations are to the original problem context, I report the following. If you do not care for such information, stop reading here. # grep ntpd /etc/rc.conf ntpd_enable=3D"YES" ntpd_sync_on_start=3D"YES" # sysctl kern.eventtimer kern.eventtimer.periodic: 0 kern.eventtimer.timer: decrementer kern.eventtimer.idletick: 0 kern.eventtimer.singlemul: 2 kern.eventtimer.choice: decrementer(1000) kern.eventtimer.et.decrementer.quality: 1000 kern.eventtimer.et.decrementer.frequency: 33333333 kern.eventtimer.et.decrementer.flags: 7 # vmstat -ai | grep decrementer cpu0:decrementer 4451007 35 cpu3:decrementer 1466010 11 cpu2:decrementer 1481722 12 cpu1:decrementer 1478618 12 (That last is from a basically-idle timeframe.) # sysctl -a | grep hz kern.clockrate: { hz =3D 1000, tick =3D 1000, profhz =3D 8128, stathz =3D = 127 } kern.hz: 1000 # sysctl kern.timecounter kern.timecounter.fast_gettime: 1 kern.timecounter.tick: 1 kern.timecounter.choice: timebase(0) dummy(-1000000) kern.timecounter.hardware: timebase kern.timecounter.alloweddeviation: 5 kern.timecounter.stepwarnings: 0 kern.timecounter.tc.timebase.quality: 0 kern.timecounter.tc.timebase.frequency: 33333333 kern.timecounter.tc.timebase.counter: 1144662532 kern.timecounter.tc.timebase.mask: 4294967295 (The actual Time Base Register (tbr) i s 64 bits and freebsd truncates it down.) # sysctl -a | grep 'cpu.*freq' device cpufreq debug.cpufreq.verbose: 0 debug.cpufreq.lowest: 0 dev.cpufreq.0.%parent: cpu3 dev.cpufreq.0.%pnpinfo:=20 dev.cpufreq.0.%location:=20 dev.cpufreq.0.%driver: cpufreq dev.cpufreq.0.%desc:=20 dev.cpufreq.%parent:=20 dev.cpu.3.freq_levels: 2500/-1 1250/-1 dev.cpu.3.freq: 2500 So 2500 MHz / 33333333 Hz is very near 75 clock periods per timebase counter value. I do sometimes have access to a Ryzen Threadripper 1950X based system: FreeBSD/SMP: Multiprocessor System Detected: 32 CPUs FreeBSD/SMP: 1 package(s) x 2 groups x 2 cache groups x 4 core(s) x 2 = hardware threads but it is single=3Dsocket. It has . . . [It turns out that I've accidentally been running it without ntpd being enabled in /etc.rc.conf . Just fixed for next boot.] # sysctl kern.eventtimer = = kern.eventtimer.periodic: = 0 kern.eventtimer.timer: LAPIC kern.eventtimer.idletick: 0 kern.eventtimer.singlemul: 2 kern.eventtimer.choice: LAPIC(600) HPET(350) HPET1(350) HPET2(350) = i8254(100) RTC(0) kern.eventtimer.et.RTC.quality: 0 kern.eventtimer.et.RTC.frequency: 32768 kern.eventtimer.et.RTC.flags: 17 kern.eventtimer.et.i8254.quality: 100 kern.eventtimer.et.i8254.frequency: 1193182 kern.eventtimer.et.i8254.flags: 1 kern.eventtimer.et.HPET2.quality: 350 kern.eventtimer.et.HPET2.frequency: 14318180 kern.eventtimer.et.HPET2.flags: 3 kern.eventtimer.et.HPET1.quality: 350 kern.eventtimer.et.HPET1.frequency: 14318180 kern.eventtimer.et.HPET1.flags: 3 kern.eventtimer.et.HPET.quality: 350 kern.eventtimer.et.HPET.frequency: 14318180 kern.eventtimer.et.HPET.flags: 3 kern.eventtimer.et.LAPIC.quality: 600 kern.eventtimer.et.LAPIC.frequency: 49907650 kern.eventtimer.et.LAPIC.flags: 7 # vmstat -ai | grep timer irq0: attimer0 0 0 cpu0:timer 609974 14 cpu1:timer 160546 4 cpu2:timer 99803 2 cpu3:timer 158073 4 cpu4:timer 102870 2 cpu5:timer 168433 4 cpu6:timer 163004 4 cpu7:timer 162947 4 cpu8:timer 163501 4 cpu9:timer 160595 4 cpu10:timer 169100 4 cpu11:timer 163888 4 cpu12:timer 144278 3 cpu13:timer 162706 4 cpu14:timer 164161 4 cpu15:timer 167101 4 cpu16:timer 187732 4 cpu17:timer 189228 4 cpu18:timer 179423 4 cpu19:timer 182701 4 cpu20:timer 139658 3 cpu21:timer 186046 4 cpu22:timer 201248 5 cpu23:timer 184823 4 cpu24:timer 186839 4 cpu25:timer 186008 4 cpu26:timer 191473 4 cpu27:timer 182573 4 cpu28:timer 181213 4 cpu29:timer 197659 5 cpu30:timer 188190 4 cpu31:timer 189092 4 (Again: from a basically-idle timeframe.) # sysctl -a | grep hz kern.clockrate: { hz =3D 1000, tick =3D 1000, profhz =3D 8128, stathz =3D = 127 } kern.hz: 1000 debug.psm.hz: 20 # sysctl -a | grep 'cpu.*freq' device cpufreq debug.cpufreq.verbose: 0 debug.cpufreq.lowest: 0 dev.cpufreq.0.%parent: cpu0 dev.cpufreq.0.%pnpinfo:=20 dev.cpufreq.0.%location:=20 dev.cpufreq.0.%driver: cpufreq dev.cpufreq.0.%desc:=20 dev.cpufreq.%parent:=20 dev.cpu.0.freq_levels: 3400/-1 2800/-1 2200/-1 dev.cpu.0.freq: 3400 # sysctl kern.timecounter kern.timecounter.tsc_shift: 1 kern.timecounter.smp_tsc_adjust: 0 kern.timecounter.smp_tsc: 1 kern.timecounter.invariant_tsc: 1 kern.timecounter.fast_gettime: 1 kern.timecounter.tick: 1 kern.timecounter.choice: ACPI-fast(900) i8254(0) HPET(950) TSC-low(1000) = dummy(-1000000) kern.timecounter.hardware: TSC-low kern.timecounter.alloweddeviation: 5 kern.timecounter.stepwarnings: 0 kern.timecounter.tc.ACPI-fast.quality: 900 kern.timecounter.tc.ACPI-fast.frequency: 3579545 kern.timecounter.tc.ACPI-fast.counter: 3941257163 kern.timecounter.tc.ACPI-fast.mask: 4294967295 kern.timecounter.tc.i8254.quality: 0 kern.timecounter.tc.i8254.frequency: 1193182 kern.timecounter.tc.i8254.counter: 54249 kern.timecounter.tc.i8254.mask: 65535 kern.timecounter.tc.HPET.quality: 950 kern.timecounter.tc.HPET.frequency: 14318180 kern.timecounter.tc.HPET.counter: 796737466 kern.timecounter.tc.HPET.mask: 4294967295 kern.timecounter.tc.TSC-low.quality: 1000 kern.timecounter.tc.TSC-low.frequency: 1696860321 kern.timecounter.tc.TSC-low.counter: 3613142695 kern.timecounter.tc.TSC-low.mask: 4294967295 =3D=3D=3D Mark Millard marklmi at yahoo.com ( dsl-only.net went away in early 2018-Mar)