Date: Fri, 5 Apr 2019 14:39:12 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: Bruce Evans <brde@optusnet.com.au>
Cc: Michael Tuexen <tuexen@fh-muenster.de>,
    freebsd-hackers Hackers <freebsd-hackers@freebsd.org>,
    FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Subject: Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale * tc_delta(th) overflows unsigned 64 bits sometimes [patched failed]
Message-ID: <20190405113912.GB1923@kib.kiev.ua>
In-Reply-To: <20190404011802.E2390@besplex.bde.org>
References: <20190304114150.GM68879@kib.kiev.ua>
 <20190305031010.I4610@besplex.bde.org>
 <20190306172003.GD2492@kib.kiev.ua>
 <20190308001005.M2756@besplex.bde.org>
 <20190307222220.GK2492@kib.kiev.ua>
 <20190309144844.K1166@besplex.bde.org>
 <20190324110138.GR1923@kib.kiev.ua>
 <E0785613-2B6E-4BB3-95CD-03DD96902CD8@fh-muenster.de>
 <20190403070045.GW1923@kib.kiev.ua>
 <20190404011802.E2390@besplex.bde.org>
On Thu, Apr 04, 2019 at 02:47:34AM +1100, Bruce Evans wrote:
> I noticed (or rather, realized) a general problem with multiple
> timehands.  ntpd can slew the clock at up to 500 ppm, and at least an
> old version of it uses a rate of 50 ppm to fix up fairly small drifts
> in the milliseconds range.  500 ppm is enormous in CPU cycles -- it is
> 500 thousand nsec or 2 million cycles at 4GHz.  Winding up the
> timecounter every 1 msec reduces this to only 2000 cycles.
>
> More details of ordering and timing for 1 thread:
> - thread N calls binuptime() and it loads timehands
> - another or even the same thread runs tc_windup().  This modifies
>   timehands.
> - thread N is preempted for a long time, but less than the time for
>   <number of timehands> updates
> - thread N checks the generation count.  Since this is for the timehands
>   contents and not for the timehands pointer, it hasn't changed, so the
>   old timehands is used
> - an instant later, the same thread calls binuptime() again and uses the
>   new timehands
> - now suppose only 2 timehands (as in -current), the worst (?) case of a
>   slew of 500 ppm for the old timehands and -500 ppm for the new
>   timehands, and almost the worst case of 10 msec for the oldness of the
>   old timehands relative to the new timehands, with the new timehands
>   about to be updated after 10 msec (assume perfectly periodic updates
>   every 10 msec).  The calculated times are:
>
>     old bintime = old_base + (20 msec) * (1 + 500e-6)
>     new base    = old_base + (10 msec) * (1 + 500e-6)  # calc by tc_windup()
>     new bintime = new_base + (10 msec) * (1 - 500e-6) + epsilon
>
>     error = epsilon - (20 msec) * 500e-6 = epsilon - 10000 nsec
>
> Errors in the negative direction are the most harmful.  ntpd normally
> doesn't change the skew as much as that in one step, but it is easy for
> adjtime(2) to change the skew like that, and there are no reasonable
> microadjustments that would accidentally work around this kernel bug
> (it seems unreasonable to limit the skew to 1 ppm, and even that still
> gives an error of epsilon + 20 nsec).
>
> phk didn't want to slow down timecounters using extra code to make
> them monotonic and coherent (getbinuptime() is incoherent with
> binuptime() since the former lags the latter by the update interval),
> but this wouldn't be very slow within a thread.
>
> Monotonicity across threads is a larger problem, and is not helped by
> using a faked, forced monotonic time within threads.
>
> So it seems best to fix the above problem by moving the generation count
> from the timehands contents to the timehands pointer, and maybe also
> reduce the number of timehands to 1.  With 2 timehands, this gives a
> shorter race:
>
> - thread N loads timehands
> - tc_windup()
> - thread N preempted
> - thread N uses old timehands
> - case tc_windup() completes first: no problem -- thread N checks the
>   generation count on the pointer and loops
> - case binuptime() completes first: lost a race -- binuptime() is off
>   by approx <tc_windup() execution time> * <difference in skews>.
>
> The main point of having multiple timehands (after introducing the
> per-timehands generation count) is to avoid blocking thread N during
> the update, but this doesn't actually work, even for only 2 timehands
> and a global generation count.
You are describing the generic race between reader and writer.  The same
race would exist even with one timehands instance (and/or one global
generation counter), where an ntp adjustment might come earlier or later
than some consumer's access to the timehands.
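
To make the window concrete, here is a simplified sketch of the lockless
read loop, modeled on binuptime() in sys/kern/kern_tc.c (fields are
abbreviated and binuptime_sketch() is my naming; this is an illustration
of the pattern, not the committed code):

    /*
     * Simplified sketch of the lockless timehands read, modeled on
     * binuptime() in sys/kern/kern_tc.c; struct fields abbreviated.
     */
    struct timehands {
            uint64_t          th_scale;      /* frac per counter tick */
            struct bintime    th_offset;     /* uptime at last windup */
            volatile u_int    th_generation; /* bumped by tc_windup() */
            struct timehands *th_next;
    };
    static struct timehands *volatile timehands; /* current instance */

    void
    binuptime_sketch(struct bintime *bt)
    {
            struct timehands *th;
            u_int gen;

            do {
                    th = timehands;          /* snapshot the pointer */
                    gen = atomic_load_acq_int(&th->th_generation);
                    *bt = th->th_offset;
                    bintime_addx(bt, th->th_scale * tc_delta(th));
                    atomic_thread_fence_acq();
                    /*
                     * Only th's contents are revalidated here.  If
                     * tc_windup() published a new timehands while this
                     * thread was preempted, gen still matches and the
                     * stale scale/offset are used -- the race above.
                     */
            } while (gen == 0 || gen != th->th_generation);
    }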
If a timehands instance was read before tc_windup() ran, but the code
consumed the result after the windup, it might appear as if time went
backwards, and this cannot be fixed without either re-reading the time
after the time-dependent calculations are done and restarting, or some
global lock ensuring serialization.
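
The restart variant would look something like this on the consumer side
(purely illustrative; compute_deadline() is a placeholder for whatever
time-dependent work the caller performs):

    /*
     * Illustrative consumer-side restart: re-read the clock after the
     * time-dependent work and redo it if time appears to have gone
     * backwards across the computation.
     */
    struct bintime t1, t2;

    for (;;) {
            binuptime(&t1);
            compute_deadline(&t1);  /* hypothetical dependent work */
            binuptime(&t2);
            if (bintime_cmp(&t2, &t1, >=))
                    break;          /* monotone across the work; keep it */
            /* a windup changed the rate under us; discard and retry */
    }

The global-lock alternative trades this retry loop for serialization
between readers and tc_windup(), which is exactly what the lockless
design tries to avoid.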