Date:      Fri, 5 Apr 2019 14:39:12 +0300
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Bruce Evans <brde@optusnet.com.au>
Cc:        Michael Tuexen <tuexen@fh-muenster.de>, freebsd-hackers Hackers <freebsd-hackers@freebsd.org>, FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Subject:   Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale * tc_delta(th) overflows unsigned 64 bits sometimes [patched failed]
Message-ID:  <20190405113912.GB1923@kib.kiev.ua>
In-Reply-To: <20190404011802.E2390@besplex.bde.org>
References:  <20190304114150.GM68879@kib.kiev.ua> <20190305031010.I4610@besplex.bde.org> <20190306172003.GD2492@kib.kiev.ua> <20190308001005.M2756@besplex.bde.org> <20190307222220.GK2492@kib.kiev.ua> <20190309144844.K1166@besplex.bde.org> <20190324110138.GR1923@kib.kiev.ua> <E0785613-2B6E-4BB3-95CD-03DD96902CD8@fh-muenster.de> <20190403070045.GW1923@kib.kiev.ua> <20190404011802.E2390@besplex.bde.org>

On Thu, Apr 04, 2019 at 02:47:34AM +1100, Bruce Evans wrote:
> I noticed (or better realized) a general problem with multiple
> timehands.  ntpd can slew the clock at up to 500 ppm, and at least an
> old version of it uses a rate of 50 ppm to fix up fairly small drifts
> in the milliseconds range.  500 ppm is enormous in CPU cycles -- it is
> 500 thousand nsec or 2 million cycles at 4GHz.  Winding up the timecounter
> every 1 msec reduces this to only 2000 cycles.
> 
> More details of ordering and timing for 1 thread:
> - thread N calls binuptime() and it loads timehands
> - another or even the same thread runs tc_windup().  This modifies timehands.
> - thread N is preempted for a long time, but less than the time for
>    <number of timehands> updates
> - thread N checks the generation count.  Since this is for the timehands
>    contents and not for the timehands pointer, it hasn't changed, so the
>    old timehands is used
> - an instant later, the same thread calls binuptime() again and uses the
>    new timehands
> - now suppose only 2 timehands (as in -current) the worst (?) case of a
>    slew of 500 ppm for the old timehands and -500 ppm for the new timehands
>    and almost the worst case of 10 msec for the oldness of the old timehands
>    relative to the new timehands, with the new timehands about to be updated
>    after 10 msec (assume perfectly periodic updates every 10 msec).  The
>    calculated times are:
> 
>    old bintime = old_base + (20 msec) * (1 + 500e-6)
>    new base = old_base + 10 msec * (1 + 500e-6)    # calc by tc_windup()
>    new bintime = new_base + (10 msec) * (1 - 500e-6) + epsilon
> 
>    error = epsilon - (20 msec) * 500e-6 = epsilon - 10000 nsec
> 
> Errors in the negative direction are most harmful.  ntpd normally doesn't
> change the skew as much as that in one step, but it is easy for adjtime(2)
> to change the skew like that and there are no reasonable microadjustments
> that would accidentally work around this kernel bug (it seems unreasonable
> to limit the skew to 1 ppm, and that still gives an error of epsilon + 20
> nsec).
> 
> phk didn't want to slow down timecounters using extra code to make
> them monotonic and coherent (getbinuptime() is incoherent with
> binuptime() since the former lags the latter by the update interval),
> but this wouldn't be very slow within a thread.
> 
> Monotonicity across threads is a larger problem and not helped by using
> a faked forced monotonic time within threads.
> 
> So it seems best to fix the above problem by moving the generation count
> from the timehands contents to the timehands pointer, and maybe also
> reduce the number of timehands to 1.  With 2 timehands, this gives a
> shorter race:
> 
> - thread N loads timehands
> - tc_windup()
> - thread N preempted
> - thread N uses old timehands
> - case tc_windup() completes first: no problem -- thread N checks the
>    generation count on the pointer and loops
> - case binuptime() completes first: lost a race -- binuptime() is off
>    by approx <tc_windup() execution time> * <difference in skews>.
> 
> The main point of having multiple timehands (after introducing the per-
> timehands generation count) is to avoid blocking thread N during the
> update, but this doesn't actually work, even for only 2 timehands and 
> a global generation count.

You are describing the generic race between reader and writer.  The same
race would exist even with one timehands (and/or one global generation
counter), where the ntp adjustment might come earlier or later than some
consumer's access to the timehands.  If the timehands instance was read
before tc_windup() ran but the code consumed the result after the windup,
it might appear as if time went backward, and this cannot be fixed without
either re-reading the time after the time-dependent calculations were done
and restarting, or some global lock ensuring serialization.


