Date: Fri, 5 Apr 2019 14:39:12 +0300
From: Konstantin Belousov
To: Bruce Evans
Cc: Michael Tuexen, freebsd-hackers Hackers, FreeBSD PowerPC ML
Subject: Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale * tc_delta(th) overflows unsigned 64 bits sometimes [patched failed]
Message-ID: <20190405113912.GB1923@kib.kiev.ua>

On Thu, Apr 04, 2019 at 02:47:34AM +1100, Bruce Evans wrote:
> I noticed (or better realized) a general problem with multiple
> timehands. ntpd can slew the clock at up to 500 ppm, and at least an
> old version of it uses a rate of 50 ppm to fix up fairly small drifts
> in the milliseconds range. 500 ppm is enormous in CPU cycles -- it is
> 500 thousand nsec or 2 million cycles at 4GHz. Winding up the timecounter
> every 1 msec reduces this to only 2000 cycles.
>
> More details of ordering and timing for 1 thread:
> - thread N calls binuptime() and it loads timehands
> - another or even the same thread runs tc_windup(). This modifies
>   timehands.
> - thread N is preempted for a long time, but less than the time for
>   updates
> - thread N checks the generation count. Since this is for the timehands
>   contents and not for the timehands pointer, it hasn't changed, so the
>   old timehands is used
> - an instant later, the same thread calls binuptime() again and uses the
>   new timehands
> - now suppose only 2 timehands (as in -current), the worst (?) case of a
>   slew of 500 ppm for the old timehands and -500 ppm for the new timehands,
>   and almost the worst case of 10 msec for the oldness of the old timehands
>   relative to the new timehands, with the new timehands about to be updated
>   after 10 msec (assume perfectly periodic updates every 10 msec).
> The calculated times are:
>
>     old bintime = old_base + (20 msec) * (1 + 500e-6)
>     new_base    = old_base + (10 msec) * (1 + 500e-6)  # calc by tc_windup()
>     new bintime = new_base + (10 msec) * (1 - 500e-6) + epsilon
>
>     error = epsilon - (20 msec) * 500e-6 = epsilon - 10000 nsec
>
> Errors in the negative direction are most harmful. ntpd normally doesn't
> change the skew as much as that in one step, but it is easy for adjtime(2)
> to change the skew like that and there are no reasonable microadjustments
> that would accidentally work around this kernel bug (it seems unreasonable
> to limit the skew to 1 ppm and that still gives an error of epsilon + 20
> nsec).
>
> phk didn't want to slow down timecounters using extra code to make
> them monotonic and coherent (getbinuptime() is incoherent with
> binuptime() since the former lags the latter by the update interval),
> but this wouldn't be very slow within a thread.
>
> Monotonicity across threads is a larger problem and not helped by using
> a faked forced monotonic time within threads.
>
> So it seems best to fix the above problem by moving the generation count
> from the timehands contents to the timehands pointer, and maybe also
> reduce the number of timehands to 1. With 2 timehands, this gives a
> shorter race:
>
> - thread N loads timehands
> - tc_windup()
> - thread N preempted
> - thread N uses old timehands
> - case tc_windup() completes first: no problem -- thread N checks the
>   generation count on the pointer and loops
> - case binuptime() completes first: lost a race -- binuptime() is off
>   by approx * .
>
> The main point of having multiple timehands (after introducing the
> per-timehands generation count) is to avoid blocking thread N during the
> update, but this doesn't actually work, even for only 2 timehands and
> a global generation count.

You are describing the generic race between reader and writer.
The same race would exist even with one timehands (and/or one global
generation counter), where an ntp adjustment might come earlier or later
than some consumer accessing the timehands. If a timehands instance was
read before tc_windup() ran but the code consumed the result after the
windup, it might appear as if time went backward, and this cannot be fixed
without either re-reading the time after the time-dependent calculations
are done and restarting, or some global lock ensuring serialization.
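
For concreteness, the arithmetic of the quoted two-timehands example can
be checked with a small userland model. This is only a sketch under
stated assumptions, not the kernel code: struct th_model, th_read() and
stale_read_error() are made-up names, times are plain signed nanoseconds
instead of bintime, and the slew is applied as an integer ppm factor.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical, simplified stand-in for a timehands: a base time in
 * nanoseconds plus the slew (in ppm) used to extrapolate from it.
 */
struct th_model {
	int64_t base_ns;
	int64_t slew_ppm;
};

/* Extrapolate base + delta * (1 + slew), with slew in parts per million. */
static int64_t
th_read(const struct th_model *th, int64_t delta_ns)
{
	return (th->base_ns + delta_ns + delta_ns * th->slew_ppm / 1000000);
}

/*
 * Difference between two consecutive reads by one thread: the first
 * still uses the old timehands two update intervals after its base, the
 * second (epsilon later) uses the new timehands one interval after its
 * base.  The old timehands slews at +slew_ppm, the new at -slew_ppm.
 * A negative result means time appeared to go backward.
 */
static int64_t
stale_read_error(int64_t interval_ns, int64_t slew_ppm, int64_t epsilon_ns)
{
	struct th_model old_th = { 0, slew_ppm };
	struct th_model new_th;
	int64_t old_time, new_time;

	assert(interval_ns > 0 && slew_ppm >= 0);

	/* tc_windup() analogue: new base is the old one extrapolated. */
	new_th.base_ns = th_read(&old_th, interval_ns);
	new_th.slew_ppm = -slew_ppm;

	old_time = th_read(&old_th, 2 * interval_ns);
	new_time = th_read(&new_th, interval_ns) + epsilon_ns;

	return (new_time - old_time);
}
```

With a 10 msec update interval, 500 ppm and epsilon of 0,
stale_read_error(10000000, 500, 0) returns -10000, which is the
epsilon - 10000 nsec from the quoted calculation.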