Date: Fri, 5 Apr 2019 23:52:27 +1100 (EST)
From: Bruce Evans
To: Konstantin Belousov
cc: Bruce Evans, Michael Tuexen, freebsd-hackers Hackers, FreeBSD PowerPC ML
Subject: Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale * tc_delta(th) overflows unsigned 64 bits sometimes [patched failed]
In-Reply-To: <20190405113912.GB1923@kib.kiev.ua>
Message-ID: <20190405230717.D3383@besplex.bde.org>

On Fri, 5 Apr 2019, Konstantin Belousov wrote:

> On Thu, Apr 04, 2019 at 02:47:34AM +1100, Bruce Evans wrote:
>> I noticed (or better realized) a general problem with multiple
>> timehands.  ntpd can slew the clock at up to 500 ppm, and at least an
>> old version of it uses a rate of 50 ppm to fix up fairly small drifts
>> in the milliseconds range.  500 ppm is enormous in CPU cycles -- it is
>> 500 thousand nsec per second, or 2 million cycles at 4GHz.  Winding up
>> the timecounter every 1 msec reduces this to only 2000 cycles.
>> ...
>> The main point of having multiple timehands (after introducing the
>> per-timehands generation count) is to avoid blocking thread N during
>> the update, but this doesn't actually work, even for only 2 timehands
>> and a global generation count.
>
> You are describing the generic race between reader and writer. The same
> race would exist even with one timehand (and/or one global generation
> counter), where an ntp adjustment might come earlier or later than some
> consumer accessing the timehands. If a timehands instance was read before
> tc_windup() ran but the code consumed the result after the windup, it
> might appear as if time went backward, and this cannot be fixed without
> either re-reading the time after the time-dependent calculations were
> done and restarting, or some global lock ensuring serialization.

With 1 timehand, its generation count would be global.  I think its
ordering is strong enough to ensure serialization.
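
The slew figures in the quoted text above can be checked with a few lines of C. This is only a worked-arithmetic sketch: the helper names `slew_error_ns` and `ns_to_cycles` are made up for illustration, and it assumes the error accumulates linearly over the interval between windups.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: time error in nanoseconds accumulated over
 * interval_ns when the clock is slewed at the given rate in ppm.
 */
static uint64_t
slew_error_ns(uint64_t interval_ns, uint64_t ppm)
{
	assert(ppm <= 1000000);		/* parts per million */
	return (interval_ns * ppm / 1000000);
}

/* Hypothetical helper: convert nanoseconds to CPU cycles at cpu_hz. */
static uint64_t
ns_to_cycles(uint64_t ns, uint64_t cpu_hz)
{
	return (ns * cpu_hz / 1000000000);
}
```

With these, slew_error_ns(1000000000, 500) gives 500000 nsec for a 1 second interval, which ns_to_cycles() turns into 2000000 cycles at 4GHz; for a 1 msec interval the same calls give 500 nsec and 2000 cycles, matching the numbers quoted.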

I think the fix in the kernel to use a global generation count (with
more than 1 timehands) is simply s/th->th_generation/tc_generation/g.

Oops, that makes multiple timehands useless and gives some blocking.
The critical case is when a new timehands is under construction.  The
old timehands becomes unsafe to use when the writer (tc_windup())
updates the offset.  tc_windup() currently sets th_generation to 0 to
indicate that the new timehands is unsafe to use.  Doing the same with a
global tc_generation would give serialization at the cost of
busy-waiting for tc_generation to become nonzero.  A zero tc_generation
would indicate that all timehands are unsafe to use.

In the library, does it just work to put the global generation count in
the shared page?

Bruce
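
A minimal userland sketch of the global-generation scheme discussed above, written with C11 atomics rather than the kernel's own primitives. Only the name `tc_generation` and the "0 means unsafe" convention come from the message. `timehands_sketch`, `windup_sketch`, `read_sketch`, `NTIMEHANDS` and the two-field layout are illustrative assumptions, not the code in kern_tc.c, and a single writer is assumed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NTIMEHANDS 2	/* assumed count, for illustration only */

/* Hypothetical, cut-down stand-in for struct timehands. */
struct timehands_sketch {
	_Atomic uint64_t offset_count;
	_Atomic uint64_t scale;
};

static struct timehands_sketch hands[NTIMEHANDS];
static _Atomic unsigned cur_idx;		/* index of current timehands */
static _Atomic unsigned tc_generation = 1;	/* 0: update in progress */

/* Writer side; assumes a single writer, in the role of tc_windup(). */
static void
windup_sketch(uint64_t new_offset, uint64_t new_scale)
{
	unsigned gen, next;

	gen = atomic_load_explicit(&tc_generation, memory_order_relaxed);
	next = (atomic_load_explicit(&cur_idx, memory_order_relaxed) + 1) %
	    NTIMEHANDS;
	assert(next < NTIMEHANDS);

	/* Mark all timehands unsafe before touching any of them. */
	atomic_store_explicit(&tc_generation, 0, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);

	atomic_store_explicit(&hands[next].offset_count, new_offset,
	    memory_order_relaxed);
	atomic_store_explicit(&hands[next].scale, new_scale,
	    memory_order_relaxed);
	atomic_store_explicit(&cur_idx, next, memory_order_relaxed);

	/* Publish with a new nonzero generation; skip 0 on wraparound. */
	if (++gen == 0)
		gen = 1;
	atomic_store_explicit(&tc_generation, gen, memory_order_release);
}

/* Reader side: busy-waits while the generation is 0, retries if it moved. */
static void
read_sketch(uint64_t *offset, uint64_t *scale)
{
	unsigned gen, idx;

	do {
		gen = atomic_load_explicit(&tc_generation,
		    memory_order_acquire);
		idx = atomic_load_explicit(&cur_idx, memory_order_relaxed);
		*offset = atomic_load_explicit(&hands[idx].offset_count,
		    memory_order_relaxed);
		*scale = atomic_load_explicit(&hands[idx].scale,
		    memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
	} while (gen == 0 || gen !=
	    atomic_load_explicit(&tc_generation, memory_order_relaxed));
}
```

The reader loop shows the cost mentioned above: while the writer holds the generation at 0, every reader spins, whichever timehands it would have used. Whether the same loop works unchanged against a generation count in the shared page is the open question in the message, and this sketch does not answer it.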