Date: Fri, 4 Oct 2019 00:11:35 +1000 (EST) From: Bruce Evans <brde@optusnet.com.au> To: Konstantin Belousov <kostikbel@gmail.com> Cc: Bruce Evans <brde@optusnet.com.au>, Sebastian Huber <sebastian.huber@embedded-brains.de>, FreeBSD <freebsd-hackers@freebsd.org> Subject: Re: Why is tc_get_timecount() called two times in tc_init()? Message-ID: <20191003214837.L1757@besplex.bde.org> In-Reply-To: <20191003084021.GI44691@kib.kiev.ua> References: <0e27fb3e-0f60-68e1-dbba-f17c3d91c332@embedded-brains.de> <20191002140040.GA44691@kib.kiev.ua> <20191003013314.O2151@besplex.bde.org> <20191002163946.GE44691@kib.kiev.ua> <20191003030837.C2787@besplex.bde.org> <20191003084021.GI44691@kib.kiev.ua>
next in thread | previous in thread | raw e-mail | index | archive | help
On Thu, 3 Oct 2019, Konstantin Belousov wrote: > On Thu, Oct 03, 2019 at 03:12:26AM +1000, Bruce Evans wrote: >> On Wed, 2 Oct 2019, Konstantin Belousov wrote: >> >>> On Thu, Oct 03, 2019 at 02:25:46AM +1000, Bruce Evans wrote: >>>> On Wed, 2 Oct 2019, Konstantin Belousov wrote: >>> So the conclusion is that the second call can be removed, am I right ? >> >> Yes. >> >> All tc_get_timecount() functions should be checked for doing sufficient >> initialization in one call (so that deltas for subsequent calls are >> correct). > > This should be it. Hmm, there are a lot of calls from subsystems that shouldn't know about this. > But, is even a single call to tc_get_timecount() needed for "warmup" ? Yes, something must initialize the hardware timecounter if it is not already initialized (and free-running). The i8254 timecounter is a good example. It has internal state that is only updated while it is active. Something must also initialize the software timecounter. This seems to have been broken since 2002 when switch_timecounter() was removed. Most of the software state is initialized early, but th_offset_count is initialized late in tc_windup(), so going live with 'timecounter = newtc' never works. The order in sysctl_kern_timecounter_hardware() is: ...tc_get_timecount(newtc); /* initialize h/w tc. */ /* Fail to initialize newtc->th_offset with new h/w value. */ timecounter = newtc; /* go live with uninitialized tc. */ > diff --git a/sys/dev/acpica/acpi.c b/sys/dev/acpica/acpi.c > index 382d139617d..28d0d738a58 100644 > --- a/sys/dev/acpica/acpi.c > +++ b/sys/dev/acpica/acpi.c > @@ -3191,7 +3191,6 @@ acpi_resync_clock(struct acpi_softc *sc) > * Warm up timecounter again and reset system clock. > */ > (void)timecounter->tc_get_timecount(timecounter); > - (void)timecounter->tc_get_timecount(timecounter); > inittodr(time_second + sc->acpi_sleep_delay); > } Fairly bad code with bogus comment. Comment more bogus than before. The timecounter hasn't been warmed up (or even initialized) before the comment. "again" in the comment refers to removed second tc call. The actual initialization is done by inittodr(). No tc initialization or warming should be needed before it it. It first reads the RTC, then calls tc_setclock() to set the timecounter to the value read. There is not enough locking around this (real-time locking to guarantee that the value read is not too out of date when it is used...). tc_setclock() now has mutex locking which might be good enough if it were around the whole operation. But it isn't even around all of itself, and tc_setclock() depends on the timecounter being initialized at this point. I think th_offset is used uninitialized after resume. Various bugs and magic limit the damage from this. There is the large bug that CLOCK_MONOTONIC stops while suspended. I think acpi does start its timer earlier and the comment about "warming up again" is partly correct for a single tc call -- the call doesn't warm up again, but further warming which is probably unnecessary. The uninitialized th_offset provides a garbage delta, but tc_setclock() compensates using other deltas: XX void XX tc_setclock(struct timespec *ts) XX { XX struct timespec tbef, taft; XX struct bintime bt, bt2; XX XX timespec2bintime(ts, &bt); XX nanotime(&tbef); Race above here by not locking yet. Use th_offset uninitialized above and below here in some cases. nanotime() gives a garbage time after acpi resume. Since the acpi timer mask is fairly large and the acpi timer frequency is fairly small, there is a chance that overflow doesn't occur so that the garbage time is almost correct as a relative time (the correct monotonic time less the suspended time). XX mtx_lock_spin(&tc_setclock_mtx); Now have almost adequate locking. XX cpu_tick_calibrate(1); XX binuptime(&bt2); XX bintime_sub(&bt, &bt2); This gives an almost correct relative time with some magic: - inittodr() passed us the RTC time which is absolute and now in bt - bt2 is garbage calculated using the garbage th_offset - whatever garbage bt2 is, the final bt gives the delta needed to recover the RTC provided we keep using the garbage offset and any overflows from this give consistent deltas. XX XX /* XXX fiddle all the little crinkly bits around the fiords... */ XX tc_windup(&bt); tc_windup() finally initializes th_offset. Now the deltas aren't from a consistent garbage offset, but the damage is limited. The bt arg here is only used to subvert boottimebin to a non-boottime so that adding the broken monotic time to the boot time gives the RTC time modulo the consistency bugs near here and drift. Monotonic times use the new offset and are only especially broken if that goes backwards over suspend/resume. It is unclear if other subsystems can call timecounter code during resume (or suspend) before the timecounter is fully reinitialized here. XX mtx_unlock_spin(&tc_setclock_mtx); XX ... > > diff --git a/sys/dev/acpica/acpi_timer.c b/sys/dev/acpica/acpi_timer.c > index 34a832089a2..d768397a785 100644 > --- a/sys/dev/acpica/acpi_timer.c > +++ b/sys/dev/acpica/acpi_timer.c > @@ -274,7 +274,6 @@ acpi_timer_resume_handler(struct timecounter *newtc) > "restoring timecounter, %s -> %s\n", > tc->tc_name, newtc->tc_name); > (void)newtc->tc_get_timecount(newtc); > - (void)newtc->tc_get_timecount(newtc); > timecounter = newtc; > } > } Oops, here is the first incomplete reinitialization and warmup. It clearly uses th_offset uninitialized, just like for switching the timecounter hardware. This function (and similarly acpi_timer_suspend_handler()) seem to be mostly wrong. When the acpi timecounter is the current timecounter, suspend doesn't schedule resume and resume has an unnecessary check. When the acpi timecounter is not the current timecounter, suspend forces a switch to it and schules resume to undo this foot-shooting. Supsend and resume do the usual buggy switch with a warmup or 2 but th_offset uninitialized. Note that acpi_resync_clock() for resume (where inittodr() is called) syncs the current timecounter() and doesn't belong under acpi. > diff --git a/sys/dev/xen/control/control.c b/sys/dev/xen/control/control.c > index 98ab5bf3a6b..1e132f4d866 100644 > --- a/sys/dev/xen/control/control.c > +++ b/sys/dev/xen/control/control.c > @@ -303,7 +303,6 @@ xctrl_suspend() > * Warm up timecounter again and reset system clock. > */ > timecounter->tc_get_timecount(timecounter); > - timecounter->tc_get_timecount(timecounter); > inittodr(time_second); Similar with "again" in comment. Actually worse than that: for acpi in the foot-shooting case the acpi timecounter was warmed in suspend when it was bogusly activated; in resume is inactivated and the original timecounter warmed (but not again) and reactivated. For xen, there is no corresponding switching and just this misplaced incomplete reinitialization. > > #ifdef EARLY_AP_STARTUP > diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c > index 9920a9a9304..847fbbbf35d 100644 > --- a/sys/kern/kern_tc.c > +++ b/sys/kern/kern_tc.c > @@ -1257,7 +1257,6 @@ tc_init(struct timecounter *tc) > tc->tc_frequency < timecounter->tc_frequency) > return; > (void)tc->tc_get_timecount(tc); > - (void)tc->tc_get_timecount(tc); > timecounter = tc; > } > This might work for the very first initialization. > @@ -1519,7 +1518,6 @@ sysctl_kern_timecounter_hardware(SYSCTL_HANDLER_ARGS) > > /* Warm up new timecounter. */ > (void)newtc->tc_get_timecount(newtc); > - (void)newtc->tc_get_timecount(newtc); > > timecounter = newtc; > Original problem only appeared to be cosmetic. > @@ -2011,7 +2009,6 @@ inittimecounter(void *dummy) > > /* warm up new timecounter (again) and get rolling. */ > (void)timecounter->tc_get_timecount(timecounter); > - (void)timecounter->tc_get_timecount(timecounter); > mtx_lock_spin(&tc_setclock_mtx); > tc_windup(NULL); > mtx_unlock_spin(&tc_setclock_mtx); > This might work since it doesn't defer the call to tc_windup(). "again" in its comment seems to be bogus. "twice" might have been correct but banal for the 2 calls. This function is a SYSINIT() so it is unclear how much is initialized before it. Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20191003214837.L1757>