Date: Sun, 24 Mar 2019 13:01:38 +0200
From: Konstantin Belousov <kostikbel@gmail.com>
To: Bruce Evans
Cc: Mark Millard, freebsd-hackers Hackers, FreeBSD PowerPC ML
Subject: Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale * tc_delta(th) overflows unsigned 64 bits sometimes [patched failed]

On Sat, Mar 09, 2019 at 06:00:14PM +1100, Bruce Evans wrote:
> I more strongly disclike (sic) the more complete merge.  The central APIs
> have even more parameters and reduced type safety to describe objects as
> (offset, size) pairs.

I changed the patch to be type-safe.  Now I like it even more: it provides
an internal, concise, and type-safe API to fetch data from timehands.  The
implementation needs to be read only once.

> Small delicate loops are ideal for duplicating.  They are easier to
> understand individually and short enough to compare without using diff
> to see gratuitous and substantive differences.  Multiple instances are
> only hard to write and maintain.  Since these multiple instances are
> already written, they are only harder to maintain.

This is a good argument to have bintime_off and getthmember unmerged
(there are two small but delicate loops).
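For readers without kern_tc.c in front of them: both loops follow the
timehands generation protocol, a seqlock-style read.  A minimal userland
sketch of the pattern; 'struct snapshot' and the global pointer are
stand-ins for the real timehands machinery, not code from the tree:

#include <stdatomic.h>
#include <stdint.h>

struct snapshot {
	uint64_t	val;	/* datum published by the writer */
	atomic_uint	gen;	/* 0 while the writer rewrites this slot */
};

static struct snapshot * _Atomic curr;	/* writer flips between >= 2 slots */

static uint64_t
read_snapshot(void)
{
	struct snapshot *s;
	uint64_t v;
	unsigned int gen;

	do {
		s = atomic_load_explicit(&curr, memory_order_relaxed);
		/* Acquire pairs with the writer's release store of gen. */
		gen = atomic_load_explicit(&s->gen, memory_order_acquire);
		v = s->val;
		/* Order the data read before the re-check of gen. */
		atomic_thread_fence(memory_order_acquire);
		/* gen == 0 means a writer is mid-update; retry. */
	} while (gen == 0 || gen != atomic_load_explicit(&s->gen,
	    memory_order_relaxed));
	return (v);
}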
The API is internal, so it only matters for the maintainer, which means
that the absence of duplication is important.  Moreover, these arguments
explain why there should not be twenty similar loops scattered over the
source.

> >> XX  void
> >> XX  binuptime(struct bintime *bt)
> >> XX  {
> >> XX @@ -361,7 +383,7 @@
> >> XX  		th = timehands;
> >> XX  		gen = atomic_load_acq_int(&th->th_generation);
> >> XX  		*bt = th->th_offset;
> >> XX -		bintime_addx(bt, th->th_scale * tc_delta(th));
> >> XX +		bintime_adddelta(bt, th);
> >> XX  		atomic_thread_fence_acq();
> >> XX  	} while (gen == 0 || gen != th->th_generation);
> >> XX  }
> >>
> >> This is the kind of non-churning change that I like.
> > Ok.  I made all cases where timehands are read more uniform by
> > moving calculations after the generation loop.  This makes the
> > atomic part of the functions easier to see, and the loop body has
> > a lower chance to hit a generation reset.
>
> I think this change is slightly worse:
> - it increases register pressure.  'scale' and 'delta' must be read in
>   almost program order before the loop exit test.  The above order uses
>   them and stores the results to memory, so more registers are free for
>   the exit test.  i386 certainly runs out of registers.  IIRC, i386 now
>   spills 'gen'.  It would have to spill something to load 'gen' or 'th'
>   for the test.

Which does not matter on any modern architecture anyway.

> - it enlarges the window between reading 'scale' and 'delta' and the
>   caller seeing the results.  Preemption in this window gives results
>   that may be far in the past.

My opinion is that quickly exiting the loop and avoiding a retry is more
important (as in performance) than making an impression that we protect
against preemption.  If preemption is important for the caller, then the
calling place must use some measures like interrupt disabling and
re-checking the time after the operation.  Preemption can occur after
the loop exit with the same consequences.

> The 'get' name is another problem.  I would like to remove all the
> get*time functions and not add new names starting with 'get'.  The
> library implementation already doesn't bother optimizing the get*time
> functions, but always uses the hardware timecounter.
>
> getfoo() is a more natural name than foo_get() for the action of getting
> foo, but the latter is better for consistency, especially in code that
> puts the subsystem name first in nearby code.
>
> The get*time functions would be better if they were more like
> time_second.  Note that time_second is racy if time_t is too large
> for the arch, so that accesses to it are not atomic, as happens on
> 32-bit arches with premature 64-bit time_t.  However, in this 32/64
> case, the race is only run every 136 years, with the next event
> scheduled in 2038, so this race is even less important now than other
> events scheduled in 2038.  Bintimes are 96 or 128 bits, so directly
> copying a global like time_second for them would race every 1/2**32
> second on 32-bit arches or every 1 second on 64-bit arches.  Most of
> the loops on the generation count are for fixing these races, but
> perhaps a simpler method would work.  On 64-bit arches with atomic
> 64-bit accesses on 32-bit boundaries, the following would work:
> - set the lower 32 bits of the fraction to 0, or ignore them
> - load the higher 32 bits of the fraction and the lower 32 bits of the
>   seconds
> - race once every 136 years starting in 2038 reading the higher 32 bits
>   of the seconds non-atomically.
> - alternatively, break instead of racing in 2038 by setting the higher
>   32 bits to 0.  This is the same as using sbintimes instead of bintimes.
> - drop a few more lower bits by storing a right-shifted value.  Right
>   shifting by just 1 gives a race frequency of once per 272 years, with
>   the next one in 2106.
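To make that scheme concrete, a userland sketch under the stated
assumptions (64-bit arch, naturally aligned 64-bit loads atomic); the
packed layout and all names here are invented for illustration, not
anything in the tree:

#include <stdint.h>

/*
 * Hypothetical packing: high 32 bits = low 32 bits of the seconds,
 * low 32 bits = high 32 bits of the fraction.  One atomic 64-bit load
 * then yields a coherent (seconds, coarse fraction) pair with no
 * generation loop, at the cost of dropping the low fraction bits and
 * racing on the upper second bits once every 2^32 seconds.
 */
static volatile uint64_t packed_time;	/* rewritten on every windup */

static void
coarse_bintime(uint32_t *sec_lo, uint32_t *frac_hi)
{
	uint64_t v;

	v = packed_time;	/* assumed atomic: aligned 64-bit load */
	*sec_lo = (uint32_t)(v >> 32);
	*frac_hi = (uint32_t)v;
}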
It would make sense if the functions were written from scratch, but since
we already have the generation counts, it is not obvious that such a
change is useful.  But if we decide to go that route (later), my current
patch only requires exactly one location, getthmember(), to experiment
with and to change afterward.  So you effectively made yet another, and
perhaps the most convincing, argument for me.

> > diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
> > index 2656fb4d22f..8d12847f2cd 100644
> > --- a/sys/kern/kern_tc.c
> > +++ b/sys/kern/kern_tc.c
> > @@ -200,20 +202,56 @@ tc_delta(struct timehands *th)
> >   * the comment in <sys/time.h> for a description of these 12 functions.
> >   */
> >
> > -#ifdef FFCLOCK
> > -void
> > -fbclock_binuptime(struct bintime *bt)
> > +static __inline void
> > +bintime_off(struct bintime *bt, u_int off)
> >  {
> >  	struct timehands *th;
> > -	unsigned int gen;
> > +	struct bintime *btp;
> > +	uint64_t scale, x;
> > +	u_int delta, gen, large_delta;
> >
> >  	do {
> >  		th = timehands;
> >  		gen = atomic_load_acq_int(&th->th_generation);
> > -		*bt = th->th_offset;
> > -		bintime_addx(bt, th->th_scale * tc_delta(th));
>
> You didn't fully obfuscate this by combining this function with
> getthmember() so as to deduplicate the loop.
>
> > +		btp = (struct bintime *)((vm_offset_t)th + off);
>
> Ugly conversion to share code.  This is technically incorrect.  Improving
> the casts gives:
>
> 	btp = (void *)(uintptr_t)((uintptr_t)(void *)th + off);
>
> but this assumes that arithmetic on the intermediate integer does what
> is expected.  uintptr_t is only guaranteed to work when the intermediate
> representation held in it is not adjusted.

vm_offset_t has the semantic that is needed for the arithmetic.  It is
better than uintptr_t for the kernel, where we know that all object
pointers are compatible with vm_offset_t (AKA the flat tag-less memory
model).

> Fixing the API gives
>
> static __inline void
> bintime_off(struct bintime *btp, struct bintime *base_btp)
>
> where base_btp is &th->th_bintime or &th->th_offset.
>
> (th_offset and th_bintime are badly named.  th_offset is really a base
> time and the offset is tc_delta().  th_bintime is also a base time.
> It is the same as th_offset with another actual offset (the difference
> between UTC and local time) already added to it as an optimization.  In
> old versions, th_bintime didn't exist, but the related struct members
> th_nanotime and th_microtime existed, since these benefit more from
> not converting on every call.

How could it be &th->th_offset, when th is calculated inside the call?
But I did modify the API in this spirit, indeed.  It takes the member
name directly as an argument.
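For comparison, the strictly conforming way to do the offset arithmetic
avoids the integer round-trip entirely by staying in pointer space, which
is exactly what getthmember() below does with its (char *) cast.  A tiny
standalone sketch (struct and names invented for illustration):

#include <stddef.h>

struct th_like {
	long	scale;
	struct {
		long		sec;
		unsigned long	frac;
	}	offset;
};

/*
 * Character-pointer arithmetic is defined by the C standard for any
 * object, so no assumption about integer representations of pointers
 * is needed, unlike the (vm_offset_t)p + off form.
 */
static void *
member_ptr(struct th_like *p, size_t off)
{
	return ((char *)p + off);
}

/* Usage: member_ptr(p, offsetof(struct th_like, offset)) */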
> My old version even documents the struct members, while -current still
> has no comments.  The comments were lost to staticization.  My version
> mostly adds "duh" to the banal comments after recovering them:
>
> XX /*
> XX  * XXX rotted comment cloned from <sys/timetc.h>.
> XX  *
> XX  * th_counter is undocumented (duh).
> XX  *
> XX  * th_adjustment [PPM << 16] which means that the smallest unit of
> XX  * correction you can apply amounts to 481.5 usec/year.
> XX  *
> XX  * th_scale is undocumented (duh).
> XX  *
> XX  * th_offset_count is the contents of the counter which corresponds to
> XX  * the rest of the offset_* values.
> XX  *
> XX  * th_offset is undocumented (duh).
> XX  *
> XX  * th_microtime is undocumented (duh).
> XX  *
> XX  * th_nanotime is undocumented (duh).
> XX  *
> XX  * XXX especially massive bitrot here.  "three" is now "many"...
> XX  * Each timecounter must supply an array of three timecounters.  This
> XX  * is needed to guarantee atomicity in the code.  Index zero is used
> XX  * to transport modifications, for instance done with sysctl, into the
> XX  * timecounter being used in a safe way.  Such changes may be adopted
> XX  * with a delay of up to 1/HZ.  Index one and two are used alternately
> XX  * for the actual timekeeping.
> XX  *
> XX  * th_generation is undocumented (duh).
> XX  *
> XX  * th_next is undocumented (duh).
> XX  */
>
> > +		*bt = *btp;
> > +		scale = th->th_scale;
> > +		delta = tc_delta(th);
> > +		large_delta = th->th_large_delta;
>
> I had forgotten that th_scale is so volatile (it may be adjusted on
> every windup).  th_large_delta is equally volatile.  So moving the
> calculation outside of the loop gives even more register pressure
> than I noticed above.
>
> >  		atomic_thread_fence_acq();
> >  	} while (gen == 0 || gen != th->th_generation);
> > +
> > +	if (__predict_false(delta >= large_delta)) {
> > +		/* Avoid overflow for scale * delta. */
> > +		x = (scale >> 32) * delta;
> > +		bt->sec += x >> 32;
> > +		bintime_addx(bt, x << 32);
> > +		bintime_addx(bt, (scale & 0xffffffff) * delta);
> > +	} else {
> > +		bintime_addx(bt, scale * delta);
> > +	}
> > +}
> > +
> > +static __inline void
> > +getthmember(void *out, size_t out_size, u_int off)
> > +{
> > +	struct timehands *th;
> > +	u_int gen;
> > +
> > +	do {
> > +		th = timehands;
> > +		gen = atomic_load_acq_int(&th->th_generation);
> > +		memcpy(out, (char *)th + off, out_size);
>
> This isn't so ugly or technically incorrect.  Now the object is generic,
> so the reference to it should be passed as (void *objp, size_t objsize)
> instead of the type-safe (struct bintime *base_btp).

_Generic is what gave me a hint how to make the implementation type-safe.
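The trick, reduced to a standalone compile-time demo (struct, function,
and macro names are hypothetical; __typeof is the GCC/Clang spelling):

#include <stddef.h>
#include <string.h>

struct hands {
	long	scale;
	struct {
		long	sec;
	}	off;
};

static void
fetch(void *out, size_t out_size, size_t off, const struct hands *p)
{
	memcpy(out, (const char *)p + off, out_size);
}

/*
 * _Generic matches the type of *dst against the declared type of the
 * member; any mismatch selects the default association and trips the
 * _Static_assert at compile time, before the type-unsafe fetch() runs.
 */
#define	FETCH_MEMBER(dst, p, member)					\
do {									\
	_Static_assert(_Generic(*(dst),					\
	    __typeof(((struct hands *)NULL)->member): 1,		\
	    default: 0) == 1, "*dst does not match member type");	\
	fetch((dst), sizeof(*(dst)), offsetof(struct hands, member), (p)); \
} while (0)

So "long s; FETCH_MEMBER(&s, p, scale);" compiles, while passing &s for
the off member is rejected at compile time.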
diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
index 2656fb4d22f..4e94f762026 100644
--- a/sys/kern/kern_tc.c
+++ b/sys/kern/kern_tc.c
@@ -72,6 +72,7 @@ struct timehands {
 	struct timecounter	*th_counter;
 	int64_t			th_adjustment;
 	uint64_t		th_scale;
+	u_int			th_large_delta;
 	u_int			th_offset_count;
 	struct bintime		th_offset;
 	struct bintime		th_bintime;
@@ -90,6 +91,7 @@ static struct timehands th1 = {
 static struct timehands th0 = {
 	.th_counter = &dummy_timecounter,
 	.th_scale = (uint64_t)-1 / 1000000,
+	.th_large_delta = 1000000,
 	.th_offset = { .sec = 1 },
 	.th_generation = 1,
 	.th_next = &th1
@@ -200,20 +202,72 @@ tc_delta(struct timehands *th)
  * the comment in <sys/time.h> for a description of these 12 functions.
  */
 
-#ifdef FFCLOCK
-void
-fbclock_binuptime(struct bintime *bt)
+static __inline void
+bintime_off(struct bintime *bt, u_int off)
 {
 	struct timehands *th;
-	unsigned int gen;
+	struct bintime *btp;
+	uint64_t scale, x;
+	u_int delta, gen, large_delta;
 
 	do {
 		th = timehands;
 		gen = atomic_load_acq_int(&th->th_generation);
-		*bt = th->th_offset;
-		bintime_addx(bt, th->th_scale * tc_delta(th));
+		btp = (struct bintime *)((vm_offset_t)th + off);
+		*bt = *btp;
+		scale = th->th_scale;
+		delta = tc_delta(th);
+		large_delta = th->th_large_delta;
 		atomic_thread_fence_acq();
 	} while (gen == 0 || gen != th->th_generation);
+
+	if (__predict_false(delta >= large_delta)) {
+		/* Avoid overflow for scale * delta. */
+		x = (scale >> 32) * delta;
+		bt->sec += x >> 32;
+		bintime_addx(bt, x << 32);
+		bintime_addx(bt, (scale & 0xffffffff) * delta);
+	} else {
+		bintime_addx(bt, scale * delta);
+	}
+}
+#define	GETTHBINTIME(dst, member)					\
+do {									\
+	_Static_assert(_Generic(((struct timehands *)NULL)->member,	\
+	    struct bintime: 1, default: 0) == 1,			\
+	    "struct timehands member is not of struct bintime type");	\
+	bintime_off(dst, __offsetof(struct timehands, member));		\
+} while (0)
+
+static __inline void
+getthmember(void *out, size_t out_size, u_int off)
+{
+	struct timehands *th;
+	u_int gen;
+
+	do {
+		th = timehands;
+		gen = atomic_load_acq_int(&th->th_generation);
+		memcpy(out, (char *)th + off, out_size);
+		atomic_thread_fence_acq();
+	} while (gen == 0 || gen != th->th_generation);
+}
+#define	GETTHMEMBER(dst, member)					\
+do {									\
+	_Static_assert(_Generic(*dst,					\
+	    __typeof(((struct timehands *)NULL)->member): 1,		\
+	    default: 0) == 1,						\
+	    "*dst and struct timehands member have different types");	\
+	getthmember(dst, sizeof(*dst), __offsetof(struct timehands,	\
+	    member));							\
+} while (0)
+
+#ifdef FFCLOCK
+void
+fbclock_binuptime(struct bintime *bt)
+{
+
+	GETTHBINTIME(bt, th_offset);
 }
 
 void
@@ -237,16 +291,8 @@ fbclock_microuptime(struct timeval *tvp)
 void
 fbclock_bintime(struct bintime *bt)
 {
-	struct timehands *th;
-	unsigned int gen;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*bt = th->th_bintime;
-		bintime_addx(bt, th->th_scale * tc_delta(th));
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	GETTHBINTIME(bt, th_bintime);
 }
 
 void
@@ -270,100 +316,55 @@ fbclock_microtime(struct timeval *tvp)
 void
 fbclock_getbinuptime(struct bintime *bt)
 {
-	struct timehands *th;
-	unsigned int gen;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*bt = th->th_offset;
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	GETTHMEMBER(bt, th_offset);
 }
 
 void
 fbclock_getnanouptime(struct timespec *tsp)
 {
-	struct timehands *th;
-	unsigned int gen;
+	struct bintime bt;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		bintime2timespec(&th->th_offset, tsp);
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	GETTHMEMBER(&bt, th_offset);
+	bintime2timespec(&bt, tsp);
 }
 
 void
 fbclock_getmicrouptime(struct timeval *tvp)
 {
-	struct timehands *th;
-	unsigned int gen;
+	struct bintime bt;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		bintime2timeval(&th->th_offset, tvp);
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	GETTHMEMBER(&bt, th_offset);
+	bintime2timeval(&bt, tvp);
}
 
 void
 fbclock_getbintime(struct bintime *bt)
 {
-	struct timehands *th;
-	unsigned int gen;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*bt = th->th_bintime;
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	GETTHMEMBER(bt, th_bintime);
 }
 
 void
 fbclock_getnanotime(struct timespec *tsp)
 {
-	struct timehands *th;
-	unsigned int gen;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*tsp = th->th_nanotime;
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	GETTHMEMBER(tsp, th_nanotime);
 }
 
 void
 fbclock_getmicrotime(struct timeval *tvp)
 {
-	struct timehands *th;
-	unsigned int gen;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*tvp = th->th_microtime;
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	GETTHMEMBER(tvp, th_microtime);
 }
 #else /* !FFCLOCK */
+
 void
 binuptime(struct bintime *bt)
 {
-	struct timehands *th;
-	u_int gen;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*bt = th->th_offset;
-		bintime_addx(bt, th->th_scale * tc_delta(th));
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	GETTHBINTIME(bt, th_offset);
 }
 
 void
@@ -387,16 +388,8 @@ microuptime(struct timeval *tvp)
 void
 bintime(struct bintime *bt)
 {
-	struct timehands *th;
-	u_int gen;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*bt = th->th_bintime;
-		bintime_addx(bt, th->th_scale * tc_delta(th));
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	GETTHBINTIME(bt, th_bintime);
 }
 
 void
@@ -420,85 +413,47 @@ microtime(struct timeval *tvp)
 void
 getbinuptime(struct bintime *bt)
 {
-	struct timehands *th;
-	u_int gen;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*bt = th->th_offset;
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	GETTHMEMBER(bt, th_offset);
 }
 
 void
 getnanouptime(struct timespec *tsp)
 {
-	struct timehands *th;
-	u_int gen;
+	struct bintime bt;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		bintime2timespec(&th->th_offset, tsp);
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	GETTHMEMBER(&bt, th_offset);
+	bintime2timespec(&bt, tsp);
 }
 
 void
 getmicrouptime(struct timeval *tvp)
 {
-	struct timehands *th;
-	u_int gen;
+	struct bintime bt;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		bintime2timeval(&th->th_offset, tvp);
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	GETTHMEMBER(&bt, th_offset);
+	bintime2timeval(&bt, tvp);
 }
 
 void
 getbintime(struct bintime *bt)
 {
-	struct timehands *th;
-	u_int gen;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*bt = th->th_bintime;
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	GETTHMEMBER(bt, th_bintime);
 }
 
 void
 getnanotime(struct timespec *tsp)
 {
-	struct timehands *th;
-	u_int gen;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*tsp = th->th_nanotime;
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	GETTHMEMBER(tsp, th_nanotime);
 }
 
 void
 getmicrotime(struct timeval *tvp)
 {
-	struct timehands *th;
-	u_int gen;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*tvp = th->th_microtime;
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	GETTHMEMBER(tvp, th_microtime);
 }
 #endif /* FFCLOCK */
 
@@ -514,15 +469,8 @@ getboottime(struct timeval *boottime)
 void
 getboottimebin(struct bintime *boottimebin)
 {
-	struct timehands *th;
-	u_int gen;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*boottimebin = th->th_boottime;
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	GETTHMEMBER(boottimebin, th_boottime);
 }
 
 #ifdef FFCLOCK
@@ -1038,15 +986,8 @@ getmicrotime(struct timeval *tvp)
 void
 dtrace_getnanotime(struct timespec *tsp)
 {
-	struct timehands *th;
-	u_int gen;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*tsp = th->th_nanotime;
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	GETTHMEMBER(tsp, th_nanotime);
 }
 
 /*
@@ -1464,6 +1405,7 @@ tc_windup(struct bintime *new_boottimebin)
 	scale += (th->th_adjustment / 1024) * 2199;
 	scale /= th->th_counter->tc_frequency;
 	th->th_scale = scale * 2;
+	th->th_large_delta = MIN(((uint64_t)1 << 63) / scale, UINT_MAX);
 
 	/*
 	 * Now that the struct timehands is again consistent, set the new
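For the record, at that point the local 'scale' still holds th_scale / 2,
so the threshold is 2^63 / (th_scale / 2) = 2^64 / th_scale (clamped to
UINT_MAX), i.e. the largest delta for which th_scale * delta still fits
in 64 bits.  A small userland check of the split multiply against 128-bit
arithmetic; the 1 GHz frequency is a made-up example, and __int128 is a
GCC/Clang extension used only for the reference value:

#include <assert.h>
#include <stdint.h>

int
main(void)
{
	/* Hypothetical 1 GHz counter; th0 above uses 1 MHz the same way. */
	uint64_t freq = 1000000000;
	uint64_t scale = (uint64_t)-1 / freq;	/* ~2^64 / freq */
	/* Mirrors tc_windup(), where 'scale' there holds th_scale / 2. */
	uint64_t large_delta = ((uint64_t)1 << 63) / (scale / 2);
	uint32_t delta = 2000000000;	/* ~2 s: scale * delta overflows */
	uint64_t sec = 0, frac = 0, x, old;

	assert(delta >= large_delta);

	/* The patch's split multiply, with bintime_addx()'s carry rule. */
	x = (scale >> 32) * delta;
	sec += x >> 32;
	old = frac; frac += x << 32; sec += frac < old;
	old = frac; frac += (scale & 0xffffffff) * delta; sec += frac < old;

	/* Reference value via 128-bit arithmetic. */
	unsigned __int128 full = (unsigned __int128)scale * delta;
	assert(sec == (uint64_t)(full >> 64));
	assert(frac == (uint64_t)full);
	return (0);
}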