Date: Mon, 11 Feb 2019 16:17:57 +1100 (EST) From: Bruce Evans <brde@optusnet.com.au> To: Conrad Meyer <cem@freebsd.org> Cc: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org Subject: Re: svn commit: r343985 - head/sys/kern Message-ID: <20190211141730.Y1118@besplex.bde.org> In-Reply-To: <201902102307.x1AN7lj8011617@repo.freebsd.org> References: <201902102307.x1AN7lj8011617@repo.freebsd.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Sun, 10 Feb 2019, Conrad Meyer wrote: > Log: > Prevent overflow for usertime/systime in caclru1 > > PR: 76972 and duplicates > Reported by: Dr. Christopher Landauer <cal AT aero.org>, > Steinar Haug <sthaug AT nethelp.no> > Submitted by: Andrey Zonov <andrey AT zonov.org> (earlier version) > MFC after: 2 weeks kib asked me to fix this a couple of days ago. I wrote a much better version, following the hints in my review of PR 76972. > Modified: head/sys/kern/kern_resource.c > ============================================================================== > --- head/sys/kern/kern_resource.c Sun Feb 10 22:33:41 2019 (r343984) > +++ head/sys/kern/kern_resource.c Sun Feb 10 23:07:46 2019 (r343985) > @@ -863,6 +863,15 @@ rufetchtd(struct thread *td, struct rusage *ru) > calcru1(p, &td->td_rux, &ru->ru_utime, &ru->ru_stime); > } > > +static uint64_t > +mul64_by_fraction(uint64_t a, uint64_t b, uint64_t c) > +{ > + /* > + * Compute floor(a * (b / c)) without overflowing, (b / c) <= 1.0. > + */ > + return ((a / c) * b + (a % c) * (b / c) + (a % c) * (b % c) / c); > +} > + This is the slowest correct fix in the PR followup. kib predicted that I wouldn't like it. It does 2 64-bit divmods (after optimization) and many multiplications per call. Times 2 calls. clang will probably inline this, giving only 3 64-bit divmods instead of 4. Better version: + /* + * Subdivide tu. Combine avoiding overflow with reduction to 32-bit + * operands in the multiplications in the usual case of tu <= + * UINT32_MAX usec = 4294 seconds. + */ + if (__predict_true(tu <= UINT32_MAX && tt <= UINT32_MAX)) { + uu = ((uint64_t)(uint32_t)tu * (uint32_t)ut) / tt; + su = ((uint64_t)(uint32_t)tu * (uint32_t)st) / tt; + } else { + q = tu / tt; + r = tu % tt; + uu = q * ut + (r * ut) / tt; + su = q * st + (r * st) / tt; + } This does 2 32-bit divisions in the usual case, and 1 64-bit divmod and 2 64-bit divs in the slow case. The second clause is directly from Landauer's version. The a = b * q + r method is much easier to understand when written using assignments to variables named q and r. The first clause is from Landauer's version with less magic numbers. I over-optimized it a little and am going to remove the casts. They are only a small optimization for the 32-bit case, and compilers can do them anyway, and the 32-bit udiv/mod libcalls can get closer to doing them. I wrote further fixes and optimizations: - restore the invariant that user + sys time doesn't exceed total time - restore monotonicity of interrupt time (this implies the previous invariant) - merge interrupt time into sys time and make interrupt time 0. Everything becomes simpler and faster and less buggy. - optimize udiv/mod for usec to timeval conversions. Interrupt times became worse than useless with ithreads in SMPng. Sys time for ithreads gives a much better place to record interrupt time than scattered interrupt times in all threads. Interrupt times are now stored in all threads indirectly as tu - uu - su. They are always 0 for non-ithreads, except for bugs. One of the bugs is that rounding down uu and su makes the indirect interrupt time tu - uu - su often 1 (usec) instead of 0. Versions without bugs in invariants and monotonictity had to do complicated adjustments on many later calls to preserve this value as 1. Versions with these bugs have a sticky off-by-1 error instead. I thought that I might have written this overflow bug, but actually it was in 4.4BSD-Lite and I didn't notice it when I cleaned that up. Lite2 does: XX u = sec * 1000000 + usec; XX st = (u * st) / tot; XX [... similarly for other 2 times] Overflow here. The 64-bit tick counters are useless except for them not needing upcasts here, since they although they don't overflow for billions of years, calculations like this overflow after 108 hours. FreeBSD-1 (Net/2?) doesn't have this problem, since it maintains the times as timevals (very inaccurately but obviously monotonically by incrementing the times in hardclock()). It has lots of overflow bugs 248 days from using signed int tick counters and hz = 100. Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20190211141730.Y1118>