Date: Sun, 10 Feb 2019 21:58:22 -0800
From: Conrad Meyer <cem@freebsd.org>
To: Bruce Evans <brde@optusnet.com.au>
Cc: src-committers <src-committers@freebsd.org>, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject: Re: svn commit: r343985 - head/sys/kern
Message-ID: <CAG6CVpW8wPtDS-fSqaj7CkgYq_8Mqnso26r7RmxWqrKcP2Noow@mail.gmail.com>
In-Reply-To: <20190211141730.Y1118@besplex.bde.org>
References: <201902102307.x1AN7lj8011617@repo.freebsd.org> <20190211141730.Y1118@besplex.bde.org>

Hi Bruce,

On Sun, Feb 10, 2019 at 9:18 PM Bruce Evans <brde@optusnet.com.au> wrote:
>
> On Sun, 10 Feb 2019, Conrad Meyer wrote:
>
> > Log:
> >   Prevent overflow for usertime/systime in caclru1
> >
> >   PR:           76972 and duplicates
> ...
> I wrote a much better version,
> following the hints in my review of PR 76972.

Great. If your version is better (and correct), please go ahead and
commit it. I noticed this bug had been languishing for over a decade
with a reasonable patch attached; verified it was correct; and went
ahead and committed it. If there's something even better, fantastic.

> This is the slowest correct fix in the PR followup.  kib predicted
> that I wouldn't like it.  It does 2 64-bit divmods (after optimization)
> and many multiplications per call.  Times 2 calls.  clang will probably
> inline this, giving only 3 64-bit divmods instead of 4.

Did you measure any of this, or is this speculation? I plugged both
versions into Godbolt just for amusement:
https://godbolt.org/z/KE_FF8 (GCC 8.2),
https://godbolt.org/z/WSepYg (Clang 7.0.0).

Andrey's version has no branches; yours has two conditional branches
as well as a large NOP to align the branch target (GCC); Clang manages
only a single branch and doesn't pad the branch target.

Andrey's version has five divs at gcc8.2 -O2, and six imuls. In the
happy case, your version has two cmp+ja, two divs, and two imuls. In
the unhappy case, your version has two cmp+ja, three div, and four
imul. Just eyeballing it, your code might be marginally larger, but
it's fairly similar.

Does it matter? I doubt it. Modern CPUs are crazy superscalar OOO
magic and as long as there aren't bad data dependencies, it can cruise
along. All values reside in registers and imul isn't much slower than
add. div is a bit slower, but probably cheaper than an L1 miss.

Feel free to measure and demonstrate a difference if you feel it is
important. I don't care, as long as it's correct (which it was not
for the past 14 years).

Conrad
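
For context, the operation being argued about is scaling one 64-bit
value by a fraction, floor(a * b / c), without letting the intermediate
product a * b overflow. A minimal sketch of the divmod-decomposition
idea follows. It is not the code from either patch in the thread: the
name scale_by_fraction is hypothetical, and the sketch assumes
0 < c < 2^32 and b <= c so that no intermediate term can overflow.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not taken from either patch): floor(a * b / c).
 *
 * Assumptions, made only for this sketch:
 *   - 0 < c < 2^32, so both remainders are below 2^32 and their
 *     product fits in 64 bits;
 *   - b <= c, so the result is at most a and the partial terms
 *     cannot overflow.
 */
static uint64_t
scale_by_fraction(uint64_t a, uint64_t b, uint64_t c)
{
	uint64_t aq = a / c, ar = a % c;	/* a = aq * c + ar */
	uint64_t bq = b / c, br = b % c;	/* b = bq * c + br */

	/*
	 * a * b = c * (aq * b + ar * bq) + ar * br, so
	 * floor(a * b / c) = aq * b + ar * bq + floor(ar * br / c).
	 */
	return (aq * b + ar * bq + ar * br / c);
}
```

Written this way the function needs two quotient/remainder pairs plus
one more division and three multiplications. Compilers typically get a
quotient and its remainder from a single div instruction, which is why
the instruction counts quoted above are lower than a naive reading of
the source would suggest. A variant that first checks whether a * b
fits in 64 bits can skip the decomposition on the common path, at the
cost of a branch.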