Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 10 Feb 2019 21:58:22 -0800
From:      Conrad Meyer <cem@freebsd.org>
To:        Bruce Evans <brde@optusnet.com.au>
Cc:        src-committers <src-committers@freebsd.org>, svn-src-all@freebsd.org,  svn-src-head@freebsd.org
Subject:   Re: svn commit: r343985 - head/sys/kern
Message-ID:  <CAG6CVpW8wPtDS-fSqaj7CkgYq_8Mqnso26r7RmxWqrKcP2Noow@mail.gmail.com>
In-Reply-To: <20190211141730.Y1118@besplex.bde.org>
References:  <201902102307.x1AN7lj8011617@repo.freebsd.org> <20190211141730.Y1118@besplex.bde.org>

next in thread | previous in thread | raw e-mail | index | archive | help
Hi Bruce,

On Sun, Feb 10, 2019 at 9:18 PM Bruce Evans <brde@optusnet.com.au> wrote:
>
> On Sun, 10 Feb 2019, Conrad Meyer wrote:
>
> > Log:
> >  Prevent overflow for usertime/systime in caclru1
> >
> >  PR:          76972 and duplicates
> ...
> I wrote a much better version,
> following the hints in my review of PR 76972.

Great.

If your version is better (and correct), please go ahead and commit
it.  I noticed this bug had been languishing for over a decade with a
reasonable patch attached; verified it was correct; and went ahead and
committed it.  If there's something even better, fantastic.

> This is the slowest correct fix in the PR followup.  kib predicted
> that I wouldn't like it.  It does 2 64-bit divmods (after optimization)
> and many multiplications per call.  Times 2 calls.  clang will probably
> inline this, giving only 3 64-bit divmods instead of 4.

Did you measure any of this, or is this speculation?  I plugged both
versions into Godbolt just for amusement: https://godbolt.org/z/KE_FF8
(GCC 8.2), https://godbolt.org/z/WSepYg (Clang 7.0.0).

Andrey's version has no branches; yours has two conditional branches
as well as a large NOP to align the branch target (GCC); Clang manages
only a single branch and doesn't pad the branch target.

Andrey's version has five divs at gcc8.2 -O2, and six imuls.

In the happy case, your version has two cmp+ja, two divs, and two
imuls.  In the unhappy case, your version has two cmp+ja, three div,
and four imul.  Just eyeballing it, your code might be marginally
larger, but it's fairly similar.

Does it matter?  I doubt it.  Modern CPUs are crazy superscalar OOO
magic and as long as there aren't bad data dependencies, it can cruise
along.  All values reside in registers and imul isn't much slower than
add.  div is a bit slower, but probably cheaper than an L1 miss.  Feel
free to measure and demonstrate a difference if you feel it is
important.  I don't care, as long as it's correct (which it was not
for the past 14 years).

Conrad



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAG6CVpW8wPtDS-fSqaj7CkgYq_8Mqnso26r7RmxWqrKcP2Noow>