Date:      Sat, 20 Jan 2018 18:17:10 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Conrad Meyer <cem@freebsd.org>
Cc:        "Rodney W. Grimes" <rgrimes@freebsd.org>,  src-committers <src-committers@freebsd.org>, svn-src-all@freebsd.org,  svn-src-head@freebsd.org
Subject:   Re: svn commit: r327354 - head/sys/vm
Message-ID:  <20180120154359.R1063@besplex.bde.org>
In-Reply-To: <CAG6CVpVQqyhub0g-iOjKbZYEaEqAy87WdrocoQ_MxYhvbz1k%2BQ@mail.gmail.com>
References:  <601ee1a2-8f4e-518d-4c86-89871cd652af@vangyzen.net> <201801191704.w0JH4rgT072967@pdx.rh.CN85.dnsmgr.net> <CAG6CVpVQqyhub0g-iOjKbZYEaEqAy87WdrocoQ_MxYhvbz1k%2BQ@mail.gmail.com>

On Fri, 19 Jan 2018, Conrad Meyer wrote:

> On Fri, Jan 19, 2018 at 9:04 AM, Rodney W. Grimes
> <freebsd@pdx.rh.cn85.dnsmgr.net> wrote:
>> BUT I do not believe this bit of "style" has anything to do with
>> readability of code, and has more to do with how code runs on a
>> processor and stack frames.   If you defer the declaration of
>> "int i" in the above code, does the compiler emit code to allocate
>> a new stack frame, or does it just add space to the function stack
>> frame for this?
>>
>> What happens if you do
>>         for (int i = 0; i < pages;) { }
>>
>>         for (int i = 1; i < pages;) { }
>> as 2 separate loops, do we allocate 2 i's on the stack at
>> function startup, or do we defer and allocate each one
>> only when that basic block runs, or do we allocate 1 i
>> and reuse it, I know that the compiler makes functional
>> code but how is that functionality done?  The current
>> style leaves no doubt about how things are done from
>> that perspective.
>
> Modern (and I'm using that word very loosely here -- think GCC did this
> 10+ years ago) optimizing compilers do something called liveness

gcc-1 did this 25 years ago (if not 30 years ago).

> tracking[0] for variables to determine the scope they are used in
> (something like the region between last write and last read).  So in
> that sense, optimizing compilers do not care whether you declare the
> variable at function scope or local scope -- they always determine the
> local scope the variable is alive in.  (Other than for shadowing,
> which we strongly discourage and is considered bad style.)

gcc did this more primitively 25 years ago, but it always allocated space
for all variables in a function on entry to the function (except for
alloca(3)).  -O0 doesn't do much more than
allocate all variables on the stack.  -O moves a few variables to
registers.
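
A minimal sketch of the liveness idea, for illustration only (it is not how
gcc or clang implement it): for straight-line code, take each variable's
live range as running from its first definition to its last use; two
variables whose ranges do not overlap can share one register or stack slot.
The IR (struct insn, one-letter variable names) and the function names are
hypothetical.  Each loop is flattened to its body here; with real back edges
the range must also cover the whole loop, which full dataflow analysis over
the control-flow graph computes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical straight-line IR: each instruction defines at most one
 * variable and uses up to two.  Variables are 'a'..'z'; '\0' means none. */
struct insn {
	char def;
	char use[2];
};

struct range {
	int start;	/* index of first definition, -1 if none seen */
	int end;	/* index of last use or definition, -1 if none seen */
};

/* Compute a [start, end] range for every variable 'a'..'z'. */
static void
live_ranges(const struct insn *code, int n, struct range r[26])
{
	int i, k, v;

	for (v = 0; v < 26; v++)
		r[v].start = r[v].end = -1;
	for (i = 0; i < n; i++) {
		for (k = 0; k < 2; k++) {
			if (code[i].use[k] == '\0')
				continue;
			v = code[i].use[k] - 'a';
			assert(v >= 0 && v < 26);
			r[v].end = i;
		}
		if (code[i].def != '\0') {
			v = code[i].def - 'a';
			assert(v >= 0 && v < 26);
			if (r[v].start < 0)
				r[v].start = i;
			if (r[v].end < i)
				r[v].end = i;
		}
	}
}

/* Two variables may share a register or slot only if this is false. */
static bool
ranges_overlap(struct range a, struct range b)
{
	return (a.start <= b.end && b.start <= a.end);
}
```

For two loops that each use their own counter, the counters' ranges come out
disjoint, so one slot can serve both, whether the source declared one
function-scope i or two block-scoped ones.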

Debugging is also easier with all variables on the stack, allocated to
fixed positions with a lifetime extending to the end of the function.
gcc does many minor pessimizations, at least with -O, so that -g mostly
works.  clang does the opposite, so that -O -g mostly doesn't work (it
tends to give "value optimized out" even for args).

Debugging is another reason to declare all variables at the start of
functions.  If you reuse a function-scope loop variable named i, then
you can't see what its value was for previous loops in the function,
but you can at least write "display" and "watch" directives for it
without these breaking by the variable going out of scope before the
end of the function.
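
For comparison, the two styles in question, as hypothetical functions that
sum an array twice: sum_twice_style9() declares one i at the top of the
function and reuses it, while sum_twice_c99() gives each loop its own
block-scoped i.  They return the same value; the difference is how long i
stays in scope for a debugger.

```c
#include <assert.h>

/* style(9): one function-scope i, reused by both loops.  It stays in
 * scope until the function returns, so "display i" keeps working. */
static int
sum_twice_style9(const int *a, int n)
{
	int i, s;

	assert(n >= 0);
	s = 0;
	for (i = 0; i < n; i++)
		s += a[i];
	for (i = n - 1; i >= 0; i--)
		s += a[i];
	return (s);
}

/* C99: each loop has its own block-scoped i, which goes out of scope
 * (and out of a debugger's view) when that loop ends. */
static int
sum_twice_c99(const int *a, int n)
{
	int s = 0;

	assert(n >= 0);
	for (int i = 0; i < n; i++)
		s += a[i];
	for (int i = n - 1; i >= 0; i--)
		s += a[i];
	return (s);
}
```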

> Liveness analysis is part of register allocation[1], which typically
> uses a graph coloring algorithm to determine the minimal number of
> distinct registers needed to hold live values.  If the number of
> registers needed is more than the machine provides, some values must
> be spilled to the stack.  (On modern x86 it doesn't matter too much
> what you spill to the stack, because the top few words of the stack
> region are actually quite fast, but clever compilers targeting other
> platforms may attempt to spill less frequently accessed values.)

Not usually the top.  Variables that can't be kept entirely in registers
are usually allocated at a fixed place in the stack which is not
especially likely to be at the top.  Caching works the same everywhere
on the stack.  Sometimes related variables end up in the same cache line
so that caching works best, but doing this intentionally is much harder
than register allocation.
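
The graph-coloring step mentioned in the quoted text can be sketched with a
plain greedy coloring.  This is a simplification, not Chaitin's or Briggs's
allocator and not what any particular compiler ships: nodes are live values,
an edge means two values are live at the same time, colors are registers,
and a value that cannot get one of k colors is marked SPILL (it would live
in a stack slot).  MAXV, SPILL and color_graph() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define	MAXV	16
#define	SPILL	(-1)

/*
 * Greedy coloring: visit values in order and give each the lowest-numbered
 * register not used by an already-colored neighbor.  If all k registers
 * are taken, the value is spilled.  Returns the number of spilled values.
 */
static int
color_graph(bool adj[MAXV][MAXV], int n, int k, int color[MAXV])
{
	bool used[MAXV];
	int c, spills, u, v;

	assert(n >= 0 && n <= MAXV && k >= 0 && k <= MAXV);
	spills = 0;
	for (v = 0; v < n; v++) {
		for (c = 0; c < k; c++)
			used[c] = false;
		for (u = 0; u < v; u++)
			if (adj[v][u] && color[u] != SPILL)
				used[color[u]] = true;
		for (c = 0; c < k && used[c]; c++)
			;
		if (c < k)
			color[v] = c;
		else {
			color[v] = SPILL;
			spills++;
		}
	}
	return (spills);
}
```

Visiting order matters for greedy coloring; real allocators choose the
order, and which values to spill, using heuristics such as use frequency.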

> I think I recall Clang and other LLVM frontends do something nutty
> when they emit intermediary representation, like using a new register
> for each assignment.  This relies on the register allocator to reduce
> that to something sane for the target machine.

gcc also depends on the register allocator to undo initial intentionally
very stupid allocation.  New (physical) registers should be used whenever
possible to maximize use of CPU resources.
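
The "new register for each assignment" form is static single assignment
(SSA), which both LLVM IR and gcc's GIMPLE use.  A minimal sketch of the
renaming step for straight-line code only (so no phi nodes are needed),
using a hypothetical three-address encoding: every assignment creates a
fresh version of its target, and every use refers to the latest version.
Mapping the many versions back onto a few physical registers is then the
register allocator's job.

```c
#include <assert.h>

/* Hypothetical three-address instruction: dst = src[0] op src[1],
 * with variables 'a'..'z' and '\0' for an unused operand. */
struct tac {
	char dst;
	char src[2];
};

/* SSA-renamed instruction: each name carries a version number. */
struct ssa_tac {
	char dst;
	int dst_ver;
	char src[2];
	int src_ver[2];
};

/*
 * Rename straight-line code into SSA form.  Version 0 means the value
 * live on entry; each assignment creates the next version.
 */
static void
to_ssa(const struct tac *in, struct ssa_tac *out, int n)
{
	int cur[26] = { 0 };
	int i, k, v;

	for (i = 0; i < n; i++) {
		/* Uses see the current version... */
		for (k = 0; k < 2; k++) {
			out[i].src[k] = in[i].src[k];
			out[i].src_ver[k] = 0;
			if (in[i].src[k] == '\0')
				continue;
			v = in[i].src[k] - 'a';
			assert(v >= 0 && v < 26);
			out[i].src_ver[k] = cur[v];
		}
		/* ...and the definition gets a brand-new one. */
		v = in[i].dst - 'a';
		assert(v >= 0 && v < 26);
		out[i].dst = in[i].dst;
		out[i].dst_ver = ++cur[v];
	}
}
```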

Bruce

Date:      Sat, 20 Jan 2018 19:57:55 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Don Lewis <truckman@freebsd.org>
Cc:        cem@freebsd.org, "Rodney W. Grimes" <rgrimes@freebsd.org>,  src-committers <src-committers@freebsd.org>, svn-src-all@freebsd.org,  svn-src-head@freebsd.org
Subject:   Re: svn commit: r328159 - head/sys/modules
Message-ID:  <20180120183216.U1478@besplex.bde.org>
In-Reply-To: <tkrat.a8bb488b61eec3e0@FreeBSD.org>
References:  <CAG6CVpV6Suft3v-=08f5UH6BTH2NEJgU_4kYd-UphLZ6yoJB4Q@mail.gmail.com> <201801191737.w0JHbM90073097@pdx.rh.CN85.dnsmgr.net> <CAG6CVpUj3SfiuHAaPMB1zGXpXPw=U-CsHgk+ivEPyrzhvrrPKw@mail.gmail.com> <tkrat.a8bb488b61eec3e0@FreeBSD.org>

On Fri, 19 Jan 2018, Don Lewis wrote:

> On 19 Jan, Conrad Meyer wrote:
>> On Fri, Jan 19, 2018 at 9:37 AM, Rodney W. Grimes
>> <freebsd@pdx.rh.cn85.dnsmgr.net> wrote:
>>> If you think in assembler it is easy to understand why this is UB,
>>> on most (all) architectures, Right Logical or Arithmetic Shift only
>>> accepts a count operand of a size that can hold log2(wordsize).

The not-unused x86 arch is one that does this.  IIRC, some history of this
is:

- on the 8086, the shift count was taken mod 32.  16 bits was enough for
   anyone, and shifting left or right by 16 through 31 (but not by 32)
   shifted out all of the bits (in the unsigned case) to give 0.

- for the 80386, someone forgot why the 8086 took the count mod 32 instead
   of just 16, and kept using 32.  16 bits was not enough for anyone, and
   shifting left or right by 32 had no effect (even in the signed case?).

   C was standardized at much the same time as the 80386 came out, so
   shifting right by 32 was not required to work.  It gave undefined
   behaviour.  Optimizing compilers took advantage of the UB to give the
   same do-nothing behaviour as the hardware for shift counts of 32
   (or do-something-strange-and-undocumented for larger shift counts).
   Pessimizing compilers could have taken advantage of the UB to shift
   out all of the bits in the same way at runtime as at compile time, like
   some programmers expect.  This would pessimize the usual case (extra
   code would be needed to produce 0 at runtime when the shift
   count is >= 32).

- binary compatibility prevented anyone fixing this on 32-bit x86's

- modulo 32 is no good for 64-bit mode.  Either someone forgot about
   the 8086 again, or there is some binary compatibility problem that
   inhibited expanding 32 to 128 or "infinity".  (It certainly can't
   be "infinity" because even INT16_MAX is unreachable due to the
   shift count being limited to below 256 by the old mistake^Woptimization
   of keeping it in %cl.)

- binary compatibility prevented fixing this on 64-bit x86's in 32-bit
   mode.
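
The hardware behaviour described in the list can be modelled in C without
itself invoking undefined behaviour, by masking the count first.  These are
hypothetical emulation helpers, not the real instructions: a 32-bit x86
shift uses only the low 5 bits of the count and a 64-bit (REX.W) shift only
the low 6 bits, so a count equal to the operand width leaves the value
unchanged instead of giving 0.

```c
#include <assert.h>
#include <stdint.h>

/* Emulate a 32-bit x86 SHR: only the low 5 bits of %cl are used. */
static uint32_t
x86_shr32(uint32_t x, unsigned int count)
{
	assert(count < 256);	/* %cl is 8 bits wide */
	return (x >> (count & 31));
}

/* Emulate a 64-bit (REX.W) x86 SHR: only the low 6 bits of %cl are used. */
static uint64_t
x86_shr64(uint64_t x, unsigned int count)
{
	assert(count < 256);
	return (x >> (count & 63));
}
```

So x86_shr32(x, 32) returns x unchanged, which is the "do-nothing"
behaviour that optimizing compilers are allowed to reproduce for the UB case.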

>> This is a logical right shift by a constant larger than the width of
>> the left operand.  As a result, it would be a constant zero in any
>> emitted machine code.  It is a bug in the C standard and a concession
>> to naive, non-optimizing compilers that this is considered UB.

This isn't a logical right shift, but it is what the hardware does.  It
is a feature in the C standard and a concession to smart, optimizing
compilers that this is UB.  UB allows the compiler to do anything,
including optimizing to do what the hardware does or pessimizing to
give logical shifts.

It is interesting that the behaviour is undefined even for unsigned
left operands.

UB is not strictly required.  The behaviour could also be implementation
defined or perhaps unspecified.  This makes little difference in practice.
It is unclear if the implementation can define the behaviour as back to
undefined.
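
Since x >> n is undefined whenever n is greater than or equal to the width
of the promoted left operand, even for unsigned x, code that wants all the
bits shifted out has to say so explicitly.  A minimal sketch with a
hypothetical name, lsr32(): it returns 0 for counts of 32 or more and only
performs the C shift for smaller counts, so its result does not depend on
whether the count is known at compile time.

```c
#include <stdint.h>

/*
 * Logical right shift defined for every count: counts of 32 or more
 * shift out all bits and give 0.  The plain x >> n is evaluated only
 * when n < 32, so no undefined behaviour is reached.
 */
static uint32_t
lsr32(uint32_t x, unsigned int n)
{
	return (n < 32 ? x >> n : 0);
}
```

The price is the extra compare (or conditional move) in the usual case,
which is the pessimization that the UB lets compilers avoid.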

> Generating one answer when compiler knows that everything is constant
> and can figure out the "correct" value at compile time, but generating
> an entirely different answer when the shift value is still constant, but
> passed in as a function parameter and hides that information from the
> compiler so the result is generated at runtime sounds like a good way to
> introduce bugs.

My pre-C90 compiler does this for integer division.  C99 requires
incorrect rounding (round towards 0 instead of towards -infinity for
positive divisors), but my compiler does correct rounding for divisions
done at compile time and in software and whatever the hardware does
(usually incorrect) otherwise.  In C90, the rounding is implementation-
defined, so it can even be correct, but in practice it cannot be trusted.
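
C99 and later specify that integer / truncates toward zero, so -7 / 2 is -3.
Rounding toward -infinity for a positive divisor has to be written out.  A
minimal sketch with a hypothetical helper name, floor_div(), that assumes a
positive divisor:

```c
#include <assert.h>

/*
 * Integer division rounding toward -infinity, for positive divisors.
 * C99 '/' truncates toward zero, so when the remainder is negative the
 * truncated quotient is one too large.
 */
static int
floor_div(int a, int b)
{
	int q, r;

	assert(b > 0);
	q = a / b;
	r = a % b;
	if (r < 0)
		q--;
	return (q);
}
```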

Bruce


