Date: Sat, 20 Jan 2018 18:17:10 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Conrad Meyer <cem@freebsd.org>
Cc: "Rodney W. Grimes" <rgrimes@freebsd.org>, src-committers <src-committers@freebsd.org>, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject: Re: svn commit: r327354 - head/sys/vm
Message-ID: <20180120154359.R1063@besplex.bde.org>
In-Reply-To: <CAG6CVpVQqyhub0g-iOjKbZYEaEqAy87WdrocoQ_MxYhvbz1k%2BQ@mail.gmail.com>
References: <601ee1a2-8f4e-518d-4c86-89871cd652af@vangyzen.net> <201801191704.w0JH4rgT072967@pdx.rh.CN85.dnsmgr.net> <CAG6CVpVQqyhub0g-iOjKbZYEaEqAy87WdrocoQ_MxYhvbz1k%2BQ@mail.gmail.com>

On Fri, 19 Jan 2018, Conrad Meyer wrote:

> On Fri, Jan 19, 2018 at 9:04 AM, Rodney W. Grimes
> <freebsd@pdx.rh.cn85.dnsmgr.net> wrote:
>> BUT I do not believe this bit of "style" has anything to do with
>> readability of code, and has more to do with how code runs on a
>> processor and stack frames. If you defer the declaration of
>> "int i" in the above code does that compiler emit code to allocate
>> a new stack frame, or does it just add space to the function stack
>> frame for this?
>>
>> What happens if you do
>>     for (int i = 0; i < pages;) { }
>>
>>     for (int i = 1; i < pages;) { }
>> as 2 separate loops, do we allocate 2 i's on the stack at
>> function startup, or do we defer and allocate each one
>> only when that basic block runs, or do we allocate 1 i
>> and reuse it, I know that the compiler makes functional
>> code but how is that functionality done? The current
>> style leaves no doubt about how things are done from
>> that perspective.
>
> Modern (and I'm using that word very loosely here -- think GCC did this
> 10+ years ago) optimizing compilers do something called liveness

gcc-1 did this 25 years ago (if not 30 years ago).

> tracking[0] for variables to determine the scope they are used in
> (something like the region between last write and last read). So in
> that sense, optimizing compilers do not care whether you declare the
> variable at function scope or local scope -- they always determine the
> local scope the variable is alive in. (Other than for shadowing,
> which we strongly discourage and is considered bad style.)

gcc did this more primitively 25 years ago, but it always allocated
space for all variables in a function on entry to the function (except
for alloca(3)). -O0 doesn't do much more than allocate all variables
on the stack. -O moves a few variables to registers.
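
A minimal sketch of the two spellings in question, for comparison. The
names sum_function_scope(), sum_loop_scope() and check_same() are
hypothetical, and the loop bodies are invented so that the loops
terminate (the quoted fragment has empty bodies). Both forms have the
same observable behaviour in C; how many stack slots or registers end
up holding i is up to the compiler and the optimization level.

```c
#include <assert.h>

/* One i declared at function scope and reused by both loops. */
static int
sum_function_scope(int pages)
{
	int i, total;

	total = 0;
	for (i = 0; i < pages; i++)
		total += i;
	for (i = 1; i < pages; i++)
		total += i;
	return (total);
}

/* Two distinct objects named i, each scoped to its own loop (C99). */
static int
sum_loop_scope(int pages)
{
	int total = 0;

	for (int i = 0; i < pages; i++)
		total += i;
	for (int i = 1; i < pages; i++)
		total += i;
	return (total);
}

/* The two spellings must compute the same value for any page count. */
static void
check_same(int pages)
{
	assert(sum_function_scope(pages) == sum_loop_scope(pages));
}
```

Compiling both functions with -O0 and with -O and comparing the
assembly (cc -S) is one way to see what a particular compiler actually
does with the two declarations.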
Debugging is also easier with all variables on the stack, allocated to
fixed positions with a lifetime extending to the end of the function.
gcc does many minor pessimizations, at least with -O, so that -g mostly
works. clang does the opposite, so that -O -g mostly doesn't work (it
tends to give "value optimized out" even for args).

Debugging is another reason to declare all variables at the start of
functions. If you reuse a function-scope loop variable named i, then
you can't see what its value was for previous loops in the function,
but you can at least write "display" and "watch" directives for it
without these breaking when the variable goes out of scope before the
end of the function.

> Liveness analysis is part of register allocation[1], which typically
> uses a graph coloring algorithm to determine the minimal number of
> distinct registers needed to hold live values. If the number of
> registers needed is more than the machine provides, some values must
> be spilled to the stack. (On modern x86 it doesn't matter too much
> what you spill to the stack, because the top few words of the stack
> region is actually quite fast, but clever compilers targeting other
> platforms may attempt to spill less frequently accessed values.)

Not usually the top. Variables that can't be kept entirely in registers
are usually allocated at a fixed place in the stack which is not
especially likely to be at the top. Caching works the same everywhere
on the stack. Sometimes related variables end up in the same cache line
so that caching works best, but doing this intentionally is much harder
than register allocation.

> I think I recall Clang and other LLVM frontends do something nutty
> when they emit intermediary representation, like using a new register
> for each assignment. This relies on the register allocator to reduce
> that to something sane for the target machine.

gcc also depends on the register allocator to undo an initial,
intentionally very stupid allocation.
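
The graph coloring idea in the quoted text can be sketched roughly as
follows. This is a greedy coloring over hand-written live ranges, not
the algorithm gcc or clang use; struct live_range, interferes(),
assign_registers() and the MAXV limit are hypothetical names chosen
for illustration. Two values interfere when their live ranges overlap,
and interfering values may not share a register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define	MAXV	16		/* arbitrary cap for this sketch */

struct live_range {
	int start;		/* index of the first write */
	int end;		/* index of the last read (inclusive) */
};

/* Two values interfere if their live ranges overlap anywhere. */
static bool
interferes(const struct live_range *a, const struct live_range *b)
{
	return (a->start <= b->end && b->start <= a->end);
}

/*
 * Give each value the lowest-numbered register not already taken by an
 * earlier value that interferes with it.  Returns the number of
 * registers used; reg[i] receives the register chosen for value i.
 */
static int
assign_registers(const struct live_range *r, size_t n, int *reg)
{
	bool used[MAXV];
	size_t i, j;
	int c, nregs;

	assert(n <= MAXV);
	nregs = 0;
	for (i = 0; i < n; i++) {
		for (c = 0; c < MAXV; c++)
			used[c] = false;
		for (j = 0; j < i; j++)
			if (interferes(&r[i], &r[j]))
				used[reg[j]] = true;
		/* At most i < MAXV registers are taken, so one is free. */
		for (c = 0; used[c]; c++)
			;
		reg[i] = c;
		if (c + 1 > nregs)
			nregs = c + 1;
	}
	return (nregs);
}
```

With the ranges {0,9}, {1,4} and {5,8} (an accumulator plus two loop
counters whose lifetimes do not overlap), the two counters land in the
same register and two registers suffice, wherever the counters were
declared.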
New (physical) registers should be used whenever possible to maximize
use of CPU resources.

Bruce

Date: Sat, 20 Jan 2018 19:57:55 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Don Lewis <truckman@freebsd.org>
Cc: cem@freebsd.org, "Rodney W. Grimes" <rgrimes@freebsd.org>, src-committers <src-committers@freebsd.org>, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject: Re: svn commit: r328159 - head/sys/modules
Message-ID: <20180120183216.U1478@besplex.bde.org>
In-Reply-To: <tkrat.a8bb488b61eec3e0@FreeBSD.org>
References: <CAG6CVpV6Suft3v-=08f5UH6BTH2NEJgU_4kYd-UphLZ6yoJB4Q@mail.gmail.com> <201801191737.w0JHbM90073097@pdx.rh.CN85.dnsmgr.net> <CAG6CVpUj3SfiuHAaPMB1zGXpXPw=U-CsHgk+ivEPyrzhvrrPKw@mail.gmail.com> <tkrat.a8bb488b61eec3e0@FreeBSD.org>

On Fri, 19 Jan 2018, Don Lewis wrote:

> On 19 Jan, Conrad Meyer wrote:
>> On Fri, Jan 19, 2018 at 9:37 AM, Rodney W. Grimes
>> <freebsd@pdx.rh.cn85.dnsmgr.net> wrote:
>>> If you think in assembler it is easy to understand why this is UB,
>>> most (all) architectures Right Logic or Arithmetic Shift only accept an
>>> operand that is a size that can hold log2(wordsize).

The not-unused x86 arch is one that does this.
IIRC, some history of this is:
- on the 8086, the shift count was taken mod 32. 16 bits was enough for
  anyone, and shifting left or right by 16 through 31 (but not by 32)
  shifted out all of the bits (in the unsigned case) to give 0.
- for the 80386, someone forgot why the 8086 took the count mod 32
  instead of just 16, and kept using 32. 16 bits was not enough for
  anyone, and shifting left or right by 32 had no effect (even in the
  signed case?).

  C was standardized at much the same time as the 80386 came out, so
  shifting right by 32 was not required to work. It gave undefined
  behaviour. Optimizing compilers took advantage of the UB to give the
  same do-nothing behaviour as the hardware for shift counts of 32 (or
  do-something-strange-and-undocumented for larger shift counts).
  Pessimizing compilers could have taken advantage of the UB to shift
  out all of the bits in the same way at runtime as at compile time, as
  some programmers expect. This would pessimize the usual case (extra
  code would be needed to produce 0 at runtime when the shift count
  is >= 32).
- binary compatibility prevented anyone fixing this on 32-bit x86's.
- modulo 32 is no good for 64-bit mode. Either someone forgot about the
  8086 again, or there is some binary compatibility problem that
  inhibited expanding 32 to 128 or "infinity". (It certainly can't be
  "infinity" because even INT16_MAX is unreachable due to the shift
  count being limited to 256 by the old mistake^Woptimization of
  keeping it in %cl.)
- binary compatibility prevented fixing this on 64-bit x86's in 32-bit
  mode.

>> This is a logical right shift by a constant larger than the width of
>> the left operand. As a result, it would be a constant zero in any
>> emitted machine code. It is a bug in the C standard and a concession
>> to naive, non-optimizing compilers that this is considered UB.

This isn't a logical right shift, but it is what the hardware does.
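
A minimal sketch of the difference between a shift that shifts out all
of the bits and one that takes the count mod 32. All three helpers are
hypothetical; shr32_masked() only models the masking in portable C and
is not a claim about what any particular CPU instruction does. Neither
shift function ever evaluates x >> count with count >= 32, so neither
has undefined behaviour.

```c
#include <assert.h>
#include <stdint.h>

/* Logical right shift: a count of 32 or more shifts out every bit. */
static uint32_t
shr32_logical(uint32_t x, unsigned int count)
{
	return (count >= 32 ? 0 : x >> count);
}

/* Count taken mod 32: a count of 32 leaves x unchanged. */
static uint32_t
shr32_masked(uint32_t x, unsigned int count)
{
	return (x >> (count & 31));
}

/* The two only differ for out-of-range counts; check the in-range ones. */
static void
check_agreement(uint32_t x)
{
	for (unsigned int c = 0; c < 32; c++)
		assert(shr32_logical(x, c) == shr32_masked(x, c));
}
```

The extra comparison in shr32_logical() is the "extra code" that the
usual in-range case would pay for if a compiler chose to guarantee a
result of 0 at runtime.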
It is a feature in the C standard and a concession to smart, optimizing
compilers that this is UB. UB allows the compiler to do anything,
including optimizing to do what the hardware does or pessimizing to
give logical shifts.

It is interesting that the behaviour is undefined even for unsigned
left operands. UB is not strictly required. The behaviour could also
be implementation-defined or perhaps unspecified. This makes little
difference in practice. It is unclear whether an implementation can
define the behaviour back to undefined.

> Generating one answer when compiler knows that everything is constant
> and can figure out the "correct" value at compile time, but generating
> an entirely different answer when the shift value is still constant, but
> passed in as a function parameter and hides that information from the
> compiler so the result is generated at runtime sounds like a good way to
> introduce bugs.

My pre-C90 compiler does this for integer division. C99 requires
incorrect rounding (round towards 0 instead of towards -infinity for
positive divisors), but my compiler does correct rounding for divisions
done at compile time and in software, and whatever the hardware does
(usually incorrect) otherwise. In C90, the rounding is
implementation-defined, so it can even be correct, but in practice it
cannot be trusted.

Bruce
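
The two rounding rules for integer division can be compared with a
short sketch. div_floor() is a hypothetical helper, not code from any
compiler mentioned in this thread; it builds rounding towards -infinity
on top of the C99 '/' and '%' operators, which truncate towards 0. It
assumes a nonzero divisor and does not handle INT_MIN / -1, which
overflows.

```c
#include <assert.h>

/*
 * Divide rounding towards -infinity.  C99 '/' rounds towards 0, so the
 * quotient is one too large whenever the division is inexact and the
 * remainder and the divisor have opposite signs.
 */
static int
div_floor(int n, int d)
{
	int q, r;

	assert(d != 0);
	q = n / d;
	r = n % d;
	if (r != 0 && ((r < 0) != (d < 0)))
		q--;
	return (q);
}
```

For example, -7 / 2 is -3 under C99, while div_floor(-7, 2) is -4; the
two agree whenever the division is exact or the exact quotient is
positive.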