Date:      Sun, 14 May 2017 04:21:30 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Mark Millard <markmi@dsl-only.net>
Cc:        sgk@troutmask.apl.washington.edu, Bruce Evans <brde@optusnet.com.au>,  freebsd-hackers@freebsd.org, numerics@freebsd.org
Subject:   Re: catrig[fl].c and inexact
Message-ID:  <20170514023721.O1230@besplex.bde.org>
In-Reply-To: <DC2DA938-6A07-4CB0-AFB6-038368971B77@dsl-only.net>
References:  <20170512215654.GA82545@troutmask.apl.washington.edu> <20170513103208.M845@besplex.bde.org> <20170513060803.GA84399@troutmask.apl.washington.edu> <DC2DA938-6A07-4CB0-AFB6-038368971B77@dsl-only.net>

On Sat, 13 May 2017, Mark Millard wrote:

>
> On 2017-May-12, at 11:08 PM, Steve Kargl <sgk at troutmask.apl.washington.edu> wrote:
>
>> On Sat, May 13, 2017 at 11:35:49AM +1000, Bruce Evans wrote:
>>> On Fri, 12 May 2017, Steve Kargl wrote:
>>>> ...
>>>> /usr/home/kargl/trunk/math/libm/msun/src/catrigl.c:56:45: note: expanded from
>>>>     macro 'raise_inexact'
>>>> #define raise_inexact() do { volatile float junk = 1 + tiny; } while(0)
>>>>                                           ^
>>>> Grepping catrig.o for the variable 'junk' suggests that 'junk' is
>>>> optimized out (with at least -O2).

It is easy to write unportable code that works perfectly.  On i386(i387):

#define	use(x)	__asm("" : : "t" (x))
#define	raise_inexact() use(1 + tiny)

looks cleaner except for the asm, and generates perfect code with fstp
%st(0) and no store of the result to memory.  Unfortunately, the "t"
constraint (top of i387 stack) is too unportable.  "g" might be portable
enough, but generates worse code than the volatile variable.

>>> Just another bug in clang.  Volatile variables cannot be optimized out
>>> (if they are accessed).
>>
>> Does this depend on scope?  'junk' is local to the do {...} while(0);
>> construct.  Can a compiler completely eliminate a do-nothing scoping
>> unit?  I don't know C well enough to know.  I do know what I have
>> observed in clang.
>
> [This note ignores other standards than C99/C11
> that might place other constraints. And I've done
> no checking of compiler results, I've just looked
> at a couple of the C standards.]
>
> Note: I've not looked at tiny's declaration. It
> may contribute in a way not covered below.
>
> Unfortunately the declarator in an init-declarator
> that has an initializer is not part of an
> expression. The rules for volatile are tied to uses
> in expressions, not to the declarator. (Which is a
> hole in the language definition as far as I can
> tell.)

But the very first mention of volatile in C99 (5.1.2.3 Program Execution
#1) says that "Accessing a volatile object ... [is a side effect]. ...
[All previous side effects shall be complete at certain sequence points.]"

It doesn't make any exceptions for auto objects.

Also, #3 explicitly says that an implementation need not evaluate part of
an expression if it can deduce that its value is not used and that no
needed side effects are produced (including any caused by calling a
function or accessing a volatile object).  But here the compiler can't
deduce that for 1 + tiny, since this expression does have side effects
(perhaps modulo pragma FENV_ACCESS).  This rule is redundant if not wrong.
The implementation can always use the "as if" rule to avoid doing work
to produce nothing.  And according to #1, any access to a volatile
variable has side effects, so the compiler can never determine that an
evaluation involving volatile variables has no side effects.

So the correctness of the compiler using #3 to avoid the assignment
reduces to the standard breaking its own definition of volatile, and
then the compiler using the broken definition.

> There is one part of the wording that might mitigate
> this, tied to a full declarator having a sequence
> point at its end despite the declarator itself not
> being an expression, even if its initializer is
> one. There is another wording detail that might
> as well.

Surely the assignment gives a sequence point for initializers?  Actually,
this is not too clear.  I don't even like initialization in declarations,
partly because it obscures the order, and only wrote the code with an
initializer to get a 1-line macro.  It could be written as
"volatile float junk; junk = 1 + tiny;".  Also, the use() macro can
be written in C, with similar problems to the asm version, as
"#define use(x) do { volatile float junk; junk = x; } while (0)" or, better,
in GNU C as
"#define use(x) do { volatile __typeof(x) junk; junk = x; } while (0)".
This allows keeping the volatile hack and variations to make it work
(maybe just __unused) in 1 place.

#9 (Example 1) says that an implementation may make the volatile keyword
redundant, essentially by making volatile-memory non-magic.  I don't
like this.  It reduces the side effects of volatile to just the ordering
of accesses to volatiles relative to sequence points, but practical
implementations need much more than that.  This clause just says that
impractical implementations are allowed, but so does the "as if" rule.

#10 is much more of the same.

6.7.3 #6 says that accesses to a volatile-qualified object "may" have
side effects unknown to the implementation.

Misimplementations may still apply the "as if" rule and conform to this
clause weaselishly by knowing their own badness.  They just have to do
what is allowed in Example 1 to make volatile have no useful effect.
Then this clause is null.

> Still, overall it would seem safer to be sure there
> is an expression that references the volatile object,
> not having only its declarator. But I would not take
> even that as a guarantee under the C standards.

The standard seems a bit too weighted towards read accesses.  We
could try writing to a non-volatile variable and reading back the
result as volatile using a *(volatile type_t *)&var hack.  But that
would give an unwanted extra memory access.

> It may seem a silly difference but:
>
> do { volatile float junk=1; junk+=tiny; } while(0)
>
> may well be a better way of writing the "must
> evaluate" part of the intent simply because
> junk is used in an expression. Also it has both read
> and write access, so is a little more "used". The
> sequence point before the assignment can help avoid
> compile-time evaluation as well.

That would give 1 more unwanted memory access (if it works normally):
- write 1 to junk
- read 1 from junk; add tiny (usually) in a register
- write result to junk.

> Details if you care. . .
>
> I used the C99 and C11 definitions here, I
> reference C11 section numbering but C99 agrees
> as I remember.
>
> 5.1.2.3 Program execution says:
>
> "Accessing a volatile object, modifying an object,
> modifying a file, or calling a function that does
> any of those operations are all side effects,
> which are changes in the state of the execution
> environment. Evaluation of an expression may
> produce side effects."
>
> Note that raising inexact does not fit in the
> definition of side effect as far as I can tell.
> So a compiler need not consider such a thing
> for side-effect issues if I understand right.

I think it does, modulo #pragma FENV_ACCESS.  Indeed, F.7.1 says it
does explicitly (and without Annex F, floating point can do almost
anything).  It says that when FENV_ACCESS is "on" (should be "ON"),
for FP operations that implicitly raise exception flags, these
changes to the FP state are treated as side effects which respect
sequence points [footnote 291].  The footnote wastes space to remind
the reader that optimizations are allowed when FENV_ACCESS is "off".

> [C11 specific wording:] "The presence of a
> sequence point between the evaluations of
> expressions A and B implies that every value
> computation and side effect associated with A
> is sequenced before every value computation and
> side effect associated with B."
>
> [C99 is similar but is before the detailed
> "sequenced before" definition.]
>
> "An actual implementation need not evaluate part
> of an expression if it can deduce that its value
> is not used and that no needed side effects are
> produced (including any caused by calling a
> function or accessing a volatile object)."

I didn't expect any problems with volatile or sequence points.  With
FENV_ACCESS OFF, the compiler is free to ignore the side effect for
1+tiny, but with FENV_ACCESS broken in all available compilers, we
have to assume that the compiler doesn't ignore this side effect.
In practice, compilers do ignore it for (void)(1+tiny) with tiny
non-volatile, so we use several volatile hacks.  Volatile for
tiny alone isn't enough...

> Can accessing a volatile object ever be
> classified as having "no needed side effects"?
> More on this later. [Remember what "side effect"
> excludes, as noted earlier. So some consequences
> need not be considered by the compiler, all in
> the name of optimizations.]

...we need the write access to junk to have side effects.  Since
tiny is volatile, 1+tiny has an unknown value even with FENV_ACCESS OFF.
Then we want the side effects for accessing junk to depend on the value,
so that the value must be calculated even though it is unused except for
its effects on the side effects.  This is fragile.

> 6.7.3 Type Qualifiers says:
>
> "An object that has volatile-qualified type . . .
> Therefore any expression referring to such an object
> shall be evaluated strictly according to the rules
> of the abstract machine, as described in 5.1.2.3.
> Furthermore, at every sequence point the value last
> stored in the object shall agree with that prescribed
> by the abstract machine, except as modified by the
> unknown factors mentioned previously. What constitutes
> an access to an object that has volatile-qualified
> type is implementation-defined."
>
> This part is mixed: what the sequence point wording
> giveth the last sentence taketh away. (More later.)

The implementation must work for memory-mapped devices since that is the
most important case for us.  Anything that reads or writes a value to a
memory-mapped device has lots of side effects that depend on the value.
So junk = 1 + tiny must load tiny if tiny is for a memory-mapped device,
evaluate 1+tiny to get a value to store, and do the store if junk is for
a memory-mapped device.  The compiler is doing too much optimization if
it "knows" that junk is not for a memory-mapped device because the compiler
allocated it on the stack.  The compiler allocated the static tiny in
ordinary memory too.  If volatile is broken for tiny, and FENV_ACCESS is
OFF or broken (unsupported) then the compiler is free to evaluate 1+tiny
as 1 at compile time, and similarly for later expressions involving the
result.  extern volatile usually prevents the compiler from knowing that
the variable is not for device memory.
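Here is a sketch of the memory-mapped case.  Ordinary memory stands in for the device so that it can run.  fake_reg, pulse() and write_sum() are hypothetical names, and a real driver would use the device's actual address:

```c
#include <stdint.h>

/* Stand-in for a device register.  On real hardware this would be an
 * lvalue such as *(volatile uint32_t *)ADDR for some device address ADDR
 * (hypothetical); here it is ordinary memory so the sketch can run. */
static volatile uint32_t fake_reg;

/* Each store is an access to a volatile object, so a compiler is expected
 * not to merge them into a single "fake_reg = 0", even though only the
 * last value survives in memory. */
void pulse(void)
{
    fake_reg = 1;
    fake_reg = 0;
}

/* The value stored depends on a computation, so the load, the addition
 * and the store all have to happen at run time, as for junk = 1 + tiny. */
void write_sum(const volatile uint32_t *src)
{
    fake_reg = 1 + *src;
}
```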

> It also says in a note (134):
>
> "A volatile declaration may be used to describe an
> object corresponding to a memory-mapped input/output
> port or an object accessed by an asynchronously
> interrupting function. Actions on objects so declared
> shall not be "optimized out" by an implementation
> or reordered except as permitted by the rules for
> evaluating expressions."

"so declared" must be read as simply "volatile", since there is no
declaration like "volatile memory mapped ..." though such declarations
would be very useful for kernels.

> Since rules for evaluating expressions are not rules
> for declarators (vs. initializers), this could be
> read as not allowing the "optimize out". (But the
> abstract machine's description is not explicit about
> declarators for such issues.)

It just allows all optimizations which the compiler can tell are safe.
But compilers can never tell.  Maybe the programmer made the stack
memory-mapped...  This is well outside the scope of the C abstract
machine, but would be just another hack for kernels.

> The C99 Rationale:
>
> The C99 Rationale was explicit about static
> volatile for a memory mapped I/O register,
> static const volatile for a memory mapped
> input port, const volatile and volatile
> for variables shared across processes. To
> some extent this identifies examples of
> contexts with "needed side effects" that
> have hardware details to take into account.
>
> For taking into account hardware details:
> ". . . Whatever decisions are adopted on such
> issues must be documented, as volatile access
> is implementation-defined".
>
> For volatile use with no explicitly identified
> hardware details: volatile would appear to be
> no more than a potential hint for such a
> context, not an effective requirement. The
> implementation-defined status could allow lack
> of access.
>
> Overall, based on what I see in the C99 and
> C11 language definitions, I'd not be willing to
> declare clang wrong (if it did optimize out junk),
> even with my alternative formulation.
>
> C does not have an explicit Principle of Least
> Astonishment as an official guideline to its
> interpretation and the rules are very biased to
> allowing so-called optimizations. "junk" does not
> fit with being shared across processes (for
> example its address is not handed to anything)
> and is not static or even global. There is no
> known type of potential context for specific
> hardware details that would need to be taken
> into account for junk. That in turn leaves open
> not accessing it at all as far as I can tell.

Yes, it is only a hint, and the C standard would be improved by saying
just that, or requiring the strong meaning that is needed in practice.
The strong meaning is that accesses to volatile variables always have
side effects even if the implementation "knows" that they don't.

Bruce


