FreeBSD Mail Archives

Date:      Sun, 7 Jun 1998 19:05:17 +0200
From:      Eivind Eklund <eivind@yes.no>
To:        Gregory D Moncreaff <moncrg@ma.ultranet.com>, Matthew Hunt <mph@pobox.com>, Mike Smith <mike@smith.net.au>, hackers@FreeBSD.ORG
Subject:   Re: Irritating cpp feature
Message-ID:  <19980607190517.23293@follo.net>
In-Reply-To: <002301bd9225$f1527480$804106d1@micron>; from Gregory D Moncreaff on Sun, Jun 07, 1998 at 11:07:07AM -0400
References:  <002301bd9225$f1527480$804106d1@micron>

On Sun, Jun 07, 1998 at 11:07:07AM -0400, Gregory D Moncreaff wrote:
> you can put anything you want in an #if 0/#endif block.
> by definition, the preprocessor deletes such before the compiler
> (which is the only thing that checks code syntax)
> even sees it

This is a common misconception.  Your statement is completely false.

The contents of an #if 0 block must still be lexically correct, i.e.
they must decompose into valid preprocessing tokens.  The relevant
reference is 5.1.1.2 in the draft standard.  I'm reproducing it here
for your convenience (see specifically the difference between phase 3
and phase 4):

5.1.1.2 Translation phases

The precedence among the syntax rules of translation is specified by
the following phases.[5]

1.  Physical source file multibyte characters are mapped to the source
character set (introducing new-line characters for end-of-line
indicators) if necessary.  Any multibyte source file character not in
the basic source character set is replaced by the
universal-character-name that designates that multibyte character.[6]
Then, trigraph sequences are replaced by corresponding
single-character internal representations.

2.  Each instance of a backslash character immediately followed by a
newline character is deleted, splicing physical source lines to form
logical source lines.  Only the last backslash on any physical source
line shall be eligible for being part of such a splice.  A source file
that is not empty shall end in a new-line character, which shall not
be immediately preceded by a backslash character before any such
splicing takes place.

3.  The source file is decomposed into preprocessing tokens[7] and
sequences of white-space characters (including comments).  A source
file shall not end in a partial preprocessing token or comment.  Each
comment is replaced by one space character.  New-line characters are
retained.  Whether each nonempty sequence of white-space characters
other than new-line is retained or replaced by one space character is
implementation-defined.

4.  Preprocessing directives are executed, macro invocations are
expanded, and pragma unary operator expressions are executed.  If a
character sequence that matches the syntax of a
universal-character-name is produced by token concatenation (6.8.3.3),
the behavior is undefined.  A #include preprocessing directive causes
the named header or source file to be processed from phase 1 through
phase 4, recursively.  All preprocessing directives are then deleted.

5.  Each source character set member, escape sequence, and
universal-character-name in character constants and string literals
is converted to a member of the execution character set.

6.  Adjacent character string literal tokens are concatenated and
adjacent wide string literal tokens are concatenated.

7.  White-space characters separating tokens are no longer
significant.  Each preprocessing token is converted into a token.  The
resulting tokens are syntactically and semantically analyzed and
translated as a translation unit.

8.  All external object and function references are resolved.  Library
components are linked to satisfy external references to functions and
objects not defined in the current translation.  All such translator
output is collected into a program image which contains information
needed for execution in its execution environment.

------

[5] Implementations must behave as if these separate phases occur,
even though many are typically folded together in practice.

[6] The process of handling extended characters is specified in terms
of mapping to an encoding that uses only the basic source character
set, and, in the case of character literals and strings, further
mapping to the execution character set.  In practical terms, however,
any internal encoding may be used, so long as an actual extended
character encountered in the input, and the same extended character
expressed in the input as a universal-character-name (i.e., using the
\U or \u notation), are handled equivalently.

[7] As described in 6.1, the process of dividing a source file's
characters into preprocessing tokens is context-dependent.  For
example, see the handling of < within a #include preprocessing
directive.

Compiling a program such as Mike's specifically requires a
diagnostic; this is required by paragraph 1 of 5.1.1.3.

Now, can we all cut this discussion at this point?  Thanks.

Eivind.


Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?19980607190517.23293>
