Date: Sun, 7 Jun 1998 19:05:17 +0200
From: Eivind Eklund <eivind@yes.no>
To: Gregory D Moncreaff <moncrg@ma.ultranet.com>, Matthew Hunt <mph@pobox.com>, Mike Smith <mike@smith.net.au>, hackers@FreeBSD.ORG
Subject: Re: Irritating cpp feature
Message-ID: <19980607190517.23293@follo.net>
In-Reply-To: <002301bd9225$f1527480$804106d1@micron>; from Gregory D Moncreaff on Sun, Jun 07, 1998 at 11:07:07AM -0400
References: <002301bd9225$f1527480$804106d1@micron>

On Sun, Jun 07, 1998 at 11:07:07AM -0400, Gregory D Moncreaff wrote:
> you can put anything you want in an #if 0/#endif block.
> by definition, the preprocessor deletes such before the compiler
> (which is the only thing that checks code syntax)
> even sees it

This is a common misconception. Your statement is completely false.
The text inside the block must still be lexically correct - the
relevant reference is 5.1.1.2 in the draft standard. I'm reproducing
it here for your convenience (see specifically the difference between
phase 3 and phase 4):

  5.1.1.2 Translation phases

  The precedence among the syntax rules of translation is specified
  by the following phases.[5]

  1. Physical source file multibyte characters are mapped to the
     source character set (introducing new-line characters for
     end-of-line indicators) if necessary. Any multibyte source file
     character not in the basic source character set is replaced by
     the universal-character-name that designates that multibyte
     character.[6] Then, trigraph sequences are replaced by
     corresponding single-character internal representations.

  2. Each instance of a backslash character immediately followed by a
     newline character is deleted, splicing physical source lines to
     form logical source lines. Only the last backslash on any
     physical source line shall be eligible for being part of such a
     splice. A source file that is not empty shall end in a new-line
     character, which shall not be immediately preceded by a
     backslash character before any such splicing takes place.

  3. The source file is decomposed into preprocessing tokens[7] and
     sequences of white-space characters (including comments). A
     source file shall not end in a partial preprocessing token or
     comment. Each comment is replaced by one space character.
     New-line characters are retained. Whether each nonempty sequence
     of white-space characters other than new-line is retained or
     replaced by one space character is implementation-defined.

  4. Preprocessing directives are executed, macro invocations are
     expanded, and pragma unary operator expressions are executed. If
     a character sequence that matches the syntax of a
     universal-character-name is produced by token concatenation
     (6.8.3.3), the behavior is undefined. A #include preprocessing
     directive causes the named header or source file to be processed
     from phase 1 through phase 4, recursively. All preprocessing
     directives are then deleted.

  5. Each source character set member, escape sequence, and
     universal-character-name in character constants and string
     literals is converted to a member of the execution character
     set.

  6. Adjacent character string literal tokens are concatenated and
     adjacent wide string literal tokens are concatenated.

  7. White-space characters separating tokens are no longer
     significant. Each preprocessing token is converted into a token.
     The resulting tokens are syntactically and semantically analyzed
     and translated as a translation unit.

  8. All external object and function references are resolved.
     Library components are linked to satisfy external references to
     functions and objects not defined in the current translation.
     All such translator output is collected into a program image
     which contains information needed for execution in its execution
     environment.

  ------

  [5] Implementations must behave as if these separate phases occur,
      even though many are typically folded together in practice.

  [6] The process of handling extended characters is specified in
      terms of mapping to an encoding that uses only the basic source
      character set, and, in the case of character literals and
      strings, further mapping to the execution character set. In
      practical terms, however, any internal encoding may be used, so
      long as an actual extended character encountered in the input,
      and the same extended character expressed in the input as a
      universal-character-name (i.e., using the \U or \u notation),
      are handled equivalently.

  [7] As described in 6.1, the process of dividing a source file's
      characters into preprocessing tokens is context-dependent. For
      example, see the handling of < within a #include preprocessing
      directive.

The compilation of a program such as Mike's specifically requires a
diagnostic; this is in 5.1.1.3 section 1.

Now, can we all cut this discussion at this point? Thanks.

Eivind.
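
For anyone who wants to see the phase ordering for themselves, here is a
minimal sketch in C. It is only an illustration under stated
assumptions: the names spliced_value, STR, comment_as_space,
joined_literal and check_phases are made up for this example and come
from neither the thread nor the standard. It shows a skipped #if 0
group that is far from valid C yet consists only of valid preprocessing
tokens, plus the visible effects of phases 2, 3 and 6.

```c
#include <assert.h>
#include <string.h>

/* Phase 3 splits the skipped group below into preprocessing tokens
 * before phase 4 discards it. Every piece is a valid preprocessing
 * token (identifiers, punctuators, a pp-number, a string literal, a
 * character constant), so the group is accepted even though it is
 * nowhere near syntactically valid C. */
#if 0
this is ) not ] C at all } 12abc .. "but each piece" 'x' is a token
#endif

/* By contrast, a skipped group such as
 *
 *     #if 0
 *     don't do this
 *     #endif
 *
 * contains a lone apostrophe that cannot start a complete character
 * constant. Many compilers report it even though the group is skipped;
 * the exact message is compiler-specific. */

/* Phase 2: the backslash-newline pair is deleted before tokenization,
 * so the two physical lines below spell the single identifier
 * spliced_value (a hypothetical name). */
static const int spliced_\
value = 42;

/* Phase 3: a comment is replaced by one space, so a and b remain two
 * separate tokens and stringify with one space between them. */
#define STR(x) #x

const char *comment_as_space(void) { return STR(a/**/b); }

/* Phase 6: adjacent string literals are concatenated. */
const char *joined_literal(void) { return "phase " "six"; }

/* Returns 1 if every observation above holds on this implementation. */
int check_phases(void)
{
    assert(spliced_value == 42);
    assert(strcmp(comment_as_space(), "a b") == 0);
    assert(strcmp(joined_literal(), "phase six") == 0);
    return 1;
}
```

Calling check_phases() from a main() of your own should return 1 on a
conforming compiler. Uncommenting the apostrophe example is an easy way
to see what your particular compiler says about it.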