From owner-freebsd-current@FreeBSD.ORG  Thu Oct 30 03:36:56 2003
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id E5A3E16A4CE
	for <current@FreeBSD.org>; Thu, 30 Oct 2003 03:36:56 -0800 (PST)
Received: from mailhub.fokus.fraunhofer.de (mailhub.fokus.fraunhofer.de
	[193.174.154.14])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 7A9D943FAF
	for <current@FreeBSD.org>; Thu, 30 Oct 2003 03:36:55 -0800 (PST)
	(envelope-from brandt@fokus.fraunhofer.de)
Received: from beagle (beagle [193.175.132.100])h9UBWkP04683;
	Thu, 30 Oct 2003 12:32:46 +0100 (MET)
Date: Thu, 30 Oct 2003 12:32:46 +0100 (CET)
From: Harti Brandt <brandt@fokus.fraunhofer.de>
To: Terry Lambert <tlambert2@mindspring.com>
In-Reply-To: <3FA0EEFD.431DD759@mindspring.com>
Message-ID: <20031030120925.K80335@beagle.fokus.fraunhofer.de>
References: <BAEB9CED-091F-11D8-B483-000393BB9222@queasyweasel.com> 
	<3F9F4FE6.29C4E178@mindspring.com><3FA0EEFD.431DD759@mindspring.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: Jordan K Hubbard <jkh@queasyweasel.com>
cc: current@FreeBSD.org
Subject: Re: Anyone object to the following change in libc?
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
X-List-Received-Date: Thu, 30 Oct 2003 11:36:57 -0000

On Thu, 30 Oct 2003, Terry Lambert wrote:

TL>Harti Brandt wrote:
TL>> TL>Paragraph 6 of:
TL>> TL>
TL>> TL>     http://www.opengroup.org/onlinepubs/007904975/functions/sscanf.html
TL>> TL>
TL>> TL>Implies that the lack of characters in the string following the
TL>> TL>conversion, due to failure in assignment, should result in an
TL>> TL>"Input failure".  Note also that stdio.h defines EOF as -1.
TL>>
TL>> I fail to locate this paragraph. This interpretation would also imply
TL>> that scanf() always needs to return -1 whenever it cannot match a format
TL>> specifier.
TL>
TL>	The fscanf() functions shall execute each directive of the
TL>	format in turn. If a directive fails, as detailed below, the
TL>	function shall return. Failures are described as input
TL>	failures (due to the unavailability of input bytes) or
TL>	matching failures (due to inappropriate input).
TL>
TL>It comes down to how you interpret the NUL byte at the end of the
TL>sscanf() input string.  Is it an EOF?  Or is it an unavailability of
TL>input bytes?  The answer to the question picks which return value
TL>is correct.

Section 7.19.6.7 of N843 states:

"Reaching the end of the string is equivalent to encountering end-of-file
for the fscanf function."

Unfortunately this sentence is missing from POSIX, but it is clearly implied
by POSIX's deference to ISO C.

The next paragraph states:

"The sscanf function returns the value of the macro EOF if an input
failure occurs before any conversion."

Again: did a conversion happen before the input failure? Yes, it did. So
should we return EOF? No.
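To make the C99 rule concrete, here is a minimal sketch, assuming a libc that follows C99 7.19.6.7: sscanf() returns EOF only when an input failure (reaching the end of the string) happens before the first conversion; otherwise it returns the number of items assigned. The helper name scan_two_ints is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: try to read two ints from s.
 * Under C99 7.19.6.7 the result is:
 *   EOF  - end of string reached before any conversion,
 *   0..2 - number of items actually assigned otherwise. */
int scan_two_ints(const char *s, int *a, int *b)
{
    return sscanf(s, "%d%d", a, b);
}
```

With "123" the first %d converts and assigns 123, and the second %d then hits the end of the string. That is an input failure, but it happens after a conversion, so the result is 1, not EOF. An empty string gives EOF, and "x" gives 0 (a matching failure).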

TL>
TL>
TL>> TL>I think it can be interpreted either way, still.
TL>>
TL>> You missed the RETURN VALUE section: EOF is returned on a read error.
TL>> This is not an input error.
TL>
TL>How do I distinguish a "return value is -1 as an error result" from
TL>"return value is -1 as an EOF result"?

Well, I suppose that's the intention behind having scanf() set errno
when it returns -1 in POSIX. Unfortunately POSIX fails to describe
the error codes. This is possibly fodder for the aardvark.
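For streams, C already offers a way to tell the two -1 cases apart without relying on errno: after fscanf() returns EOF, ferror() reports whether a read error occurred, and otherwise the end-of-file indicator is set. A minimal sketch follows; the enum and read_int are hypothetical names. For sscanf() there is no underlying read, so EOF there can only mean the end of the string.

```c
#include <stdio.h>

/* Hypothetical result codes for a single-integer read. */
enum scan_result { SCAN_OK, SCAN_NO_MATCH, SCAN_END_OF_FILE, SCAN_READ_ERROR };

/* Read one int from fp and classify the outcome using only
 * fscanf()'s return value and the stream's error indicator. */
enum scan_result read_int(FILE *fp, int *out)
{
    int r = fscanf(fp, "%d", out);

    if (r == 1)
        return SCAN_OK;
    if (r == 0)
        return SCAN_NO_MATCH;     /* matching failure: input present but not a number */
    /* r == EOF: input failure before any conversion. */
    if (ferror(fp))
        return SCAN_READ_ERROR;   /* a genuine I/O error */
    return SCAN_END_OF_FILE;      /* no error indicator, so treat it as end of input */
}
```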

TL>
TL>
TL>> You should also read the very first paragraph. It clearly states that
TL>> ISO C is the primary source of information, and the ISO text is a lot
TL>> cleaner.
TL>
TL>No, that's not what it actually states; here's the paragraph:
TL>
TL>	The functionality described on this reference page is
TL>	aligned with the ISO C standard. Any conflict between
TL>	the requirements described here and the ISO C standard
TL>	is unintentional. This volume of IEEE Std 1003.1-2001
TL>	defers to the ISO C standard.
TL>
TL>It says that any conflicts are unintentional, and their intent was
TL>to use different language for no good reason, rather than just
TL>copying it verbatim and removing any doubt.  It does *NOT* say
TL>that no conflicts exist.

Yes. But I take the last sentence to mean that ISO C takes precedence
if a conflict exists.

TL>
TL>Also: In this context, which is IEEE 1003.1-2001, Issue 6, "the
TL>ISO C standard" refers to "c89", which is the version of the C
TL>standard that was in effect at the time that SVID IV was defined.

Line 107 of Austin TC-1:

"The c89 utility (which specified a compiler for the C Language specified
by the ISO/IEC 9899:1990 standard) has been replaced by a c99 utility
(which specifies a compiler for the C Language specified by the
ISO/IEC 9899:1999 standard)."

TL>If you need clarification on this issue, you should download the
TL>currently available version of the NIST/PCTS, which specifically
TL>requires you to compile with a c89 compiler, not one more recent.
TL>The same is true of The Open Group test suites which are available
TL>on the Internet.
TL>
TL>The version of the ISO C standard you are quoting from is *NOT*
TL>the c89 version.

Our sscanf() claims conformance to C99. So if we change the behaviour
we have to remove this claim.

TL>This makes interpretation ambiguous, since the text you are
TL>specifically referencing to get the 0 result is text that was
TL>added to the next version of the standard to clarify it.
TL>
TL>
TL>> I think it makes no sense to classify
TL>>
TL>> sscanf("123", "%*d%d", ...
TL>>
TL>> as an error, but
TL>>
TL>> sscanf("123", "%d%d", ...
TL>>
TL>> not, does it? Also, Solaris 9 at least returns -1 but fails to set
TL>> errno, which is simply a bug.
TL>
TL>It makes no sense to do conversions without assignment in the
TL>first place (IMO).

[... Stuff about sense removed (I was talking about what return
code makes sense, not whether calling sscanf makes sense) ...]
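The two calls from the quoted example can be compared side by side. The sketch below (function names hypothetical) only reports what the local libc returns. Under the C99 reading argued here, neither call should return EOF, because the first %d conversion succeeds in both; historical implementations disagreed on the suppressed case (the quoted text notes Solaris 9 returns -1), so no single value is assumed for it.

```c
#include <stdio.h>

/* "%d%d": the first conversion assigns 123, the second hits the end
 * of the string.  C99 requires the result 1 (one item assigned). */
int scan_plain(const char *s)
{
    int a, b;
    return sscanf(s, "%d%d", &a, &b);
}

/* "%*d%d": the first conversion still happens but is suppressed,
 * so nothing has been assigned when the end of the string is hit.
 * Under the C99 reading this is 0; some historical libcs return
 * EOF (-1) instead. */
int scan_suppressed(const char *s)
{
    int b;
    return sscanf(s, "%*d%d", &b);
}
```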

TL>In any case, we are practically guaranteed that returning -1, as
TL>all other UNIX-like OS's currently do, would result in less source
TL>code breaking.

No coder in his right mind should have written code that depends
on this behaviour, given the ambiguous wording in the classic books,
man pages and pre-C99 standards. Also note that the reason for
this change request was that configuration scripts break, not applications.
If applications break, they should be fixed.
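Code that has to work on both kinds of implementation can avoid the -1/0 question entirely by comparing the result against the number of items it expects. A minimal sketch, with parse_version as a hypothetical example:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical example: parse "MAJOR.MINOR".  Any result other than 2
 * (EOF, 0 or 1) is treated as failure, so the outcome does not depend
 * on whether the libc returns EOF or 0 for a short input. */
bool parse_version(const char *s, int *major_out, int *minor_out)
{
    return sscanf(s, "%d.%d", major_out, minor_out) == 2;
}
```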

TL>In other words, conformance level has historically been dictated
TL>by what code is not broken, not what is technically permitted by
TL>the standards, if you language-lawyer them to death.
TL>
TL>To put it in IETF terms: "Be conservative in what you generate,
TL>and generous in what you accept".

This does not apply here because you cannot return -1 and 0 at the same
time. Adhering to a cleanly written standard and breaking a handful of
badly written autoconf scripts is clearly better than adhering to
undocumented historical behaviour. What will we do if Solaris 10
returns 0 in the above case? Change our code back?

harti
-- 
harti brandt,
http://www.fokus.fraunhofer.de/research/cc/cats/employees/hartmut.brandt/private
brandt@fokus.fraunhofer.de, harti@freebsd.org