Date: Thu, 30 Oct 2003 02:59:09 -0800
From: Terry Lambert <tlambert2@mindspring.com>
To: Harti Brandt <brandt@fokus.fraunhofer.de>
Cc: current@FreeBSD.org
Subject: Re: Anyone object to the following change in libc?
Message-ID: <3FA0EEFD.431DD759@mindspring.com>
References: <BAEB9CED-091F-11D8-B483-000393BB9222@queasyweasel.com>
 <3F9F4FE6.29C4E178@mindspring.com>
 <20031029093649.M72850@beagle.fokus.fraunhofer.de>
Harti Brandt wrote:
> TL>Paragraph 6 of:
> TL>
> TL>    http://www.opengroup.org/onlinepubs/007904975/functions/sscanf.html
> TL>
> TL>Implies that the lack of characters in the string following the
> TL>conversion, due to failure in assignment, should result in an
> TL>"Input failure". Note also that stdio.h defines EOF as -1.
>
> I fail to locate this paragraph. This interpretation would also imply
> that scanf() always needs to return -1 whenever it cannot match a format
> specifier.

    The fscanf() functions shall execute each directive of the
    format in turn. If a directive fails, as detailed below, the
    function shall return. Failures are described as input failures
    (due to the unavailability of input bytes) or matching failures
    (due to inappropriate input).

It comes down to how you interpret the NUL byte at the end of the
sscanf() input string. Is it an EOF? Or is it an unavailability of
input bytes? The answer to the question picks which return value is
correct.

> TL>I think it can be interpreted either way, still.
>
> You miss the section about RETURN VALUE: EOF is return on a read error.
> This is not an input error.

How do I distinguish a "return value is -1 as an error result" from
"return value is -1 as an EOF result"?

> You should also read the very 1st paragraph. This clearly states, that
> ISO is the primary source of information and the ISO text is a lot
> cleaner.

No, that's not what it actually states; here's the paragraph:

    The functionality described on this reference page is aligned
    with the ISO C standard. Any conflict between the requirements
    described here and the ISO C standard is unintentional. This
    volume of IEEE Std 1003.1-2001 defers to the ISO C standard.

It says that any conflicts are unintentional, and their intent was to
use different language for no good reason, rather than just copying it
verbatim and removing any doubt. It does *NOT* say that no conflicts
exist.
Also: In this context, which is IEEE 1003.1-2001, Issue 6, "the ISO C
standard" refers to "c89", which is the version of the C standard that
was in effect at the time that SVID IV was defined. If you need
clarification on this issue, you should download the currently
available version of the NIST/PCTS, which specifically requires you to
compile with a c89 compiler, not one more recent. The same is true of
The Open Group test suites which are available on the Internet.

The version of the ISO C standard you are quoting from is *NOT* the c89
version. This makes interpretation ambiguous, since the text you are
specifically citing to get the 0 result was added to the next version
of the standard to clarify it.

> I think it makes no sense to classify
>
>     sscanf("123", "%*d%d", ...
>
> as an error, but
>
>     sscanf("123", "%d%d", ...
>
> not, does it? Also at least Solaris 9 return -1 but fails to set
> errno.

Which is simply a bug.

It makes no sense to do conversions without assignment in the first
place (IMO). Also, it makes no sense to call sscanf() with an input
string that holds too few items for the format, considering that you
are the one providing that string in the first place. You are
effectively using sscanf() to validate an ambiguous set of data as part
of its operation. I'm not sure that this is reasonable to do.

Specifically, none of the referenced standards expects this to happen
with sscanf(), since they do not define, specifically, how the end of
the input string should be interpreted: EOF vs. unavailability of input
bytes.

One could argue that an unavailability of matching input bytes results
only from the separator character(s) between format specifiers not
being matched properly. At that point, "%d%d" (or "%*d%d") is a
nonsensical format specifier entirely, since any characters that would
be valid for input to the second specifier would also be valid for
input to the first: and the matching is, by definition, greedy.
Really, this is a problem which has occurred because you are doing some
conversion into an internal buffer instead of using fscanf() or scanf()
on the input stream, presumably to avoid a buffer overflow and/or bitch
about the standards being specified inadequately in comp.lang.c, or on
current@freebsd.org. In other words, overly anal buffer overflow
checking, rather than specifying the buffer length in the format
string.

In terms of standards conformance, I'd like to see the output of a
conformance test suite for ISO C (any version) complaining about the -1
return. I think IEEE 1003.1-2001 conformance is probably more
important, if we have to pick one or the other on the basis of what
sscanf() is going to return in this manufactured problem case.

I'd also like to point out that the compiler we are using permits the
standards conformance version to be chosen at compile time, but
routines like sscanf(), unless they are inlined in header files, are
not conditionally selectable based on the version at compile time.
Further, it's quite possible that version conformance, even if it were
specifiable at compile time, is not specifiable at link time. Moving
the function into an inline would then be the only viable way to deal
with this issue across multiple libraries, each of which expects a
different version, but all of which must be linked into a single
program in order to build an application that uses libraries with
different expectations. So it's pretty stupid for a language standard
to specify anything other than language syntax (e.g. library
behaviour).

In any case, we are practically guaranteed that returning -1, as all
other UNIX-like OS's currently do, would break less existing source
code.
Finally, I will point to the current FreeBSD precedents in this matter:
RFC 1644 and RFC 1323 conformance in TCP/IP, which was defaulted to
"off" after it broke a lot of existing code (and Livingston Portmaster
terminal servers), and select(2) not modifying the contents of the
timeval struct to provide an accurate value for the remaining timeout
prior to the select coming true or a signal being received.

In other words, conformance level has historically been dictated by
what code does not break, not by what is technically permitted by the
standards, if you language-lawyer them to death. To put it in IETF
terms: "Be conservative in what you generate, and generous in what you
accept".

-- Terry