Date: Fri, 4 Sep 2009 13:42:04 +1000 (EST) From: Bruce Evans <brde@optusnet.com.au> To: =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= <des@des.no> Cc: svn-src-head@freebsd.org, Andrey Chernov <ache@nagual.pp.ru>, svn-src-all@freebsd.org, src-committers@freebsd.org, "Simon L. Nielsen" <simon@freebsd.org> Subject: Re: svn commit: r196752 - head/lib/libc/stdtime Message-ID: <20090904115255.Q48987@delplex.bde.org> In-Reply-To: <86zl9c9z05.fsf@ds4.des.no> References: <200909020456.n824uUqQ082136@svn.freebsd.org> <20090902070808.GA1290@arthur.nitro.dk> <20090902084002.GA17325@nagual.pp.ru> <867hwgcwvo.fsf@ds4.des.no> <20090903084325.GA65192@nagual.pp.ru> <86zl9c9z05.fsf@ds4.des.no>
next in thread | previous in thread | raw e-mail | index | archive | help
This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools. --0-307050607-1252035724=:48987 Content-Type: TEXT/PLAIN; charset=X-UNKNOWN; format=flowed Content-Transfer-Encoding: QUOTED-PRINTABLE On Thu, 3 Sep 2009, [utf-8] Dag-Erling Sm=C3=B8rgrav wrote: > Andrey Chernov <ache@nagual.pp.ru> writes: >> Thanx for detailed explanation. > > np; I've made that mistake many times myself. Andrey doesn't make that mistake many times (at least this millenium), but fixes it a lot. > What do you think of the attached patch? Why not use the same text as POSIX? I don't like repeating this ad nauseum, but POSIX does. (In POSIX.1-2001-draft7, this repetition is responsible for 1/3 of the lines matching of "unsigned char".) C99 says this only once for ctype functions, at the beginning of the section on <ctype.h>. The patch is missing the corresponding text for wctype functions (argmuments of type wint_t generally give undefined behaviour unless their value is representable as a wchar_t or equal to the value of WEOF). In FreeBSD, wint_t has the same type as wchar_t and that type is int, so bugs in this area are latent (unportable code might run on FreeBSD, and there might be problems with the sign bit and/or with WEOF being indistinguishable from a valid wide char encoding. gcc/config/i386 has some targets with 16-bit wide chars, and these have to be more careful about both. The non-broken ones use uint16_t for wchar_t and int for wint_t)> % Index: lib/libc/locale/iscntrl.3 % =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D % --- lib/libc/locale/iscntrl.3=09(revision 196695) % +++ lib/libc/locale/iscntrl.3=09(working copy) % @@ -32,7 +32,7 @@ % .\" @(#)iscntrl.3=098.1 (Berkeley) 6/4/93 % .\" $FreeBSD$ % .\" % -.Dd July 17, 2005 % +.Dd September 3, 2009 % .Dt ISCNTRL 3 % .Os % .Sh NAME % @@ -65,6 +65,15 @@ % .It "\&031\ EM \t032\ SUB \t033\ ESC \t034\ FS \t035\ GS" % .It "\&036\ RS \t037\ US \t177\ DEL" % .El % +.Pp % +.Em NOTE : % +if the value passed to the % +.Fn iscntrl % +function is a % +.Vt signed char , % +as is usually the case, it must be cast to an % +.Vt unsigned char % +to avoid sign-extension errors. This wording is poor. It should be something like "if the value to be passed is represented as a signed char" ... I don't know a good easy way to fix "must be cast ... to avoid sign-extension errors". The value must be converted to one representable as an unsigned char to work, but that is not always possible, and blindly casting may give a wrong value. In most cases, it is an error to represent the value as a signed char to begin with, and better to cast a pointer from "char *" to "u_char": =09char *p;=09=09=09/* Already a bad type. */ =09... isalpha(*p);=09=09/* Wrong since *p might be < 0. */ =09... isalpha((u_char)*p);=09/* Mishandles exotic machines. */ =09... isalpha(*(u_char *)p);=09/* Mishandles diff. exotic machines. */ "unsigned char" is spelled u_char in KNF. Here this makes the above lines barely fit in 80 columns before quoting. Exotic machines include: - simple ones complement with no trap representations, signed chars and CHAR_BIT =3D 8. I think '\xff' is -0. Casting this to u_char gives 0, which is probably not what is wanted, while casting the pointer to it gives 0xff (if the bits are there in the memory copy), which is probably what is wanted. - ones with chars being signed and signed chars having a larger range than unsigned ones, or with u_char of a different size than char (not sure if the latter is allowed). Then casting the pointer will give a reasonable value, though possibly not the wanted one (the difficulty is more in advancing the pointer), while blindly casting the value may corrupt many more values than -0. - ones with trap representations, instead of or in addition to the above complications. Now *p will trap on trap reps but there should be no trap reps in valid data, while casting the pointer will always give a reasonable value, possibly including trap bits. Sign extension for passing a signed char is not an error. The error is passing a negative value. Neither C99 nor POSIX gives any advice. They just say that the behaviour is undefined if the value is not representable as an unsigned char or equal to the value of EOF. % Index: lib/libc/locale/isrune.3 % =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D % --- lib/libc/locale/isrune.3=09(revision 196695) % +++ lib/libc/locale/isrune.3=09(working copy) % @@ -25,7 +25,7 @@ % .\" % .\" $FreeBSD$ % .\" % -.Dd March 30, 2004 % +.Dd September 3, 2009 % .Dt ISRUNE 3 % .Os % .Sh NAME % @@ -46,6 +46,15 @@ % .Tn ASCII % character set, this is equivalent to % .Fn isascii . % +.Pp % +.Em NOTE : % +if the value passed to the % +.Fn isrune % +function is a % +.Vt signed char , % +as is usually the case, it must be cast to an % +.Vt unsigned char % +to avoid sign-extension errors. % .Sh RETURN VALUES % The % .Fn isrune This seems to be wrong. isascii() works on all int args, so I assume isrune() does too. isrune.3 also misdescribes its arg as "the character" -- this prejudges the arg. % Index: lib/libc/locale/digittoint.3 % =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D % --- lib/libc/locale/digittoint.3=09(revision 196695) % +++ lib/libc/locale/digittoint.3=09(working copy) % @@ -28,7 +28,7 @@ % .\"=09@(#)digittoint.3=098.1 (Berkeley) 6/4/93 % .\" $FreeBSD$ % .\" % -.Dd April 6, 2001 % +.Dd September 3, 2009 % .Dt DIGITTOINT 3 % .Os % .Sh NAME % @@ -46,6 +46,15 @@ % function converts a numeric character to its corresponding integer value= =2E ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ % The character can be any decimal digit or hexadecimal digit. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ % With hexadecimal characters, the case of the values does not matter. % +.Pp % +.Em NOTE : % +if the value passed to the % +.Fn digittoint % +function is a % +.Vt signed char , % +as is usually the case, it must be cast to an % +.Vt unsigned char % +to avoid sign-extension errors. % .Sh RETURN VALUES % The % .Fn digittoint Here the behaviour already seems to be undefined unless the arg is a decimal or hex digit. Like isascii(), this function isn't in C99, so it can be more or less strict than other ctype functions. % Index: lib/libc/locale/isascii.3 % =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D % --- lib/libc/locale/isascii.3=09(revision 196695) % +++ lib/libc/locale/isascii.3=09(working copy) % @@ -28,7 +28,7 @@ % .\" @(#)isascii.3=098.2 (Berkeley) 12/11/93 % .\" $FreeBSD$ % .\" % -.Dd October 6, 2002 % +.Dd September 3, 2009 % .Dt ISASCII 3 % .Os % .Sh NAME % @@ -47,6 +47,15 @@ % .Tn ASCII % character, which is any character % between 0 and octal 0177 inclusive. % +.Pp % +.Em NOTE : % +if the value passed to the % +.Fn isascii % +function is a % +.Vt signed char , % +as is usually the case, it must be cast to an % +.Vt unsigned char % +to avoid sign-extension errors. % .Sh SEE ALSO % .Xr ctype 3 , % .Xr iswascii 3 , This is wrong -- see above. POSIX specifically says that "The isascii() function is defined on all integer values". Please check POSIX for any other special cases -- I only remembered this one and passed most of your other changes since most of the functions are in C99. % Index: lib/libc/locale/toascii.3 % =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D % --- lib/libc/locale/toascii.3=09(revision 196695) % +++ lib/libc/locale/toascii.3=09(working copy) % @@ -28,7 +28,7 @@ % .\"=09@(#)toascii.3=098.1 (Berkeley) 6/4/93 % .\" $FreeBSD$ % .\" % -.Dd June 4, 1993 % +.Dd September 3, 2009 % .Dt TOASCII 3 % .Os % .Sh NAME % @@ -45,6 +45,15 @@ % .Fn toascii % function strips all but the low 7 bits from a letter, % including parity or other marker bits. % +.Pp % +.Em NOTE : % +if the value passed to the % +.Fn toascii % +function is a % +.Vt signed char , % +as is usually the case, it must be cast to an % +.Vt unsigned char % +to avoid sign-extension errors. % .Sh RETURN VALUES % The % .Fn toascii This seems wrong too. POSIX doesn't say anything specific for it (neither the restriction nor lack of it). It just says that the result is <arg> and 0x7f where the above says "strips all but the low 7 bits from a letter". Another old bug is the above saying "letter" -- toascii() always worked on at least ASCII non-letters. % Index: lib/libc/locale/toupper.3 % =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D % --- lib/libc/locale/toupper.3=09(revision 196695) % +++ lib/libc/locale/toupper.3=09(working copy) % @@ -32,7 +32,7 @@ % .\"=09@(#)toupper.3=098.1 (Berkeley) 6/4/93 % .\" $FreeBSD$ % .\" % -.Dd July 17, 2005 % +.Dd September 3, 2009 % .Dt TOUPPER 3 % .Os % .Sh NAME % @@ -53,6 +53,15 @@ % .Vt "unsigned char" % or the value of % .Dv EOF . % +.Pp % +.Em NOTE : % +if the value passed to the % +.Fn toupper % +function is a % +.Vt signed char , % +as is usually the case, it must be cast to an % +.Vt unsigned char % +to avoid sign-extension errors. % .Sh RETURN VALUES % If the argument is a lower-case letter, the % .Fn toupper toupper.2 already says that "the value of the argument is representable as an unsigned char or the value EOF" (it should say "the argument is representable as an unsigned char or equal to the value of EOF"). The new text, if any, should be mixed with this. Presumably similarly for tolower (I didn't noticed these problems earlier). Bruce --0-307050607-1252035724=:48987--
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20090904115255.Q48987>