From owner-freebsd-standards Mon Feb 25 20:40:39 2002 Delivered-To: freebsd-standards@freebsd.org Received: from mailman.zeta.org.au (mailman.zeta.org.au [203.26.10.16]) by hub.freebsd.org (Postfix) with ESMTP id E8D7D37B405 for ; Mon, 25 Feb 2002 20:40:28 -0800 (PST) Received: from bde.zeta.org.au (bde.zeta.org.au [203.2.228.102]) by mailman.zeta.org.au (8.9.3/8.8.7) with ESMTP id PAA32103; Tue, 26 Feb 2002 15:40:12 +1100 Date: Tue, 26 Feb 2002 15:40:36 +1100 (EST) From: Bruce Evans X-X-Sender: To: Bill Fenner Cc: Subject: Re: scanf(3) patches for review In-Reply-To: <200201300531.g0U5VEh48095@stash.attlabs.att.com> Message-ID: <20020226141642.A43061-100000@gamplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-standards@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Tue, 29 Jan 2002, Bill Fenner wrote: > Here are some long-standing scanf(3) patches, which add the new c99 > size modifiers to scanf(3). They've been languishing in my tree waiting Here is a belated review. > for me to get around to implementing %n$, but I am clearly not getting > to it so there's nothing to win by sitting on 'em. > > One thing that I haven't decided yet is whether it makes sense to > rewrite the size modifier narrative to a table in the same way as > I did for printf(3). Input is solicited. I didn't look at the mdoc parts. > Index: vfscanf.c > =================================================================== > RCS file: /home/ncvs/src/lib/libc/stdio/vfscanf.c,v > retrieving revision 1.19 > diff -u -r1.19 vfscanf.c > --- vfscanf.c 29 Nov 2001 03:03:55 -0000 1.19 > +++ vfscanf.c 30 Jan 2002 05:16:58 -0000 > @@ -45,6 +45,7 @@ > #include "namespace.h" > #include > #include > +#include > #include > #if __STDC__ > #include > @@ -52,6 +53,7 @@ > #include > #endif > #include > +#include > #include "un-namespace.h" > > #include "collate.h" Includes more disordered than before. > @@ -76,7 +78,11 @@ > #define SUPPRESS 0x08 /* suppress assignment */ > #define POINTER 0x10 /* weird %p pointer (`fake hex') */ > #define NOSKIP 0x20 /* do not skip blanks */ > -#define QUAD 0x400 > +#define LONGLONG 0x400 /* ll: long long (+ deprecated q: quad) */ > +#define INTMAXT 0x800 /* j: intmax_t */ > +#define PTRDIFFT 0x1000 /* t: ptrdiff_t */ > +#define SIZET 0x2000 /* z: size_t */ > +#define SHORTSHORT 0x4000 /* hh: char */ > > /* > * The following are used in numeric conversions only: Please change all the old comments to have the same format (e.g., "p: void * (`fake hex')" for POINTER. > @@ -98,13 +104,10 @@ > #define CT_CHAR 0 /* %c conversion */ > #define CT_CCL 1 /* %[...] conversion */ > #define CT_STRING 2 /* %s conversion */ > -#define CT_INT 3 /* integer, i.e., strtoq or strtouq */ > +#define CT_INT 3 /* integer, i.e., strtoimax or strtoumax */ > #define CT_FLOAT 4 /* floating, i.e., strtod */ I think naming the conversion function(s) is not as useful as naming all the format specifiers. > @@ -136,8 +139,8 @@ > int nassigned; /* number of fields assigned */ > int nconversions; /* number of conversions */ > int nread; /* number of characters consumed from fp */ > - int base; /* base argument to strtoq/strtouq */ > - u_quad_t(*ccfn)(); /* conversion function (strtoq/strtouq) */ > + int base; /* base argument to strtoimax/strtoumax */ > + uintmax_t(*ccfn)(); /* conversion function (strtoimax/strtoumax) */ Missing space after uintmax_t (from style bug in rev.1.11). Naming the conversion functions all over is even less useful than above. We should start fixing incomplete prototypes in libc. They are more common for function pointers like the above than for functions. > @@ -205,61 +225,49 @@ > > /* > * Conversions. > - * Those marked `compat' are for 4.[123]BSD compatibility. > - * > - * (According to ANSI, E and X formats are supposed > - * to the same as e and x. Sorry about that.) > */ > - case 'D': /* compat */ > - flags |= LONG; > - /* FALLTHROUGH */ Maybe abort() for obsolete formats. > case 'd': > c = CT_INT; > - ccfn = (u_quad_t (*)())strtoq; > + ccfn = (uintmax_t (*)())strtoimax; > base = 10; > break; I was unhappy with the corresponding global change from strto[u]l to strto[u]q in rev.1.11. It pessimized the usual case and changes the handling of overflow. scanf'ing the integer one larger than LONG_MAX using %ld used to give LONG_MAX and set errno = ERANGE, but it gives undefined behaviour (usually silent truncation to LONG_MIN). Rev.1.1 had the same problem with the integer one larger than INT_MIN if INT_MIN < LONG_MIN. I just learned that all these misbehaviours are standards conformant. The behaviour is undefined if the integer (parsed in the same way as strto[u]l execpt with infinite precsion) is too large to be representatable by the specified type. So we can parse all integers using strto[u]max() and blindly truncate them. No programs should notice the inefficiency for this, since no programs should use the scanf family, at least with integer formats, since there is no way to determine if the scan really worked :-). > #ifdef FLOATING_POINT > - case 'E': /* compat XXX */ > - case 'F': /* compat */ > - flags |= LONG; > - /* FALLTHROUGH */ > + case 'E': case 'F': case 'G': Oops, we can't abort in most cases, since the specifier is used for something else. > @@ -278,7 +289,7 @@ > case 'p': /* pointer format is like hex */ > flags |= POINTER | PFXOK; > c = CT_INT; > - ccfn = strtouq; > + ccfn = strtoumax; > base = 16; > break; > Strictly, pointers might need more precision than strtoumax(), and we might even have a function to support this if we had a machine that needed it, but the function wouldn't fit into the current framework. > ... > @@ -451,7 +465,7 @@ > continue; > > case CT_INT: > - /* scan an integer as if by strtoq/strtouq */ > + /* scan an integer as if by strtoimax/strtoumax */ As above about naming the conversion functions all over. The standard actually only says that the subject sequence is in the same format as expected by strtol(). > @@ -569,19 +583,27 @@ > (void) __ungetc(c, fp); > } > if ((flags & SUPPRESS) == 0) { > - u_quad_t res; > + uintmax_t res; > > *p = 0; > res = (*ccfn)(buf, (char **)NULL, base); This invokes undefined behaviour (calling through a function pointer with a type different from the original function). This can be fixed easily by throwing away the function pointer and all the dubious casts to it. Just set an UNSIGNED flag if we don't already have a suitable flag, and use it here as follows: if (flags & UNSIGNED) res = strtoumax(buf, NULL, base); else res = strtoimax(buf, NULL, base); Better code would use a a specialized function for each result type and not risk gratutous sign extension bugs for storing all the results in a common type. About 15 specialized functgions would be needed. This is too much for a function that should never be used, so I'm not seriously suggesting it :-). > if (flags & POINTER) > *va_arg(ap, void **) = > - (void *)(u_long)res; > + (void *)(uintptr_t)res; Strictly, uintmax_t is not suitable for storing pointers. The cast to uintptr_t is main to break warnings about this. Better code would use: *va_arg(ap, void **) = strtovoidstar(...); This is even simpler, except strtovoidstar() doesn't exist. > + else if (flags & SHORTSHORT) > + *va_arg(ap, char *) = res; Better code for this would not be as simple as for pointers, since it would require 2 elseif clauases instead of 1 (for the signed and unsigned cases). Similarly for all the other integer types except ptrdiff_t and size_t. I like most of your changes but didn't check them against the standard. Bruce To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-standards" in the body of the message