Date: Sat, 12 Jul 2014 18:27:20 +1000 (EST) From: Bruce Evans <brde@optusnet.com.au> To: bugzilla-noreply@freebsd.org Cc: freebsd-bugs@freebsd.org Subject: Re: [Bug 191674] New: [tests] printf("%tu", (intmax_t)-1) returns UINT64_MAX on i386, not UINT32_MAX Message-ID: <20140712145916.L2481@besplex.bde.org> In-Reply-To: <bug-191674-8@https.bugs.freebsd.org/bugzilla/> References: <bug-191674-8@https.bugs.freebsd.org/bugzilla/>
next in thread | previous in thread | raw e-mail | index | archive | help
On Sun, 6 Jul 2014 bugzilla-noreply@freebsd.org wrote: > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=191674 > ... > One of the testcases in tools/regression/lib/libc/stdio/test-printbasic.t tests > out %tu with -1 and it fails because the testcase is correctly expecting > UINT32_MAX, not UINT64_MAX. According to printf(3): > > " > t ptrdiff_t (see note) ptrdiff_t * > > Note: the t modifier, when applied to a o, u, x, or X conversion, > indicates that the argument is of an unsigned type equivalent in size > to a ptrdiff_t. > " C99 says the same thing, except it spells the "equivalent" type non-fuzzily as "corresponding unsigned integer type". It isn't clear if such a type must exist or what happens if it doesn't exist, but in FreeBSD it exists on all arches. > > ptrdiff_t on i386 is int32_t (from /usr/include/x86/_types.h): > > 100 #ifdef __LP64__ > 101 typedef __int64_t __ptrdiff_t; /* ptr1 - ptr2 */ > ... > 109 #else > 110 typedef __int32_t __ptrdiff_t; > > So I would expect the value to be UINT32_MAX. This mismatches with the code in Yes, on i386, the unsigned type corresponding to ptrdiff_t is u_int, so %tu means the same as %u. > lib/libc/stdio/vfprintf.c vs sys/x86/include/_types.h as intmax_t is always > int64_t: > > 412 #define INTMAX_SIZE (INTMAXT|SIZET|PTRDIFFT|LLONGINT) This line has almost as bad logic as style. All 4 types in the misformatted misordered expression can have different sizes, and size_t and ptrdiff_t do have different types on i386. > 413 #define SJARG() \ > 414 (flags&INTMAXT ? GETARG(intmax_t) : \ > 415 flags&SIZET ? (intmax_t)GETARG(ssize_t) : \ > 416 flags&PTRDIFFT ? (intmax_t)GETARG(ptrdiff_t) : \ > 417 (intmax_t)GETARG(long long)) This starts almost OK by using the actual type ptrdiff_t GETARG() is expanded to use va_arg(). The behaviour is strictly undefined in many cases when the arg type not ptrdiff_t but just the corresponding type, but the implementation of printf() depends on lots of undefined behaviour like that. (From C99: "[the behaviour is undefined unless the types are compatible, or...] -- one type is a signed integer type, the other type is the corresponding unsigned integer type, and the value is representable in both types;" So with everything 32 bits, va_arg(ap, ptrdiff_t) gives undefined behaviour on -1 converted to the corresponding unsigned type (value 0xFFFFFFFF) since the value 0xFFFFFFF is not representable as a ptrdiff_t. The above code in printf() gives this. Similarly, GET_ARG(uint32_t) gives undefined behaviour on the ptrdiff_t value -1. Similarly for printing the plain int -1 with the plain unsigned format %u. Except now the undefined behaviour is invoked by the application. Things mostly work due to 2's complement magic. The implementation depends on this to avoid translating %tu to %u on i386 and to %lu on amd64. If it did that, then the undefined behaviour would be limited to the application, like for -1 with %u (the corresponding printf format errors are %tu to print signed values like -1 and %td to print unsigned values like 0xFFFFFFFF. If the implementation doesn't translate %tu to an unsigned type, then misprinting -1 using %tu format might even avoid the undefined behaviour). Things don't work when this is pushed too far. Sign extension bugs occur here: @ case 'U': @ flags |= LONGINT; @ /*FALLTHROUGH*/ @ case 'u': @ if (flags & INTMAX_SIZE) @ ujval = UJARG(); @ else @ ulval = UARG(); @ base = 10; @ goto nosign; The implementation wants to store even signed values in the unsigned variables ujval and ulval (see the 'd' case for how it hacks on signed values to make this sort of work. It assumes 2's complement and more). It is somewhat careful about sizes, but this is little more than a micro-optimization for 32-bit arches. On amd64, ulval has the same size as ujval so the complications in the above are useless. On i386, ulval is smaller and code using it may be faster. ulval is abused for ints and the above code is probably just broken if ints are smaller than longs, with much the same bugs as this %tu one. Consider %tu on the unsigned value 0xFFFFFFFF. (You have to spell the value like this else behaviour is undefined due to bugs in the test program. The example in the summary: > Summary: [tests] printf("%tu", (intmax_t)-1) returns UINT64_MAX > on i386, not UINT32_MAX invokes lots of undefined behaviour. First there is an arg size mismatch (intmax_t is larger than ptrdiff_t on i386). Then there is a sign mismatch, so the behaviour is undefined on all arches.) GETARG() reads the value by abusing ptrdiff_t. The behaviour is undefined, but by 2's complement magic the result is (ptrdiff_t)0xFFFFFFFF = -1 (a plain int). This is assigned to ujval and becomes 0xFFFFFFFFFFFFFFFF (a uintmax_t). Then we just print this huge value. Fixing INTMAX_SIZE (make it depend on the actual sizes) would probably work. Then the value of -1 would be assigned to ulval and become 0xFFFFFFFF, which is the correct value by 2's complement magic. The signed %td case works by 2's complement magic even for the huge value. The implementation actually converts -1 to +1 in ujval and sign = '-', not to the huge value in ujval. There is no corresponding problem for %zu since size_t is unsigned so 0xFFFFFFFF doesn't get corrupted to -1. The kernel printf seems to have the same bug, but is simpler so is easier to understand and fix: @ case 't': @ tflag = 1; @ goto reswitch; @ handle_nosign: @ sign = 0; @ if (jflag) @ num = va_arg(ap, uintmax_t); @ else if (qflag) @ num = va_arg(ap, u_quad_t); @ else if (tflag) @ num = va_arg(ap, ptrdiff_t); The type pun is easier to fix here. There just needs to be a uptrdiff_t type to plug in here. This code already takes care with most of the other types by having separate handle_nosign/handle_sign cases to do not much more than select the correct type. @ else if (lflag) @ num = va_arg(ap, u_long); @ ... @ goto number; @ handle_sign: @ if (jflag) @ num = va_arg(ap, intmax_t); @ else if (qflag) @ num = va_arg(ap, quad_t); @ else if (tflag) @ num = va_arg(ap, ptrdiff_t); @ else if (lflag) @ num = va_arg(ap, long); @ ... The kernel printf doesn't have the complication of using 2 unsigned types to hold the value. It uses uintmax_t num for everything. This requires more care with sign extension from 32 to 64 bits, and whatever is done for that might work for sign extension from 16 (or 17 or 33) bits too. However, for %tu, the above just converts 0xFFFFFFFFF first to -1 and then to 0xFFFFFFFFFFFFFFFF, giving the same bug as in libc. (This is on i386; on amd64, the first conversion doesn't change the value.) The fix here is to use uptrdiff_t for the unsigned case. The fix in libc would be messier. The correct fix corresponds: expand GETARG() to use the correct type in all cases and plug uintptr_t into one case there. The incorrect fix is to keep depending on the undefined behaviour in GETARG() and fix up according to the size in all unsigned %t cases (u, x, o, ...) >> From sys/x86/include/_types.h: > 91 typedef __int64_t __intmax_t; > > Logically, I would expect this to be true IFF the i386 architecture was > PAE-enabled. No, PAE only expands physical memory. It doesn't affect things like ptrdiff_t or size_t size these are related to virtual memory. Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20140712145916.L2481>