Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 12 Jul 2014 18:27:20 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        bugzilla-noreply@freebsd.org
Cc:        freebsd-bugs@freebsd.org
Subject:   Re: [Bug 191674] New: [tests] printf("%tu", (intmax_t)-1) returns UINT64_MAX on i386, not UINT32_MAX
Message-ID:  <20140712145916.L2481@besplex.bde.org>
In-Reply-To: <bug-191674-8@https.bugs.freebsd.org/bugzilla/>
References:  <bug-191674-8@https.bugs.freebsd.org/bugzilla/>

next in thread | previous in thread | raw e-mail | index | archive | help
On Sun, 6 Jul 2014 bugzilla-noreply@freebsd.org wrote:

> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=191674
> ...
> One of the testcases in tools/regression/lib/libc/stdio/test-printbasic.t tests
> out %tu with -1 and it fails because the testcase is correctly expecting
> UINT32_MAX, not UINT64_MAX. According to printf(3):
>
>         "
>         t                 ptrdiff_t      (see note)            ptrdiff_t *
>
>         Note: the t modifier, when applied to a o, u, x, or X conversion,
>         indicates that the argument is of an unsigned type equivalent in size
>         to a ptrdiff_t.
>         "

C99 says the same thing, except it spells the "equivalent" type
non-fuzzily as "corresponding unsigned integer type".  It isn't clear
if such a type must exist or what happens if it doesn't exist, but in
FreeBSD it exists on all arches.

>
> ptrdiff_t on i386 is int32_t (from /usr/include/x86/_types.h):
>
> 100 #ifdef  __LP64__
> 101 typedef __int64_t       __ptrdiff_t;            /* ptr1 - ptr2 */
> ...
> 109 #else
> 110 typedef __int32_t       __ptrdiff_t;
>
> So I would expect the value to be UINT32_MAX. This mismatches with the code in

Yes, on i386, the unsigned type corresponding to ptrdiff_t is u_int, so
%tu means the same as %u.

> lib/libc/stdio/vfprintf.c vs sys/x86/include/_types.h as intmax_t is always
> int64_t:
>
> 412 #define INTMAX_SIZE     (INTMAXT|SIZET|PTRDIFFT|LLONGINT)

This line has almost as bad logic as style.  All 4 types in the misformatted
misordered expression can have different sizes, and size_t and ptrdiff_t do
have different types on i386.

> 413 #define SJARG() \
> 414         (flags&INTMAXT ? GETARG(intmax_t) : \
> 415             flags&SIZET ? (intmax_t)GETARG(ssize_t) : \
> 416             flags&PTRDIFFT ? (intmax_t)GETARG(ptrdiff_t) : \
> 417             (intmax_t)GETARG(long long))

This starts almost OK by using the actual type ptrdiff_t GETARG() is
expanded to use va_arg().  The behaviour is strictly undefined in
many cases when the arg type not ptrdiff_t but just the corresponding
type, but the implementation of printf() depends on lots of undefined
behaviour like that.  (From C99: "[the behaviour is undefined unless
the types are compatible, or...]
          -- one type is a signed integer type, the  other  type  is
             the  corresponding unsigned integer type, and the value
             is representable in both types;"
So with everything 32 bits, va_arg(ap, ptrdiff_t) gives undefined
behaviour on -1 converted to the corresponding unsigned type
(value 0xFFFFFFFF) since the value 0xFFFFFFF is not representable
as a ptrdiff_t.  The above code in printf() gives this.  Similarly,
GET_ARG(uint32_t) gives undefined behaviour on the ptrdiff_t value
-1.  Similarly for printing the plain int -1 with the plain unsigned
format %u.  Except now the undefined behaviour is invoked by the
application.

Things mostly work due to 2's complement magic.  The implementation
depends on this to avoid translating %tu to %u on i386 and to %lu
on amd64.  If it did that, then the undefined behaviour would be
limited to the application, like for -1 with %u (the corresponding
printf format errors are %tu to print signed values like -1 and
%td to print unsigned values like 0xFFFFFFFF.  If the implementation
doesn't translate %tu to an unsigned type, then misprinting -1 using
%tu format might even avoid the undefined behaviour).

Things don't work when this is pushed too far.  Sign extension bugs
occur here:

@ 		case 'U':
@ 			flags |= LONGINT;
@ 			/*FALLTHROUGH*/
@ 		case 'u':
@ 			if (flags & INTMAX_SIZE)
@ 				ujval = UJARG();
@ 			else
@ 				ulval = UARG();
@ 			base = 10;
@ 			goto nosign;

The implementation wants to store even signed values in the unsigned
variables ujval and ulval (see the 'd' case for how it hacks on signed
values to make this sort of work.  It assumes 2's complement and more).
It is somewhat careful about sizes, but this is little more than a
micro-optimization for 32-bit arches.  On amd64, ulval has the same
size as ujval so the complications in the above are useless.  On i386,
ulval is smaller and code using it may be faster.  ulval is abused for
ints and the above code is probably just broken if ints are smaller than
longs, with much the same bugs as this %tu one.

Consider %tu on the unsigned value 0xFFFFFFFF.  (You have to spell the
value like this else behaviour is undefined due to bugs in the test
program.  The example in the summary:

>           Summary: [tests] printf("%tu", (intmax_t)-1) returns UINT64_MAX
>                    on i386, not UINT32_MAX

invokes lots of undefined behaviour.  First there is an arg size mismatch
(intmax_t is larger than ptrdiff_t on i386).  Then there is a sign mismatch,
so the behaviour is undefined on all arches.)  GETARG() reads the value
by abusing ptrdiff_t.  The behaviour is undefined, but by 2's complement
magic the result is (ptrdiff_t)0xFFFFFFFF = -1 (a plain int).  This is
assigned to ujval and becomes 0xFFFFFFFFFFFFFFFF (a uintmax_t).  Then
we just print this huge value.

Fixing INTMAX_SIZE (make it depend on the actual sizes) would probably
work.  Then the value of -1 would be assigned to ulval and become
0xFFFFFFFF, which is the correct value by 2's complement magic.

The signed %td case works by 2's complement magic even for the huge
value.  The implementation actually converts -1 to +1 in ujval and
sign = '-', not to the huge value in ujval.

There is no corresponding problem for %zu since size_t is unsigned
so 0xFFFFFFFF doesn't get corrupted to -1.

The kernel printf seems to have the same bug, but is simpler so is
easier to understand and fix:

@ 		case 't':
@ 			tflag = 1;
@ 			goto reswitch;
@ handle_nosign:
@ 			sign = 0;
@ 			if (jflag)
@ 				num = va_arg(ap, uintmax_t);
@ 			else if (qflag)
@ 				num = va_arg(ap, u_quad_t);
@ 			else if (tflag)
@ 				num = va_arg(ap, ptrdiff_t);

The type pun is easier to fix here.  There just needs to be a uptrdiff_t
type to plug in here.  This code already takes care with most of the
other types by having separate handle_nosign/handle_sign cases to do
not much more than select the correct type.

@ 			else if (lflag)
@ 				num = va_arg(ap, u_long);
@ 			...
@ 			goto number;
@ handle_sign:
@ 			if (jflag)
@ 				num = va_arg(ap, intmax_t);
@ 			else if (qflag)
@ 				num = va_arg(ap, quad_t);
@ 			else if (tflag)
@ 				num = va_arg(ap, ptrdiff_t);
@ 			else if (lflag)
@ 				num = va_arg(ap, long);
@ 			...

The kernel printf doesn't have the complication of using 2 unsigned types
to hold the value.  It uses uintmax_t num for everything.  This requires
more care with sign extension from 32 to 64 bits, and whatever is done
for that might work for sign extension from 16 (or 17 or 33) bits too.
However, for %tu, the above just converts 0xFFFFFFFFF first to -1 and
then to 0xFFFFFFFFFFFFFFFF, giving the same bug as in libc.  (This is
on i386; on amd64, the first conversion doesn't change the value.)

The fix here is to use uptrdiff_t for the unsigned case.  The fix in
libc would be messier.  The correct fix corresponds: expand GETARG()
to use the correct type in all cases and plug uintptr_t into one case
there.  The incorrect fix is to keep depending on the undefined
behaviour in GETARG() and fix up according to the size in all unsigned
%t cases (u, x, o, ...)

>> From sys/x86/include/_types.h:
> 91 typedef __int64_t       __intmax_t;
>
> Logically, I would expect this to be true IFF the i386 architecture was
> PAE-enabled.

No, PAE only expands physical memory.  It doesn't affect things like
ptrdiff_t or size_t size these are related to virtual memory.

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20140712145916.L2481>