Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 15 Feb 2016 00:52:15 -0800
From:      Mark Millard <markmi@dsl-only.net>
To:        Roman Divacky <rdivacky@vlakno.cz>, Nathan Whitehorn <nwhitehorn@freebsd.org>
Cc:        FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>, FreeBSD Toolchain <freebsd-toolchain@freebsd.org>
Subject:   Re: I've submitted 207175 for a clang 3.8.0 va_list handling problem for powerpc
Message-ID:  <D7D536A4-68B6-4506-BDFB-8C2C41E1C958@dsl-only.net>
In-Reply-To: <70B405C4-E1AC-4F35-9786-051FDA2F8BE7@dsl-only.net>
References:  <F6846682-10F7-4D0D-A691-ED8D4366805C@dsl-only.net> <20160214192903.GA96697@vlakno.cz> <70B405C4-E1AC-4F35-9786-051FDA2F8BE7@dsl-only.net>

next in thread | previous in thread | raw e-mail | index | archive | help

I'm top posting as the following can stand on its own fairly well.

On Sun Feb 14 23:46:14 UTC 2016 Nathan Whitehorn wrote:

> On 02/14/16 14:34, Mark Millard wrote:
> > clang's code base is not familiar material for me nor do I have =
solid=20
> > reference material for the FreeBSD TARGET_ARCH=3Dpowerpc ABI rules =
so=20
> > the below has my guess work involved. The following code appears to=20=

> > have hard wired a global, unvarying constant (8) into the test for=20=

> > picking UsingRegs vs. UsingOverflow.
>=20
> For reference, we use the standard ELF ABI=20
> (https://uclibc.org/docs/psABI-ppc.pdf).
> -Nathan

Reviewing the Parameter Passing material in that document shows that the =
problem is in the original specification.

And there is a more modern specification that has a fix in its wording. =
(Which shows that I'm not likely to be wrong.) I'll reference and quote =
it later.

First I'll explain the problem that is in psABI-ppc.pdf (the old SunSoft =
1995 document).

First a numbering point: psABI-ppc.pdf uses "gr" matching the numeral in =
r3, r4, . . . , r10, starting at r3 (i.e, 3). And gr indicates the next =
register to be used, not the last one already used.

The document splits the algorithm for placement of parameters into 3 =
stages with the following structure, intended as they have it in the =
document but various less interesting details for my "8byte then 4byte" =
example omitted:

> INITIALIZING:
>      Set fr=3D1, gr=3D3, and starg to the address of
>      parameter word 1.
> SCAN:
>      If there are no more arguments, terminate.
>      Otherwise, select one of the following
>      depending on the type of the next argument:
>=20
>      DOUBLE_OR_FLOAT
>         If fr>8 ( . . .), go to OTHER. Otherwise,
>         . . .
>=20
>      SIMPLE_ARG
>         If gr>10, go to OTHER. Otherwise, load the
>         argument value into general register gr,
>         set gr to gr+1, can goto SCAN. . . .
>=20
>      LONG_LONG
>         If gr>9, go to OTHER. Otherwise, . . .
>=20
> OTHER:
>        Arguments not otherwise handled above are
>        passed in the parameter words of the
>        caller=E2=80=99s stack frame. . . . Set starg to
>        starg+size, then go to SCAN.

Note that gr is not incremented by LONG_LONG or by the later OTHER usage =
when gr>9. (That would be my example's 8 byte integer that is later =
followed by a 4 byte one.)

That OTHER's "go to SCAN" would then lead to the following 4 byte =
integer in my example to be put in r10 and gr then being set to 11 =
instead of it being stored in a parameter word on the stack.

The nasty thing about this for va_list/va_arg use is that the stored =
information does not indicate which was before vs. after in the argument =
order: the 4 byte r10 content or the 8 byte "OTHER" content: the two =
orders produce identical results.

This can not be correct.

The Power-Arch-32-bit-ABI-supp-1.0-Unified.pdf is more modern and =
explicitly deals with VR and other modern things. (Its terminology =
matching LONG_LONG above is DUAL_GP.) But for what I'm dealing with here =
it has the following extra wording at the very end of its OTHER section:

> If gr>9 and the type is DUAL_GP ,or . . ., or . . ., then set gr =3D =
11 (to prevent subsequent SINGLE_GPs from being placed in registers =
after DUAL_GP, QUAD_GP, or EIGHT_GP arguments that would no longer fit =
in the registers).



I've left the prior information below for reference.

=3D=3D=3D
Mark Millard
markmi at dsl-only.net



On 2016-Feb-14, at 2:34 PM, Mark Millard <markmi@dsl-only.net> wrote:
>=20
> On 2016-Feb-14, at 11:29 AM, Roman Divacky <rdivacky@vlakno.cz> wrote:
>>=20
>> Fwiw, the code to handle the vaarg is in=20
>> =
tools/clang/lib/CodeGen/TargetInfo.cpp:PPC32_SVR4_ABIInfo::EmitVAArg()
>>=20
>> You can take a look to see whats wrong.
>>=20
>> On Sat, Feb 13, 2016 at 07:03:29PM -0800, Mark Millard wrote:
>>> I've isolated another clang 3.8.0 TARGET_ARCH=3Dpowerpc SEGV problem =
that shows up for using clang 3.8.0 to buildworld/installworld for =
powerpc.
>>>=20
>>>> ls -l -n /
>>>=20
>>> gets a SEGV. As listed in =
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D207175 ( and  =
https://llvm.org/bugs/show_bug.cgi?id=3D26605 ) the following simplified =
program also gets the SEGV on powerpc:
>>>=20
>>>> #include <stdarg.h> // for va_list, va_start, va_arg, va_end
>>>> #include <stdint.h> // for intmax_t
>>>>=20
>>>> intmax_t
>>>> va_test (char *s, ...)
>>>> {
>>>>   va_list vap;
>>>>=20
>>>>   va_start(vap, s);
>>>>=20
>>>>   char*        t0 =3D va_arg(vap, char*);
>>>>   unsigned int o0 =3D va_arg(vap, unsigned int);
>>>>   int          c0 =3D va_arg(vap, int);
>>>>   unsigned int u0 =3D va_arg(vap, unsigned int);
>>>>   int          c1 =3D va_arg(vap, int);
>>>>   char *       t1 =3D va_arg(vap, char*);
>>>>=20
>>>>   intmax_t     j0 =3D va_arg(vap, intmax_t); // This spans into =
overflow_arg_area.
>>>>=20
>>>>   int          c2 =3D va_arg(vap, int);      // A copy was put in =
the=20
>>>>                                            // overflow_arg_area =
because of the
>>>>                                            // above.
>>>>                                            // But this tries to =
extract from the
>>>>                                            // last 4 bytes of the =
reg_save_area.
>>>>                                            // It does not increment =
the
>>>>                                            // overflow_arg_area =
position pointer
>>>>                                            // past the copy that is =
there.
>>>>=20
>>>>   char *       t2 =3D va_arg(vap, char*);    // The lack of =
increment before makes
>>>>                                            // this extraction off =
by 4 bytes.
>>>>=20
>>>>   char         t2fc =3D *t2;  // <<< This gets SEGV. t2 actually =
got what should be
>>>>                             //     the c2 value.
>>>>=20
>>>>   intmax_t     j1 =3D va_arg(vap, intmax_t);
>>>>=20
>>>>   va_end(vap);
>>>>=20
>>>>   return (intmax_t) ((s-t2)+(t0-t1)+o0+u0+j0+j1+c0+c1+c2+t2fc);
>>>>   // Avoid any optimize-away for lack of use.
>>>> }
>>>>=20
>>>> int main(void)
>>>> {
>>>>   char         s[1025] =3D "test string for this";
>>>>=20
>>>>   char*        t0 =3D s + 5;
>>>>   unsigned int o0 =3D 3;
>>>>   int          c0 =3D 1;
>>>>   unsigned int u0 =3D 1;
>>>>   int          c1 =3D 3;
>>>>   char *       t1 =3D s + 12;
>>>>   intmax_t     j0 =3D 314159265358979323;
>>>>   int          c2 =3D 4;
>>>>   char *       t2 =3D s + 16;
>>>>   intmax_t     j1 =3D ~314159265358979323;
>>>>=20
>>>>   intmax_t      result =3D =
va_test(s,t0,o0,c0,u0,c1,t1,j0,c1,t2,j1);
>>>>=20
>>>>   return (int) (result - (intmax_t) =
((s-t2)+(t0-t1)+o0+u0+j0+j1+c0+c1+c2+*t2));
>>>>   // Avoid any optimize-away for lack of use.
>>>> }
>>>=20
>>>=20
>>>=20
>>> =3D=3D=3D
>>> Mark Millard
>>> markmi at dsl-only.net
>>>=20
>>> _______________________________________________
>>> freebsd-toolchain@freebsd.org mailing list
>>> https://lists.freebsd.org/mailman/listinfo/freebsd-toolchain
>>> To unsubscribe, send any mail to =
"freebsd-toolchain-unsubscribe@freebsd.org"
>=20
> clang's code base is not familiar material for me nor do I have solid =
reference material for the FreeBSD TARGET_ARCH=3Dpowerpc ABI rules so =
the below has my guess work involved.
>=20
> The following code appears to have hard wired a global, unvarying =
constant (8) into the test for picking UsingRegs vs. UsingOverflow.
>=20
>=20
>>  llvm::Value *NumRegs =3D Builder.CreateLoad(NumRegsAddr, =
"numUsedRegs");
> . . .
>>  llvm::Value *CC =3D
>>      Builder.CreateICmpULT(NumRegs, Builder.getInt8(8), "cond");
>>=20
>>  llvm::BasicBlock *UsingRegs =3D CGF.createBasicBlock("using_regs");
>>  llvm::BasicBlock *UsingOverflow =3D =
CGF.createBasicBlock("using_overflow");
>>  llvm::BasicBlock *Cont =3D CGF.createBasicBlock("cont");
>>=20
>>  Builder.CreateCondBr(CC, UsingRegs, UsingOverflow);
> . . .
>>  // Case 1: consume registers.
>>  Address RegAddr =3D Address::invalid();
>>  {
> . . .
>>    // Increase the used-register count.
>>    NumRegs =3D
>>      Builder.CreateAdd(NumRegs,
>>                        Builder.getInt8((isI64 || (isF64 && =
IsSoftFloatABI)) ? 2 : 1));
>>    Builder.CreateStore(NumRegs, NumRegsAddr);. . .
> . . .
>>  }
>>=20
>>  // Case 2: consume space in the overflow area.
>>  Address MemAddr =3D Address::invalid();
>>  {
> . . . (no adjustments to NumRegs) . . .
>=20
> If so the means of counting NumRegs (a.k.a. gpr) then needs to take =
into account an allocated but unused last UsingRegs "slot" sometimes. =
Imagine. . .
>=20
> r3, r4, r5, r6, r7, r8, r9 in use already so r10 is the last possible =
"UsingRegs" context.
> (0  1   2   3   4   5   6, leaving r10 as position 7, the last < 8 =
value)
>=20
> Then the next two arguments are a 8 byte integer then a a 4 byte =
integer (in that order). That results in what should be:
>=20
> r10 "UsingRegs" slot reserved and un-accessed
> In other words: counted as allocated so that the rest goes in in the =
overflow area
> (so no position 7 usage)
>=20
> then
>=20
> overflow with the 8 byte integer then the 4 byte integer.
>=20
>=20
> And, in fact, the memory content reflects this in the overflow area.
>=20
>=20
> But the va_arg access code does not count r10's slot as allocated in =
"Using Regs" after the 8 byte integer. So later it tries to use r10's =
slot for the 4 byte integer that is actually in the UsingOverflow area.
>=20
> One fix of sorts is to have "Case 2: consume space in the overflow =
area." set NumRegs (a.k.a. gpr) to the bound from the =
Builder.CreateICmpULT (8 in this context). Then the first (or any/every) =
use of the UsingOverflow area forces no more use of the UsingRegs area =
(for the involved va_list).
>=20
>=20
>=20
> =3D=3D=3D
> Mark Millard
> markmi at dsl-only.net




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?D7D536A4-68B6-4506-BDFB-8C2C41E1C958>