From owner-freebsd-toolchain@freebsd.org Mon Feb 15 08:52:18 2016 Return-Path: Delivered-To: freebsd-toolchain@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 3F032AA8D48 for ; Mon, 15 Feb 2016 08:52:18 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: from asp.reflexion.net (outbound-mail-211-153.reflexion.net [208.70.211.153]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 01AD51112 for ; Mon, 15 Feb 2016 08:52:17 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: (qmail 3096 invoked from network); 15 Feb 2016 08:52:30 -0000 Received: from unknown (HELO mail-cs-02.app.dca.reflexion.local) (10.81.19.2) by 0 (rfx-qmail) with SMTP; 15 Feb 2016 08:52:30 -0000 Received: by mail-cs-02.app.dca.reflexion.local (Reflexion email security v7.80.0) with SMTP; Mon, 15 Feb 2016 03:52:10 -0500 (EST) Received: (qmail 12072 invoked from network); 15 Feb 2016 08:52:10 -0000 Received: from unknown (HELO iron2.pdx.net) (69.64.224.71) by 0 (rfx-qmail) with SMTP; 15 Feb 2016 08:52:10 -0000 X-No-Relay: not in my network X-No-Relay: not in my network X-No-Relay: not in my network X-No-Relay: not in my network Received: from [192.168.1.8] (c-76-115-7-162.hsd1.or.comcast.net [76.115.7.162]) by iron2.pdx.net (Postfix) with ESMTPSA id 43D921C43D8; Mon, 15 Feb 2016 00:52:15 -0800 (PST) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\)) Subject: Re: I've submitted 207175 for a clang 3.8.0 va_list handling problem for powerpc From: Mark Millard In-Reply-To: <70B405C4-E1AC-4F35-9786-051FDA2F8BE7@dsl-only.net> Date: Mon, 15 Feb 2016 00:52:15 -0800 Cc: FreeBSD PowerPC ML , FreeBSD Toolchain Content-Transfer-Encoding: quoted-printable Message-Id: References: <20160214192903.GA96697@vlakno.cz> <70B405C4-E1AC-4F35-9786-051FDA2F8BE7@dsl-only.net> To: Roman Divacky , Nathan Whitehorn X-Mailer: Apple Mail (2.2104) X-BeenThere: freebsd-toolchain@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Maintenance of FreeBSD's integrated toolchain List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 Feb 2016 08:52:18 -0000 I'm top posting as the following can stand on its own fairly well. On Sun Feb 14 23:46:14 UTC 2016 Nathan Whitehorn wrote: > On 02/14/16 14:34, Mark Millard wrote: > > clang's code base is not familiar material for me nor do I have = solid=20 > > reference material for the FreeBSD TARGET_ARCH=3Dpowerpc ABI rules = so=20 > > the below has my guess work involved. The following code appears to=20= > > have hard wired a global, unvarying constant (8) into the test for=20= > > picking UsingRegs vs. UsingOverflow. >=20 > For reference, we use the standard ELF ABI=20 > (https://uclibc.org/docs/psABI-ppc.pdf). > -Nathan Reviewing the Parameter Passing material in that document shows that the = problem is in the original specification. And there is a more modern specification that has a fix in its wording. = (Which shows that I'm not likely to be wrong.) I'll reference and quote = it later. First I'll explain the problem that is in psABI-ppc.pdf (the old SunSoft = 1995 document). First a numbering point: psABI-ppc.pdf uses "gr" matching the numeral in = r3, r4, . . . , r10, starting at r3 (i.e, 3). And gr indicates the next = register to be used, not the last one already used. The document splits the algorithm for placement of parameters into 3 = stages with the following structure, intended as they have it in the = document but various less interesting details for my "8byte then 4byte" = example omitted: > INITIALIZING: > Set fr=3D1, gr=3D3, and starg to the address of > parameter word 1. > SCAN: > If there are no more arguments, terminate. > Otherwise, select one of the following > depending on the type of the next argument: >=20 > DOUBLE_OR_FLOAT > If fr>8 ( . . .), go to OTHER. Otherwise, > . . . >=20 > SIMPLE_ARG > If gr>10, go to OTHER. Otherwise, load the > argument value into general register gr, > set gr to gr+1, can goto SCAN. . . . >=20 > LONG_LONG > If gr>9, go to OTHER. Otherwise, . . . >=20 > OTHER: > Arguments not otherwise handled above are > passed in the parameter words of the > caller=E2=80=99s stack frame. . . . Set starg to > starg+size, then go to SCAN. Note that gr is not incremented by LONG_LONG or by the later OTHER usage = when gr>9. (That would be my example's 8 byte integer that is later = followed by a 4 byte one.) That OTHER's "go to SCAN" would then lead to the following 4 byte = integer in my example to be put in r10 and gr then being set to 11 = instead of it being stored in a parameter word on the stack. The nasty thing about this for va_list/va_arg use is that the stored = information does not indicate which was before vs. after in the argument = order: the 4 byte r10 content or the 8 byte "OTHER" content: the two = orders produce identical results. This can not be correct. The Power-Arch-32-bit-ABI-supp-1.0-Unified.pdf is more modern and = explicitly deals with VR and other modern things. (Its terminology = matching LONG_LONG above is DUAL_GP.) But for what I'm dealing with here = it has the following extra wording at the very end of its OTHER section: > If gr>9 and the type is DUAL_GP ,or . . ., or . . ., then set gr =3D = 11 (to prevent subsequent SINGLE_GPs from being placed in registers = after DUAL_GP, QUAD_GP, or EIGHT_GP arguments that would no longer fit = in the registers). I've left the prior information below for reference. =3D=3D=3D Mark Millard markmi at dsl-only.net On 2016-Feb-14, at 2:34 PM, Mark Millard wrote: >=20 > On 2016-Feb-14, at 11:29 AM, Roman Divacky wrote: >>=20 >> Fwiw, the code to handle the vaarg is in=20 >> = tools/clang/lib/CodeGen/TargetInfo.cpp:PPC32_SVR4_ABIInfo::EmitVAArg() >>=20 >> You can take a look to see whats wrong. >>=20 >> On Sat, Feb 13, 2016 at 07:03:29PM -0800, Mark Millard wrote: >>> I've isolated another clang 3.8.0 TARGET_ARCH=3Dpowerpc SEGV problem = that shows up for using clang 3.8.0 to buildworld/installworld for = powerpc. >>>=20 >>>> ls -l -n / >>>=20 >>> gets a SEGV. As listed in = https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D207175 ( and = https://llvm.org/bugs/show_bug.cgi?id=3D26605 ) the following simplified = program also gets the SEGV on powerpc: >>>=20 >>>> #include // for va_list, va_start, va_arg, va_end >>>> #include // for intmax_t >>>>=20 >>>> intmax_t >>>> va_test (char *s, ...) >>>> { >>>> va_list vap; >>>>=20 >>>> va_start(vap, s); >>>>=20 >>>> char* t0 =3D va_arg(vap, char*); >>>> unsigned int o0 =3D va_arg(vap, unsigned int); >>>> int c0 =3D va_arg(vap, int); >>>> unsigned int u0 =3D va_arg(vap, unsigned int); >>>> int c1 =3D va_arg(vap, int); >>>> char * t1 =3D va_arg(vap, char*); >>>>=20 >>>> intmax_t j0 =3D va_arg(vap, intmax_t); // This spans into = overflow_arg_area. >>>>=20 >>>> int c2 =3D va_arg(vap, int); // A copy was put in = the=20 >>>> // overflow_arg_area = because of the >>>> // above. >>>> // But this tries to = extract from the >>>> // last 4 bytes of the = reg_save_area. >>>> // It does not increment = the >>>> // overflow_arg_area = position pointer >>>> // past the copy that is = there. >>>>=20 >>>> char * t2 =3D va_arg(vap, char*); // The lack of = increment before makes >>>> // this extraction off = by 4 bytes. >>>>=20 >>>> char t2fc =3D *t2; // <<< This gets SEGV. t2 actually = got what should be >>>> // the c2 value. >>>>=20 >>>> intmax_t j1 =3D va_arg(vap, intmax_t); >>>>=20 >>>> va_end(vap); >>>>=20 >>>> return (intmax_t) ((s-t2)+(t0-t1)+o0+u0+j0+j1+c0+c1+c2+t2fc); >>>> // Avoid any optimize-away for lack of use. >>>> } >>>>=20 >>>> int main(void) >>>> { >>>> char s[1025] =3D "test string for this"; >>>>=20 >>>> char* t0 =3D s + 5; >>>> unsigned int o0 =3D 3; >>>> int c0 =3D 1; >>>> unsigned int u0 =3D 1; >>>> int c1 =3D 3; >>>> char * t1 =3D s + 12; >>>> intmax_t j0 =3D 314159265358979323; >>>> int c2 =3D 4; >>>> char * t2 =3D s + 16; >>>> intmax_t j1 =3D ~314159265358979323; >>>>=20 >>>> intmax_t result =3D = va_test(s,t0,o0,c0,u0,c1,t1,j0,c1,t2,j1); >>>>=20 >>>> return (int) (result - (intmax_t) = ((s-t2)+(t0-t1)+o0+u0+j0+j1+c0+c1+c2+*t2)); >>>> // Avoid any optimize-away for lack of use. >>>> } >>>=20 >>>=20 >>>=20 >>> =3D=3D=3D >>> Mark Millard >>> markmi at dsl-only.net >>>=20 >>> _______________________________________________ >>> freebsd-toolchain@freebsd.org mailing list >>> https://lists.freebsd.org/mailman/listinfo/freebsd-toolchain >>> To unsubscribe, send any mail to = "freebsd-toolchain-unsubscribe@freebsd.org" >=20 > clang's code base is not familiar material for me nor do I have solid = reference material for the FreeBSD TARGET_ARCH=3Dpowerpc ABI rules so = the below has my guess work involved. >=20 > The following code appears to have hard wired a global, unvarying = constant (8) into the test for picking UsingRegs vs. UsingOverflow. >=20 >=20 >> llvm::Value *NumRegs =3D Builder.CreateLoad(NumRegsAddr, = "numUsedRegs"); > . . . >> llvm::Value *CC =3D >> Builder.CreateICmpULT(NumRegs, Builder.getInt8(8), "cond"); >>=20 >> llvm::BasicBlock *UsingRegs =3D CGF.createBasicBlock("using_regs"); >> llvm::BasicBlock *UsingOverflow =3D = CGF.createBasicBlock("using_overflow"); >> llvm::BasicBlock *Cont =3D CGF.createBasicBlock("cont"); >>=20 >> Builder.CreateCondBr(CC, UsingRegs, UsingOverflow); > . . . >> // Case 1: consume registers. >> Address RegAddr =3D Address::invalid(); >> { > . . . >> // Increase the used-register count. >> NumRegs =3D >> Builder.CreateAdd(NumRegs, >> Builder.getInt8((isI64 || (isF64 && = IsSoftFloatABI)) ? 2 : 1)); >> Builder.CreateStore(NumRegs, NumRegsAddr);. . . > . . . >> } >>=20 >> // Case 2: consume space in the overflow area. >> Address MemAddr =3D Address::invalid(); >> { > . . . (no adjustments to NumRegs) . . . >=20 > If so the means of counting NumRegs (a.k.a. gpr) then needs to take = into account an allocated but unused last UsingRegs "slot" sometimes. = Imagine. . . >=20 > r3, r4, r5, r6, r7, r8, r9 in use already so r10 is the last possible = "UsingRegs" context. > (0 1 2 3 4 5 6, leaving r10 as position 7, the last < 8 = value) >=20 > Then the next two arguments are a 8 byte integer then a a 4 byte = integer (in that order). That results in what should be: >=20 > r10 "UsingRegs" slot reserved and un-accessed > In other words: counted as allocated so that the rest goes in in the = overflow area > (so no position 7 usage) >=20 > then >=20 > overflow with the 8 byte integer then the 4 byte integer. >=20 >=20 > And, in fact, the memory content reflects this in the overflow area. >=20 >=20 > But the va_arg access code does not count r10's slot as allocated in = "Using Regs" after the 8 byte integer. So later it tries to use r10's = slot for the 4 byte integer that is actually in the UsingOverflow area. >=20 > One fix of sorts is to have "Case 2: consume space in the overflow = area." set NumRegs (a.k.a. gpr) to the bound from the = Builder.CreateICmpULT (8 in this context). Then the first (or any/every) = use of the UsingOverflow area forces no more use of the UsingRegs area = (for the involved va_list). >=20 >=20 >=20 > =3D=3D=3D > Mark Millard > markmi at dsl-only.net