From owner-freebsd-ppc@freebsd.org Mon Feb 15 20:17:58 2016 Return-Path: Delivered-To: freebsd-ppc@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 073E7AAACB5 for ; Mon, 15 Feb 2016 20:17:58 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: from asp.reflexion.net (outbound-mail-211-153.reflexion.net [208.70.211.153]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id C01F9F54 for ; Mon, 15 Feb 2016 20:17:57 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: (qmail 27871 invoked from network); 15 Feb 2016 20:17:52 -0000 Received: from unknown (HELO mail-cs-01.app.dca.reflexion.local) (10.81.19.1) by 0 (rfx-qmail) with SMTP; 15 Feb 2016 20:17:52 -0000 Received: by mail-cs-01.app.dca.reflexion.local (Reflexion email security v7.80.0) with SMTP; Mon, 15 Feb 2016 15:18:01 -0500 (EST) Received: (qmail 9845 invoked from network); 15 Feb 2016 20:18:01 -0000 Received: from unknown (HELO iron2.pdx.net) (69.64.224.71) by 0 (rfx-qmail) with SMTP; 15 Feb 2016 20:18:01 -0000 X-No-Relay: not in my network X-No-Relay: not in my network X-No-Relay: not in my network X-No-Relay: not in my network Received: from [192.168.1.8] (c-76-115-7-162.hsd1.or.comcast.net [76.115.7.162]) by iron2.pdx.net (Postfix) with ESMTPSA id 7C4371C43B3; Mon, 15 Feb 2016 12:17:47 -0800 (PST) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\)) Subject: Re: I've submitted 207175 for a clang 3.8.0 va_list handling problem for powerpc From: Mark Millard In-Reply-To: <20160215191100.GA17387@vlakno.cz> Date: Mon, 15 Feb 2016 12:17:50 -0800 Cc: Nathan Whitehorn , FreeBSD Toolchain , FreeBSD PowerPC ML Content-Transfer-Encoding: quoted-printable Message-Id: <3A260EC5-E69A-4980-8F74-C04395F4E5F4@dsl-only.net> References: <20160214192903.GA96697@vlakno.cz> <70B405C4-E1AC-4F35-9786-051FDA2F8BE7@dsl-only.net> <20160215191100.GA17387@vlakno.cz> To: Roman Divacky X-Mailer: Apple Mail (2.2104) X-BeenThere: freebsd-ppc@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Porting FreeBSD to the PowerPC List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 Feb 2016 20:17:58 -0000 On 2016-Feb-15, at 11:11 AM, Roman Divacky = wrote: >=20 > Mark, I believe you're right. What do you think about this patch? >=20 > Index: tools/clang/lib/CodeGen/TargetInfo.cpp > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > --- tools/clang/lib/CodeGen/TargetInfo.cpp (revision 260852) > +++ tools/clang/lib/CodeGen/TargetInfo.cpp (working copy) > @@ -3599,6 +3599,8 @@ > { > CGF.EmitBlock(UsingOverflow); >=20 > + Builder.CreateStore(Builder.getInt8(11), NumRegsAddr); > + > // Everything in the overflow area is rounded up to a size of at = least 4. > CharUnits OverflowAreaAlign =3D CharUnits::fromQuantity(4); >=20 >=20 > Can you test it? It may be later today before I can start the the test process. While your change is not wrong as presented, it does seem to be based on = the ABI document's numbering with the range 3 <=3D gr <12, where 3 <=3D = gr < 11 cover r3-r10 use and gr=3D11 implies overflow stack area use. = (gr being the ABI documents name.) The clang code generation that I saw while analyzing the problem and the = clang source that you had me look at did not use that numbering. Instead = it seems to be based on 0 <=3D gpr < 9, where 0 <=3D gpr < 8 cover = r3-r10 use and gpr=3D8 implies overflow stack area use. (gpr being what = gdb showed me as I remember.) In other words: clang counts the number of = "parameter registers" already in use as it goes along instead of = tracking register numbers that have been used. So assigning any value that appears to be positive and >=3D 8 should = work, such as: Builder.CreateStore(Builder.getInt8(8), NumRegsAddr); A cross check on this is the clang source code below: > llvm::Value *NumRegs =3D Builder.CreateLoad(NumRegsAddr, = "numUsedRegs"); > . . . > llvm::Value *CC =3D > Builder.CreateICmpULT(NumRegs, Builder.getInt8(8), "cond"); >=20 > llvm::BasicBlock *UsingRegs =3D CGF.createBasicBlock("using_regs"); > llvm::BasicBlock *UsingOverflow =3D = CGF.createBasicBlock("using_overflow"); > llvm::BasicBlock *Cont =3D CGF.createBasicBlock("cont"); >=20 > Builder.CreateCondBr(CC, UsingRegs, UsingOverflow); I'd guess that the Builder.CreateICmpULT(NumRegs, Builder.getInt8(8), = "cond") for using in Builder.CreateCondBr is a test for < 8 (unsigned = test?) picking UsingRegs and >=3D8 picking UsingOverflow. 11>=3D8 so 11 = would work. But the clang folks might prefer that the same figure be used in both = places, possibly with the source code naming the value once and using = the name in both places, not that the figure is likely to change in this = already PowerPC specific code. In analyzing the powerpc code absent knowledge of clang's code = generation source code I would likely have been confused by seeing such = differing numbers in the generated code if I'd run into such. That is = another reason to use the same figure in both places. I continue to provide some history below for reference. =3D=3D=3D Mark Millard markmi at dsl-only.net On Mon, Feb 15, 2016 at 12:52:15AM -0800, Mark Millard wrote: >=20 > I'm top posting as the following can stand on its own fairly well. >=20 > On Sun Feb 14 23:46:14 UTC 2016 Nathan Whitehorn wrote: >=20 >> On 02/14/16 14:34, Mark Millard wrote: >>> clang's code base is not familiar material for me nor do I have = solid=20 >>> reference material for the FreeBSD TARGET_ARCH=3Dpowerpc ABI rules = so=20 >>> the below has my guess work involved. The following code appears to=20= >>> have hard wired a global, unvarying constant (8) into the test for=20= >>> picking UsingRegs vs. UsingOverflow. >>=20 >> For reference, we use the standard ELF ABI=20 >> (https://uclibc.org/docs/psABI-ppc.pdf). >> -Nathan >=20 > Reviewing the Parameter Passing material in that document shows that = the problem is in the original specification. >=20 > And there is a more modern specification that has a fix in its = wording. (Which shows that I'm not likely to be wrong.) I'll reference = and quote it later. >=20 > First I'll explain the problem that is in psABI-ppc.pdf (the old = SunSoft 1995 document). >=20 > First a numbering point: psABI-ppc.pdf uses "gr" matching the numeral = in r3, r4, . . . , r10, starting at r3 (i.e, 3). And gr indicates the = next register to be used, not the last one already used. >=20 > The document splits the algorithm for placement of parameters into 3 = stages with the following structure, intended as they have it in the = document but various less interesting details for my "8byte then 4byte" = example omitted: >=20 >> INITIALIZING: >> Set fr=3D1, gr=3D3, and starg to the address of >> parameter word 1. >> SCAN: >> If there are no more arguments, terminate. >> Otherwise, select one of the following >> depending on the type of the next argument: >>=20 >> DOUBLE_OR_FLOAT >> If fr>8 ( . . .), go to OTHER. Otherwise, >> . . . >>=20 >> SIMPLE_ARG >> If gr>10, go to OTHER. Otherwise, load the >> argument value into general register gr, >> set gr to gr+1, can goto SCAN. . . . >>=20 >> LONG_LONG >> If gr>9, go to OTHER. Otherwise, . . . >>=20 >> OTHER: >> Arguments not otherwise handled above are >> passed in the parameter words of the >> caller???s stack frame. . . . Set starg to >> starg+size, then go to SCAN. >=20 > Note that gr is not incremented by LONG_LONG or by the later OTHER = usage when gr>9. (That would be my example's 8 byte integer that is = later followed by a 4 byte one.) >=20 > That OTHER's "go to SCAN" would then lead to the following 4 byte = integer in my example to be put in r10 and gr then being set to 11 = instead of it being stored in a parameter word on the stack. >=20 > The nasty thing about this for va_list/va_arg use is that the stored = information does not indicate which was before vs. after in the argument = order: the 4 byte r10 content or the 8 byte "OTHER" content: the two = orders produce identical results. >=20 > This can not be correct. >=20 > The Power-Arch-32-bit-ABI-supp-1.0-Unified.pdf is more modern and = explicitly deals with VR and other modern things. (Its terminology = matching LONG_LONG above is DUAL_GP.) But for what I'm dealing with here = it has the following extra wording at the very end of its OTHER section: >=20 >> If gr>9 and the type is DUAL_GP ,or . . ., or . . ., then set gr =3D = 11 (to prevent subsequent SINGLE_GPs from being placed in registers = after DUAL_GP, QUAD_GP, or EIGHT_GP arguments that would no longer fit = in the registers). >=20 >=20 >=20 > I've left the prior information below for reference. >=20 > =3D=3D=3D > Mark Millard > markmi at dsl-only.net >=20 >=20 >=20 > On 2016-Feb-14, at 2:34 PM, Mark Millard wrote: >>=20 >> . . . >> clang's code base is not familiar material for me nor do I have solid = reference material for the FreeBSD TARGET_ARCH=3Dpowerpc ABI rules so = the below has my guess work involved. >>=20 >> The following code appears to have hard wired a global, unvarying = constant (8) into the test for picking UsingRegs vs. UsingOverflow. >>=20 >>=20 >>> llvm::Value *NumRegs =3D Builder.CreateLoad(NumRegsAddr, = "numUsedRegs"); >> . . . >>> llvm::Value *CC =3D >>> Builder.CreateICmpULT(NumRegs, Builder.getInt8(8), "cond"); >>>=20 >>> llvm::BasicBlock *UsingRegs =3D CGF.createBasicBlock("using_regs"); >>> llvm::BasicBlock *UsingOverflow =3D = CGF.createBasicBlock("using_overflow"); >>> llvm::BasicBlock *Cont =3D CGF.createBasicBlock("cont"); >>>=20 >>> Builder.CreateCondBr(CC, UsingRegs, UsingOverflow); >> . . . >>> // Case 1: consume registers. >>> Address RegAddr =3D Address::invalid(); >>> { >> . . . >>> // Increase the used-register count. >>> NumRegs =3D >>> Builder.CreateAdd(NumRegs, >>> Builder.getInt8((isI64 || (isF64 && = IsSoftFloatABI)) ? 2 : 1)); >>> Builder.CreateStore(NumRegs, NumRegsAddr);. . . >> . . . >>> } >>>=20 >>> // Case 2: consume space in the overflow area. >>> Address MemAddr =3D Address::invalid(); >>> { >> . . . (no adjustments to NumRegs) . . . >>=20 >> If so the means of counting NumRegs (a.k.a. gpr) then needs to take = into account an allocated but unused last UsingRegs "slot" sometimes. = Imagine. . . >>=20 >> r3, r4, r5, r6, r7, r8, r9 in use already so r10 is the last possible = "UsingRegs" context. >> (0 1 2 3 4 5 6, leaving r10 as position 7, the last < 8 = value) >>=20 >> Then the next two arguments are a 8 byte integer then a a 4 byte = integer (in that order). That results in what should be: >>=20 >> r10 "UsingRegs" slot reserved and un-accessed >> In other words: counted as allocated so that the rest goes in in the = overflow area >> (so no position 7 usage) >>=20 >> then >>=20 >> overflow with the 8 byte integer then the 4 byte integer. >>=20 >>=20 >> And, in fact, the memory content reflects this in the overflow area. >>=20 >>=20 >> But the va_arg access code does not count r10's slot as allocated in = "Using Regs" after the 8 byte integer. So later it tries to use r10's = slot for the 4 byte integer that is actually in the UsingOverflow area. >>=20 >> One fix of sorts is to have "Case 2: consume space in the overflow = area." set NumRegs (a.k.a. gpr) to the bound from the = Builder.CreateICmpULT (8 in this context). Then the first (or any/every) = use of the UsingOverflow area forces no more use of the UsingRegs area = (for the involved va_list). >>=20 >>=20 >>=20 >> =3D=3D=3D >> Mark Millard >> markmi at dsl-only.net >=20 > _______________________________________________ > freebsd-toolchain@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-toolchain > To unsubscribe, send any mail to = "freebsd-toolchain-unsubscribe@freebsd.org"