From owner-freebsd-toolchain@freebsd.org Mon Feb 15 19:13:19 2016 Return-Path: Delivered-To: freebsd-toolchain@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id ADB50AA924C; Mon, 15 Feb 2016 19:13:19 +0000 (UTC) (envelope-from rdivacky@vlakno.cz) Received: from vlakno.cz (mail.vlakno.cz [91.217.96.224]) by mx1.freebsd.org (Postfix) with ESMTP id 3F28596B; Mon, 15 Feb 2016 19:13:18 +0000 (UTC) (envelope-from rdivacky@vlakno.cz) Received: by vlakno.cz (Postfix, from userid 1002) id D9FAC1E22EB6; Mon, 15 Feb 2016 20:11:00 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=vlakno.cz; s=mail; t=1455563460; bh=amKAlV+9OOHDTVVXhmExUFuA3Tia68EeeN/rkOwbvos=; h=Date:From:To:Cc:Subject:References:In-Reply-To; b=poH/ScoBbNVIDXQHRJxd2crFDE/ibsdmdmhoA+iFWPRH/VY2ZhBRZP5c4ZQyc8JCg DWx6yU0mhtIIQEPM/VvJKXGW4UVYSlmIrbRurakBFSaRJROtzEO50f/uOliPXVCagQ qJyOOROnUEW0QuGYazAykXBfRDqfsWrLZkq6umMA= Date: Mon, 15 Feb 2016 20:11:00 +0100 From: Roman Divacky To: Mark Millard Cc: Nathan Whitehorn , FreeBSD Toolchain , FreeBSD PowerPC ML Subject: Re: I've submitted 207175 for a clang 3.8.0 va_list handling problem for powerpc Message-ID: <20160215191100.GA17387@vlakno.cz> References: <20160214192903.GA96697@vlakno.cz> <70B405C4-E1AC-4F35-9786-051FDA2F8BE7@dsl-only.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.24 (2015-08-30) X-BeenThere: freebsd-toolchain@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Maintenance of FreeBSD's integrated toolchain List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 Feb 2016 19:13:19 -0000 Mark, I believe you're right. What do you think about this patch? Index: tools/clang/lib/CodeGen/TargetInfo.cpp =================================================================== --- tools/clang/lib/CodeGen/TargetInfo.cpp (revision 260852) +++ tools/clang/lib/CodeGen/TargetInfo.cpp (working copy) @@ -3599,6 +3599,8 @@ { CGF.EmitBlock(UsingOverflow); + Builder.CreateStore(Builder.getInt8(11), NumRegsAddr); + // Everything in the overflow area is rounded up to a size of at least 4. CharUnits OverflowAreaAlign = CharUnits::fromQuantity(4); Can you test it? On Mon, Feb 15, 2016 at 12:52:15AM -0800, Mark Millard wrote: > > I'm top posting as the following can stand on its own fairly well. > > On Sun Feb 14 23:46:14 UTC 2016 Nathan Whitehorn wrote: > > > On 02/14/16 14:34, Mark Millard wrote: > > > clang's code base is not familiar material for me nor do I have solid > > > reference material for the FreeBSD TARGET_ARCH=powerpc ABI rules so > > > the below has my guess work involved. The following code appears to > > > have hard wired a global, unvarying constant (8) into the test for > > > picking UsingRegs vs. UsingOverflow. > > > > For reference, we use the standard ELF ABI > > (https://uclibc.org/docs/psABI-ppc.pdf). > > -Nathan > > Reviewing the Parameter Passing material in that document shows that the problem is in the original specification. > > And there is a more modern specification that has a fix in its wording. (Which shows that I'm not likely to be wrong.) I'll reference and quote it later. > > First I'll explain the problem that is in psABI-ppc.pdf (the old SunSoft 1995 document). > > First a numbering point: psABI-ppc.pdf uses "gr" matching the numeral in r3, r4, . . . , r10, starting at r3 (i.e, 3). And gr indicates the next register to be used, not the last one already used. > > The document splits the algorithm for placement of parameters into 3 stages with the following structure, intended as they have it in the document but various less interesting details for my "8byte then 4byte" example omitted: > > > INITIALIZING: > > Set fr=1, gr=3, and starg to the address of > > parameter word 1. > > SCAN: > > If there are no more arguments, terminate. > > Otherwise, select one of the following > > depending on the type of the next argument: > > > > DOUBLE_OR_FLOAT > > If fr>8 ( . . .), go to OTHER. Otherwise, > > . . . > > > > SIMPLE_ARG > > If gr>10, go to OTHER. Otherwise, load the > > argument value into general register gr, > > set gr to gr+1, can goto SCAN. . . . > > > > LONG_LONG > > If gr>9, go to OTHER. Otherwise, . . . > > > > OTHER: > > Arguments not otherwise handled above are > > passed in the parameter words of the > > caller???s stack frame. . . . Set starg to > > starg+size, then go to SCAN. > > Note that gr is not incremented by LONG_LONG or by the later OTHER usage when gr>9. (That would be my example's 8 byte integer that is later followed by a 4 byte one.) > > That OTHER's "go to SCAN" would then lead to the following 4 byte integer in my example to be put in r10 and gr then being set to 11 instead of it being stored in a parameter word on the stack. > > The nasty thing about this for va_list/va_arg use is that the stored information does not indicate which was before vs. after in the argument order: the 4 byte r10 content or the 8 byte "OTHER" content: the two orders produce identical results. > > This can not be correct. > > The Power-Arch-32-bit-ABI-supp-1.0-Unified.pdf is more modern and explicitly deals with VR and other modern things. (Its terminology matching LONG_LONG above is DUAL_GP.) But for what I'm dealing with here it has the following extra wording at the very end of its OTHER section: > > > If gr>9 and the type is DUAL_GP ,or . . ., or . . ., then set gr = 11 (to prevent subsequent SINGLE_GPs from being placed in registers after DUAL_GP, QUAD_GP, or EIGHT_GP arguments that would no longer fit in the registers). > > > > I've left the prior information below for reference. > > === > Mark Millard > markmi at dsl-only.net > > > > On 2016-Feb-14, at 2:34 PM, Mark Millard wrote: > > > > On 2016-Feb-14, at 11:29 AM, Roman Divacky wrote: > >> > >> Fwiw, the code to handle the vaarg is in > >> tools/clang/lib/CodeGen/TargetInfo.cpp:PPC32_SVR4_ABIInfo::EmitVAArg() > >> > >> You can take a look to see whats wrong. > >> > >> On Sat, Feb 13, 2016 at 07:03:29PM -0800, Mark Millard wrote: > >>> I've isolated another clang 3.8.0 TARGET_ARCH=powerpc SEGV problem that shows up for using clang 3.8.0 to buildworld/installworld for powerpc. > >>> > >>>> ls -l -n / > >>> > >>> gets a SEGV. As listed in https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=207175 ( and https://llvm.org/bugs/show_bug.cgi?id=26605 ) the following simplified program also gets the SEGV on powerpc: > >>> > >>>> #include // for va_list, va_start, va_arg, va_end > >>>> #include // for intmax_t > >>>> > >>>> intmax_t > >>>> va_test (char *s, ...) > >>>> { > >>>> va_list vap; > >>>> > >>>> va_start(vap, s); > >>>> > >>>> char* t0 = va_arg(vap, char*); > >>>> unsigned int o0 = va_arg(vap, unsigned int); > >>>> int c0 = va_arg(vap, int); > >>>> unsigned int u0 = va_arg(vap, unsigned int); > >>>> int c1 = va_arg(vap, int); > >>>> char * t1 = va_arg(vap, char*); > >>>> > >>>> intmax_t j0 = va_arg(vap, intmax_t); // This spans into overflow_arg_area. > >>>> > >>>> int c2 = va_arg(vap, int); // A copy was put in the > >>>> // overflow_arg_area because of the > >>>> // above. > >>>> // But this tries to extract from the > >>>> // last 4 bytes of the reg_save_area. > >>>> // It does not increment the > >>>> // overflow_arg_area position pointer > >>>> // past the copy that is there. > >>>> > >>>> char * t2 = va_arg(vap, char*); // The lack of increment before makes > >>>> // this extraction off by 4 bytes. > >>>> > >>>> char t2fc = *t2; // <<< This gets SEGV. t2 actually got what should be > >>>> // the c2 value. > >>>> > >>>> intmax_t j1 = va_arg(vap, intmax_t); > >>>> > >>>> va_end(vap); > >>>> > >>>> return (intmax_t) ((s-t2)+(t0-t1)+o0+u0+j0+j1+c0+c1+c2+t2fc); > >>>> // Avoid any optimize-away for lack of use. > >>>> } > >>>> > >>>> int main(void) > >>>> { > >>>> char s[1025] = "test string for this"; > >>>> > >>>> char* t0 = s + 5; > >>>> unsigned int o0 = 3; > >>>> int c0 = 1; > >>>> unsigned int u0 = 1; > >>>> int c1 = 3; > >>>> char * t1 = s + 12; > >>>> intmax_t j0 = 314159265358979323; > >>>> int c2 = 4; > >>>> char * t2 = s + 16; > >>>> intmax_t j1 = ~314159265358979323; > >>>> > >>>> intmax_t result = va_test(s,t0,o0,c0,u0,c1,t1,j0,c1,t2,j1); > >>>> > >>>> return (int) (result - (intmax_t) ((s-t2)+(t0-t1)+o0+u0+j0+j1+c0+c1+c2+*t2)); > >>>> // Avoid any optimize-away for lack of use. > >>>> } > >>> > >>> > >>> > >>> === > >>> Mark Millard > >>> markmi at dsl-only.net > >>> > >>> _______________________________________________ > >>> freebsd-toolchain@freebsd.org mailing list > >>> https://lists.freebsd.org/mailman/listinfo/freebsd-toolchain > >>> To unsubscribe, send any mail to "freebsd-toolchain-unsubscribe@freebsd.org" > > > > clang's code base is not familiar material for me nor do I have solid reference material for the FreeBSD TARGET_ARCH=powerpc ABI rules so the below has my guess work involved. > > > > The following code appears to have hard wired a global, unvarying constant (8) into the test for picking UsingRegs vs. UsingOverflow. > > > > > >> llvm::Value *NumRegs = Builder.CreateLoad(NumRegsAddr, "numUsedRegs"); > > . . . > >> llvm::Value *CC = > >> Builder.CreateICmpULT(NumRegs, Builder.getInt8(8), "cond"); > >> > >> llvm::BasicBlock *UsingRegs = CGF.createBasicBlock("using_regs"); > >> llvm::BasicBlock *UsingOverflow = CGF.createBasicBlock("using_overflow"); > >> llvm::BasicBlock *Cont = CGF.createBasicBlock("cont"); > >> > >> Builder.CreateCondBr(CC, UsingRegs, UsingOverflow); > > . . . > >> // Case 1: consume registers. > >> Address RegAddr = Address::invalid(); > >> { > > . . . > >> // Increase the used-register count. > >> NumRegs = > >> Builder.CreateAdd(NumRegs, > >> Builder.getInt8((isI64 || (isF64 && IsSoftFloatABI)) ? 2 : 1)); > >> Builder.CreateStore(NumRegs, NumRegsAddr);. . . > > . . . > >> } > >> > >> // Case 2: consume space in the overflow area. > >> Address MemAddr = Address::invalid(); > >> { > > . . . (no adjustments to NumRegs) . . . > > > > If so the means of counting NumRegs (a.k.a. gpr) then needs to take into account an allocated but unused last UsingRegs "slot" sometimes. Imagine. . . > > > > r3, r4, r5, r6, r7, r8, r9 in use already so r10 is the last possible "UsingRegs" context. > > (0 1 2 3 4 5 6, leaving r10 as position 7, the last < 8 value) > > > > Then the next two arguments are a 8 byte integer then a a 4 byte integer (in that order). That results in what should be: > > > > r10 "UsingRegs" slot reserved and un-accessed > > In other words: counted as allocated so that the rest goes in in the overflow area > > (so no position 7 usage) > > > > then > > > > overflow with the 8 byte integer then the 4 byte integer. > > > > > > And, in fact, the memory content reflects this in the overflow area. > > > > > > But the va_arg access code does not count r10's slot as allocated in "Using Regs" after the 8 byte integer. So later it tries to use r10's slot for the 4 byte integer that is actually in the UsingOverflow area. > > > > One fix of sorts is to have "Case 2: consume space in the overflow area." set NumRegs (a.k.a. gpr) to the bound from the Builder.CreateICmpULT (8 in this context). Then the first (or any/every) use of the UsingOverflow area forces no more use of the UsingRegs area (for the involved va_list). > > > > > > > > === > > Mark Millard > > markmi at dsl-only.net > > _______________________________________________ > freebsd-toolchain@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-toolchain > To unsubscribe, send any mail to "freebsd-toolchain-unsubscribe@freebsd.org"