From owner-freebsd-ppc@freebsd.org Fri Feb 5 10:39:01 2016 Return-Path: Delivered-To: freebsd-ppc@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B2F16A9C360 for ; Fri, 5 Feb 2016 10:39:01 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: from asp.reflexion.net (outbound-mail-210-2.reflexion.net [208.70.210.2]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 77EA1B2E for ; Fri, 5 Feb 2016 10:39:00 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: (qmail 12526 invoked from network); 5 Feb 2016 10:39:00 -0000 Received: from unknown (HELO mail-cs-01.app.dca.reflexion.local) (10.81.19.1) by 0 (rfx-qmail) with SMTP; 5 Feb 2016 10:39:00 -0000 Received: by mail-cs-01.app.dca.reflexion.local (Reflexion email security v7.80.0) with SMTP; Fri, 05 Feb 2016 05:39:07 -0500 (EST) Received: (qmail 15262 invoked from network); 5 Feb 2016 10:39:06 -0000 Received: from unknown (HELO iron2.pdx.net) (69.64.224.71) by 0 (rfx-qmail) with SMTP; 5 Feb 2016 10:39:06 -0000 X-No-Relay: not in my network Received: from [192.168.1.8] (c-76-115-7-162.hsd1.or.comcast.net [76.115.7.162]) by iron2.pdx.net (Postfix) with ESMTPSA id AFCA11C43AE for ; Fri, 5 Feb 2016 02:38:58 -0800 (PST) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\)) Subject: Re: projects/clang380-import -r294962+ based powerpc (32-bit) buildworld -j 6: make gets SEGV, a partial smoking gun? From: Mark Millard In-Reply-To: Date: Fri, 5 Feb 2016 02:38:59 -0800 Content-Transfer-Encoding: quoted-printable Message-Id: References: To: FreeBSD PowerPC ML X-Mailer: Apple Mail (2.2104) X-BeenThere: freebsd-ppc@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Porting FreeBSD to the PowerPC List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 05 Feb 2016 10:39:01 -0000 On 2016-Feb-4, at 11:46 PM, Mark Millard wrote: >=20 > The problem: >=20 > For a clang 3.8.0 based buildworld TARGET_ARCH=3Dpowerpc installation = attempting "make -j 6 buildworld" (run on 4 powerpc cores) eventually = gets a segmentation fault in a make instance. (More details later.) = "make buildworld" does not fault. >=20 > I expect that the details that I describe below implies some form of = intermittency, such as a race condition. >=20 > (This is with the content of sys/powerpc/powerpc/sigcode32.S -r295186 = in place so that signal delivery maintains the modulo 16 byte = stack/frame alignment status instead of changing the alignment.) >=20 > (clang 3.8.0 targeting powerpc (32-bit) is known to be able to = introduce more stack alignment dependencies by sometimes "or-ing" in = offset-bits into some aligned-address lower bits instead of using = addition. But I do not know if that is involved here somehow.) >=20 >=20 >=20 > What is always involved and what varies: >=20 > In all cases the failure was r31 being used as a frame-pointer with = the value zero in r31 at the time of the address calculation, even when = when the dereference of the address was later. r1 still seemed to be a = valid stack pointer in all cases. >=20 > In every case the faulting routine had called one or more routines = during its operation --and those had returned. There was an example or = two of a self-contained routine that was recursive that got the failure. >=20 > In some cases prior calls in the faulting routine had non-zero r31 = values when they returned. (There was later r31 usage that did not = fault.) >=20 > Overall the call chains varied widely for various example faults, = although some call context is more common as a failure point. >=20 > Use of ktrace with "-di -t cs" and use of kdump for extracting for the = failing process shows the same 5 line sequence before every example = "PSIG SIGSEGV". What was before those 5 lines varied across the various = kdsump outputs. >=20 > I used ktrace/kdump commands of the structure: >=20 > ktrace -di -f /usr/obj/make.out -t cs -p ??? > kdump -E -f /usr/obj/make.out -p ??? > = /var/tmp/make_ktrace_sigsegv_??.txt >=20 > Example results (showing the 5 lines and PSIG SIGSEGV): >=20 > (3 prior "sigreturn JUSTRETURN" among what is not shown) >> 65158 make 0.205791 PSIG SIGCHLD caught handler=3D0x180aae0 = mask=3D0x0 code=3DCLD_EXITED >> 65158 make 0.205822 CALL write(0x3,0x189e914,0x1) >> 65158 make 0.205847 RET write 1 >> 65158 make 0.205869 CALL sigreturn(0xffffbb50) >> 65158 make 0.205923 RET sigreturn JUSTRETURN >> 65158 make 0.205962 PSIG SIGSEGV SIG_DFL code=3DSEGV_MAPERR >=20 > (365 prior "sigreturn JUSTRETURN" among what is not shown) >> 599 make 5.552305 PSIG SIGCHLD caught handler=3D0x180aae0 = mask=3D0x0 code=3DCLD_EXITED >> 599 make 5.552323 CALL write(0x3,0x189e914,0x1) >> 599 make 5.552337 RET write 1 >> 599 make 5.552347 CALL sigreturn(0xffffbb30) >> 599 make 5.552358 RET sigreturn JUSTRETURN >> 599 make 5.552381 PSIG SIGSEGV SIG_DFL code=3DSEGV_MAPERR >=20 > (287 prior "sigreturn JUSTRETURN" among what is not shown) >> 75728 make 4.141097 PSIG SIGCHLD caught handler=3D0x180aae0 = mask=3D0x0 code=3DCLD_EXITED >> 75728 make 4.141116 CALL write(0x3,0x189e914,0x1) >> 75728 make 4.141154 RET write 1 >> 75728 make 4.141349 CALL sigreturn(0xffffbaa0) >> 75728 make 4.141366 RET sigreturn JUSTRETURN >> 75728 make 4.141404 PSIG SIGSEGV SIG_DFL code=3DSEGV_MAPERR >=20 > (273 prior "sigreturn JUSTRETURN" among what is not shown) >> 12195 make 27.213277 PSIG SIGCHLD caught handler=3D0x180aae0 = mask=3D0x0 code=3DCLD_EXITED >> 12195 make 27.213322 CALL write(0x3,0x189e914,0x1) >> 12195 make 27.213346 RET write 1 >> 12195 make 27.213361 CALL sigreturn(0xffffb1e0) >> 12195 make 27.213383 RET sigreturn JUSTRETURN >> 12195 make 27.213418 PSIG SIGSEGV SIG_DFL code=3DSEGV_MAPERR >=20 > (789 prior "sigreturn JUSTRETURN" among what is not shown) >> 50545 make 80.255162 PSIG SIGCHLD caught handler=3D0x180aae0 = mask=3D0x0 code=3DCLD_EXITED >> 50545 make 80.255192 CALL write(0x3,0x189e914,0x1) >> 50545 make 80.255219 RET write 1 >> 50545 make 80.255241 CALL sigreturn(0xffffafa0) >> 50545 make 80.255265 RET sigreturn JUSTRETURN >> 50545 make 80.255317 PSIG SIGSEGV SIG_DFL code=3DSEGV_MAPERR >=20 > The 5 line sequence is not sufficient for the problem to occur but = appears to be necessary: There were sometimes hundreds of prior "PSIG = SIGCHLD". . ."RET sigreturn JUSTRETURNS" sequences for which they were = not followed by "PSIG SIGSEGV". But every failure tested with ktrace has = the 5 lines as an immediate prefix in the list for the process. >=20 > Which instance of make varied and where in make the fault happens = varied. The "-E" elapsed times above and those JUSTRETURN counts give a = solid clue to there being variability in when the fault happens. >=20 > I'll use some script log file sizes for the buidlworld as another = indication of variability. I've sorted them: >=20 > 2942664 > 3304207 > 3342660 > 3474585 > 3941983 >=20 > so spanning from 2.9 MBytes to 3.9 MBytes. I've since gotten a few = with less and some with more. >=20 >=20 > Note: A couple of times with ktrace being involved it failed at an = earlier stage than I've seen otherwise. It may be that ktrace being = involved makes the problem more likely/frequent. >=20 >=20 >=20 > Context basics (quad core PowerMac running TARGET_ARCH=3Dpowerpc = (32-bit)): >=20 > # freebsd-version -ku; uname -aKU > 11.0-CURRENT > 11.0-CURRENT > FreeBSD FBSDG4C1 11.0-CURRENT FreeBSD 11.0-CURRENT #2 r294962M: Mon = Feb 1 00:31:03 PST 2016 = markmi@FreeBSDx64:/usr/obj/clang_gcc421/powerpc.powerpc/usr/src/sys/GENERI= Cvtsc-NODEBUG powerpc 1100097 1100097 >=20 > This is with the content of sys/powerpc/powerpc/sigcode32.S -r295186 = in place so that signal delivery maintains the modulo 16 byte = stack/frame alignment status instead of changing the alignment. >=20 > buildkernel was via gcc 4.2.1 > buildworld was via clang 3.8.0 >=20 >=20 >=20 > I'm not sure that I'm going to get much farther in tracking down the = source of the race(?) that leads to the SEGV's. I should have given an example of clang 3.8.0 produced post-amble code = that ends up picking up the 0x0 and putting it is r31 (the Frame Pointer = for clang's produced code). 0x1801b8c : lwz r30,24(r31) 0x1801b90 : lwz r29,20(r31) 0x1801b94 : lwz r28,16(r31) 0x1801b98 : lwz r27,12(r31) 0x1801b9c : lwz r26,8(r31) 0x1801ba0 : addi r1,r1,32 # Stack pointer = adjusted first 0x1801ba4 : lwz r0,4(r1) 0x1801ba8 : lwz r31,-4(r1) # Then Frame Pointer = load happens # "outside" the new = stack range. 0x1801bac : mtlr r0 0x1801bb0 : blr Here is the related bl and following code: 0x182516c : bl 0x1801b24 0x1825170 : lis r3,396 =3D> 0x1825174 : lwz r4,16(r31) 0x1825178 : lwz r3,-13260(r3) 0x182517c : subf. r3,r4,r3 0x1825180 : bne 0x18251c4 = The "=3D>" points to a "sometimes SEGV" place from r31=3D0x0 being = involved in a clang 3.8.0 buildworld's based "make" after it resumes = normal operation after handling a SIGCHLD delivery. So the "lwz r31,-4(r1)" earlier above resulted in a 0x0 in r31 just = before the SEGV. By contrast the gcc 4.2.1 generated code does not use r31 as a frame = pointer and has: 0x1801db8 : lwz r0,52(r1) 0x1801dbc : lwz r27,28(r1) 0x1801dc0 : lwz r28,32(r1) 0x1801dc4 : mtlr r0 0x1801dc8 : lwz r29,36(r1) 0x1801dcc : lwz r30,40(r1) 0x1801dd0 : lwz r31,44(r1) 0x1801dd4 : addi r1,r1,48 # Stack pointer is = adjusted last. # The prior code = worked in the range. 0x1801dd8 : blr and: 0x1823cac : bl 0x1801d58 0x1823cb0 : b 0x1823c0c . . . backing up to the jump point . . . 0x1823c0c : lwz r0,28(r1) # Stack pointer used, = not a Frame Pointer 0x1823c10 : lwz r9,0(r25) 0x1823c14 : xor. r0,r0,r9 0x1823c18 : li r9,0 0x1823c1c : li r3,1 0x1823c20 : bne- 0x1823cb4 As r31 is not used as a frame pointer the 4.2.1 code can not get the = SEGV that the clang 3.8.0 code sometimes gets. =3D=3D=3D Mark Millard markmi at dsl-only.net