From: Mark Millard
Subject: projects/clang380-import -r294962+ based powerpc (32-bit) buildworld -j 6: make gets SEGV, a partial smoking gun?
Date: Thu, 4 Feb 2016 23:46:50 -0800
To: FreeBSD PowerPC ML

The problem:

For a clang 3.8.0 based buildworld TARGET_ARCH=powerpc installation, attempting "make -j 6 buildworld" (run on 4 powerpc cores) eventually gets a segmentation fault in a make instance. (More details later.) "make buildworld" does not fault.

I expect that the details that I describe below imply some form of intermittency, such as a race condition.

(This is with the content of sys/powerpc/powerpc/sigcode32.S -r295186 in place so that signal delivery maintains the modulo 16 byte stack/frame alignment status instead of changing the alignment.)

(clang 3.8.0 targeting powerpc (32-bit) is known to be able to introduce more stack alignment dependencies by sometimes "or-ing" offset bits into the lower bits of an aligned address instead of using addition. But I do not know if that is involved here somehow.)

What is always involved and what varies:

In all cases the failure was r31 being used as a frame-pointer with the value zero in r31 at the time of the address calculation, even when the dereference of the address was later. r1 still seemed to be a valid stack pointer in all cases.

In every case the faulting routine had called one or more routines during its operation, and those had returned. There was an example or two of a self-contained routine that was recursive that got the failure.

In some cases prior calls in the faulting routine had non-zero r31 values when they returned. (There was later r31 usage that did not fault.)
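As an aside on the earlier note about "or-ing" offset bits into an aligned address: the substitution is only equivalent to addition when the base address has zeros in every bit position the offset uses. The following is a minimal sketch of that arithmetic, not of what clang actually emits; the function name, the base addresses and the offset are all hypothetical values chosen for illustration.

```python
def or_matches_add(base, offset):
    """Return True when (base | offset) gives the same address as (base + offset).

    This holds exactly when base and offset share no set bits.
    """
    return (base | offset) == (base + offset)


# Hypothetical addresses, not taken from any of the failing runs.
aligned_base = 0x1000              # low 4 bits clear: 16-byte aligned
misaligned_base = aligned_base + 8  # only 8-byte aligned

for base in (aligned_base, misaligned_base):
    # Show both results side by side for an offset of 12 bytes.
    print(hex(base), hex(base | 12), hex(base + 12), or_matches_add(base, 12))
```

With a 16-byte aligned base the two calculations agree for any offset below 16; once the base is only 8-byte aligned, an offset with bit 3 set gives different results, which is why code generated this way depends on the alignment being preserved.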
Overall the call chains varied widely for various example faults, although some call context is more common as a failure point.

Using ktrace with "-di -t cs", and kdump to extract the records for the failing process, shows the same 5 line sequence before every example "PSIG SIGSEGV". What was before those 5 lines varied across the various kdump outputs.

I used ktrace/kdump commands of the structure:

ktrace -di -f /usr/obj/make.out -t cs -p ???
kdump -E -f /usr/obj/make.out -p ??? > /var/tmp/make_ktrace_sigsegv_??.txt

Example results (showing the 5 lines and PSIG SIGSEGV):

(3 prior "sigreturn JUSTRETURN" among what is not shown)
> 65158 make 0.205791 PSIG SIGCHLD caught handler=0x180aae0 mask=0x0 code=CLD_EXITED
> 65158 make 0.205822 CALL write(0x3,0x189e914,0x1)
> 65158 make 0.205847 RET write 1
> 65158 make 0.205869 CALL sigreturn(0xffffbb50)
> 65158 make 0.205923 RET sigreturn JUSTRETURN
> 65158 make 0.205962 PSIG SIGSEGV SIG_DFL code=SEGV_MAPERR

(365 prior "sigreturn JUSTRETURN" among what is not shown)
> 599 make 5.552305 PSIG SIGCHLD caught handler=0x180aae0 mask=0x0 code=CLD_EXITED
> 599 make 5.552323 CALL write(0x3,0x189e914,0x1)
> 599 make 5.552337 RET write 1
> 599 make 5.552347 CALL sigreturn(0xffffbb30)
> 599 make 5.552358 RET sigreturn JUSTRETURN
> 599 make 5.552381 PSIG SIGSEGV SIG_DFL code=SEGV_MAPERR

(287 prior "sigreturn JUSTRETURN" among what is not shown)
> 75728 make 4.141097 PSIG SIGCHLD caught handler=0x180aae0 mask=0x0 code=CLD_EXITED
> 75728 make 4.141116 CALL write(0x3,0x189e914,0x1)
> 75728 make 4.141154 RET write 1
> 75728 make 4.141349 CALL sigreturn(0xffffbaa0)
> 75728 make 4.141366 RET sigreturn JUSTRETURN
> 75728 make 4.141404 PSIG SIGSEGV SIG_DFL code=SEGV_MAPERR

(273 prior "sigreturn JUSTRETURN" among what is not shown)
> 12195 make 27.213277 PSIG SIGCHLD caught handler=0x180aae0 mask=0x0 code=CLD_EXITED
> 12195 make 27.213322 CALL write(0x3,0x189e914,0x1)
> 12195 make 27.213346 RET write 1
> 12195 make 27.213361 CALL sigreturn(0xffffb1e0)
> 12195 make 27.213383 RET sigreturn JUSTRETURN
> 12195 make 27.213418 PSIG SIGSEGV SIG_DFL code=SEGV_MAPERR

(789 prior "sigreturn JUSTRETURN" among what is not shown)
> 50545 make 80.255162 PSIG SIGCHLD caught handler=0x180aae0 mask=0x0 code=CLD_EXITED
> 50545 make 80.255192 CALL write(0x3,0x189e914,0x1)
> 50545 make 80.255219 RET write 1
> 50545 make 80.255241 CALL sigreturn(0xffffafa0)
> 50545 make 80.255265 RET sigreturn JUSTRETURN
> 50545 make 80.255317 PSIG SIGSEGV SIG_DFL code=SEGV_MAPERR

The 5 line sequence is not sufficient for the problem to occur but appears to be necessary: there were sometimes hundreds of prior "PSIG SIGCHLD" . . . "RET sigreturn JUSTRETURN" sequences that were not followed by "PSIG SIGSEGV". But every failure tested with ktrace has the 5 lines as an immediate prefix in the list for the process.

Which instance of make faulted varied, and where in make the fault happened varied. The "-E" elapsed times above and those JUSTRETURN counts give a solid clue to there being variability in when the fault happens.

I'll use some script log file sizes for the buildworld as another indication of variability. I've sorted them:

2942664
3304207
3342660
3474585
3941983

so spanning from 2.9 MBytes to 3.9 MBytes. I've since gotten a few with less and some with more.

Note: A couple of times with ktrace being involved it failed at an earlier stage than I've seen otherwise. It may be that ktrace being involved makes the problem more likely/frequent.
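The check described above (what immediately precedes each "PSIG SIGSEGV", and how many "sigreturn JUSTRETURN" records came earlier) could be automated over the saved kdump text. The following is a rough sketch, not something that was used for the results above: the function name and the record regular expression are assumptions based only on the "kdump -E" lines quoted in this message, and other record types are simply skipped.

```python
import re
from collections import defaultdict, deque

# One "kdump -E" record as quoted above: pid, command, elapsed time,
# record type, details. Only CALL, RET and PSIG records are recognized.
RECORD = re.compile(r"^\s*(\d+)\s+(\S+)\s+(\d+\.\d+)\s+(CALL|RET|PSIG)\s+(\S.*)$")


def summarize_segv(lines, context=5):
    """For each PSIG SIGSEGV, report the pid, the `context` records that
    immediately precede it for that pid, and the count of all earlier
    'RET sigreturn JUSTRETURN' records for that pid (including any that
    fall inside the context window)."""
    recent = defaultdict(lambda: deque(maxlen=context))
    justreturns = defaultdict(int)
    reports = []
    for line in lines:
        # Tolerate the "> " quoting used when pasting output into email.
        m = RECORD.match(line.lstrip("> "))
        if m is None:
            continue  # not a CALL/RET/PSIG record
        pid, _cmd, _elapsed, kind, detail = m.groups()
        detail = detail.strip()
        if kind == "PSIG" and detail.startswith("SIGSEGV"):
            reports.append({
                "pid": int(pid),
                "before": list(recent[pid]),
                "justreturns": justreturns[pid],
            })
            continue
        if kind == "RET" and detail == "sigreturn JUSTRETURN":
            justreturns[pid] += 1
        # Keep only the record type and the syscall or signal name.
        name = re.split(r"[\s(]", detail, maxsplit=1)[0]
        recent[pid].append(f"{kind} {name}")
    return reports
```

Passing an open file object for one of the saved kdump text files as `lines` would give one report per SIGSEGV; a failure matching the pattern above would show "PSIG SIGCHLD", "CALL write", "RET write", "CALL sigreturn", "RET sigreturn" as its `before` list.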
Context basics (quad core PowerMac running TARGET_ARCH=powerpc (32-bit)):

# freebsd-version -ku; uname -aKU
11.0-CURRENT
11.0-CURRENT
FreeBSD FBSDG4C1 11.0-CURRENT FreeBSD 11.0-CURRENT #2 r294962M: Mon Feb 1 00:31:03 PST 2016 markmi@FreeBSDx64:/usr/obj/clang_gcc421/powerpc.powerpc/usr/src/sys/GENERICvtsc-NODEBUG powerpc 1100097 1100097

This is with the content of sys/powerpc/powerpc/sigcode32.S -r295186 in place so that signal delivery maintains the modulo 16 byte stack/frame alignment status instead of changing the alignment.

buildkernel was via gcc 4.2.1
buildworld was via clang 3.8.0

I'm not sure that I'm going to get much farther in tracking down the source of the race(?) that leads to the SEGV's.

===
Mark Millard
markmi at dsl-only.net