From: Mark Millard
Date: Thu, 26 Jan 2017 14:17:29 -0800
Subject: Re: qemu-arm-static appears to have problems with signal delivery during (at least) poudrirer-devel based cross builds of some ports with ALLOW_MAKE_JOBS=yes
Cc: Michal Meloun
Message-Id: <9D8010B2-02A7-4B01-86F1-358329E3DB1A@dsl-only.net>
In-Reply-To: <5AB92372-6862-4F60-84B2-9B3E7B7FF3C9@dsl-only.net>
References: <7AF92A3C-3563-4B2E-B14A-D6BAF30A16A2@dsl-only.net>
 <9d7129d7-da2d-18e9-38ae-06f3483450f7@freebsd.org>
 <4399212D-B4DD-460F-AD1B-9250FB412B38@dsl-only.net>
 <049fd4e6-209b-4385-48ed-f3413ab27e52@gmail.com>
 <5AB92372-6862-4F60-84B2-9B3E7B7FF3C9@dsl-only.net>
To: Sean Bruno, freebsd-arm
List-Id: "Porting FreeBSD to ARM processors."

[Top post of new information confirming SIGCHLD handling.]

lldb on an arm (bpim3) is able to interpret the qemu_gmake.core
file in a useful way when also given a copy of the gmake!

For "TCG temporary leak before 00021826" the symbol dump in
address order shows:

Dumping symbol table for 4 modules.
Symtab, file = /usr/local/bin/gmake, num_symbols = 957 (sorted by address):
               Debug symbol
               |Synthetic symbol
               ||Externally Visible
               |||
Index   UserID DSX Type            File Address/Value Load Address       Size               Flags      Name
------- ------ --- --------------- ------------------ ------------------ ------------------ ---------- ----------------------------------
. . .
[  538]   6121   X Code            0x0000000000021820 0x00029820         0x0000000000000038 0x00000012 child_handler
[  592]   6175   X Code            0x0000000000021858 0x00029858         0x0000000000000d7c 0x00000012 reap_children
. . .

This looks like it tends to confirm the SIGCHLD handling is involved.
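
As a quick cross-check of that reading, here is a minimal Python sketch
that maps the address onto the two symbols listed. The table rows are
hand-copied from the dump above; the helper name containing_symbol is
made up for illustration, and the sketch assumes that 00021826 is a
hexadecimal file address (not a load address).

```python
# Two rows hand-copied from the lldb symbol dump: (name, file address, size).
SYMBOLS = [
    ("child_handler", 0x00021820, 0x38),
    ("reap_children", 0x00021858, 0xD7C),
]


def containing_symbol(addr, symbols=SYMBOLS):
    """Return (name, offset) for the symbol whose [start, start + size)
    range holds addr, or None when no listed symbol covers it."""
    for name, start, size in symbols:
        if start <= addr < start + size:
            return name, addr - start
    return None


# Address from "TCG temporary leak before 00021826" (assumed to be hex).
print(containing_symbol(0x00021826))  # -> ('child_handler', 6)
```

Under that assumption the address falls 6 bytes into child_handler,
whose range ends exactly where reap_children begins (0x21858).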
And objdump on gmake shows:

00021820 push {fp, lr}
00021824 mov fp, sp
00021828 sub sp, sp, #8
0002182c mov r1, r0
00021830 str r0, [sp, #4]
00021834 movw r0, #36636 ; 0x8f1c
00021838 movt r0, #5
0002183c ldr r2, [r0]
00021840 add r2, r2, #1
00021844 str r2, [r0]
00021848 str r1, [sp]
0002184c bl 0002e9f0
00021850 mov sp, fp
00021854 pop {fp, pc}

Interestingly 00021826 is between instructions and lldb reported
for the registers:

(lldb) register read
General Purpose Registers:
        r0 = 0x9fffc0f8
        r1 = 0x9fffc138
        r2 = 0x000a18c0
        r3 = 0xf4fde858
        r4 = 0x9fffc138
        r5 = 0xf4a00000
        r6 = 0xb6db6db7
        r7 = 0x00000012
        r8 = 0xf4a0c000
        r9 = 0xf4aa18c0
       r10 = 0x9fffc260
       r11 = 0x00000004
       r12 = 0x9fffc0f8
        sp = 0x9fffc0f8
        lr = 0x9fffffcc
        pc = 0x00021822
      cpsr = 0x80000030

i.e., the pc being 0x00021822 . That would be in the middle of the
"push {fp, lr}" instruction and 4 bytes before the 00021826 figure.

If it really tried to fetch an instruction at 0x00021822 that likely
would also explain getting a SIGILL classification for the 4 bytes
starting there.

I have no clue how the odd-multiple-of-2 address is getting
involved. But it does appear that sometimes signal delivery is
messed up under qemu.

===
Mark Millard
markmi at dsl-only.net

On 2017-Jan-26, at 10:04 AM, Mark Millard wrote:

> On 2017-Jan-26, at 5:54 AM, Michal Meloun wrote:
>
>> On 26.01.2017 5:26, Mark Millard wrote:
>>> On 2017-Jan-25, at 12:27 PM, Sean Bruno wrote:
>>>
>>>> Mark:
>>>>
>>>> There was a recent update this week that was submitted and accepted to
>>>> qemu-user-static.
>>>>
>>>> Want to give it a spin again and see if you are able to make progress?
>>>>
>>>> sean "top poster for maximum effect" bruno
>>>
>>> I updated my /usr/ports to -r432460 (from today) and rebuilt.
>>> I then tried doing some poudriere -x -a arm.armv6 port builds
>>> again, with ALLOW_MAKE_JOBS=yes and -J 1 in use.
>>>
>>> Unfortunately the qemu-user-static update did not fix the
>>> problem I've been seeing.
>>>
>>> An example extracted from a print/texinfo log still shows
>>> "TCG temporary leak before 00021826":
>>
>> I just rebuilt print/texinfo without a single problem.
>> Well, with slightly different CFLAGS
>> CFLAGS+= -O2 -munaligned-access -mcpu=cortex-a15 -fno-builtin-sin
>> -fno-builtin-cos
>>
>> Michal
>
> I had already reported that on retries the failure point in
> the overall sequence for the port either changes or the build
> completes for whatever I was trying to build that initially
> failed. (I did not repeat that in the new report.)
>
> When I retried, print/texinfo built okay.
>
> I've never gotten anything large like lang/gcc6 with a full
> bootstrap to complete with ALLOW_MAKE_JOBS=yes and -J 1 in use
> --nowhere near doing so. (In my context ALLOW_MAKE_JOBS means
> what portmaster would do for -j 4. Poudriere seems to give no
> control of this [-J is a different issue].)
>
> I've also not been able to use gdb on the .core produced:
> a qemu_gmake.core file extracted from the compressed tar
> archive of the failed work directory. file on it reports. . .
>
> # file /root/poudriere_failure/work/.build/qemu_gmake.core
> /root/poudriere_failure/work/.build/qemu_gmake.core: ELF 32-bit LSB core file ARM, version 1 (FreeBSD), FreeBSD-style, from 'ke'
>
> (I suspect that "version 1 (FreeBSD)" is not really
> intended to be supported as it stands.)
>
> I submitted bugzilla 216132 as a segmentation fault report
> against devel/gdb but the patch that was tried just allowed
> gdb to get farther but show other problems and still fail
> overall on handling qemu_gmake.core. See 216132.
>
> ===
> Mark Millard
> markmi at dsl-only.net
>
>> ....
>> mv warn-on-use.h-t warn-on-use.h
>> /bin/mkdir -p sys
>> rm -f sys/types.h-t sys/types.h && \
>> { echo '/* DO NOT EDIT! GENERATED AUTOMATICALLY! */'; \
>> sed -e 's|@''GUARD_PREFIX''@|GL|g' \
>> -e 's|@''INCLUDE_NEXT''@|include_next|g' \
>> -e 's|@''PRAGMA_SYSTEM_HEADER''@|#pragma GCC system_header|g' \
>> -e 's|@''PRAGMA_COLUMNS''@||g' \
>> -e 's|@''NEXT_SYS_TYPES_H''@||g' \
>> -e 's|@''WINDOWS_64_BIT_OFF_T''@|0|g' \
>> < ./sys_types.in.h; \
>> } > sys/types.h-t && \
>> mv sys/types.h-t sys/types.h
>> rm -f unistd.h-t unistd.h && \
>> ..
>>
>>
>>>
>>> . . .
>>> rm -f sys/types.h-t sys/types.h && \
>>> { echo '/* DO NOT EDIT! GENERATED AUTOMATICALLY! */'; \
>>> sed -e 's|@''GUARD_PREFIX''@|GL|g' \
>>> -e 's|@''INCLUDE_NEXT''@|include_next|g' \
>>> -e 's|@''PRAGMA_SYSTEM_HEADER''@|#pragma GCC system_header|g' \
>>> -e 's|@''PRAGMA_COLUMNS''@||g' \
>>> -e 's|@''NEXT_SYS_TYPES_H''@||g' \
>>> -e 's|@''WINDOWS_64_BIT_OFF_T''@|0|g' \
>>> < ./sys_types.in.h; \
>>> } > sys/types.h-t && \
>>> mv sys/types.h-t sys/types.h
>>> TCG temporary leak before 00021826
>>> qemu: uncaught target signal 4 (Illegal instruction) - core dumped
>>> Illegal instruction
>>> gmake[2]: *** [Makefile:1174: all-recursive] Error 1
>>> gmake[2]: Leaving directory '/wrkdirs/usr/ports/print/texinfo/work/texinfo-6.1'
>>> gmake[1]: *** [Makefile:1113: all] Error 2
>>> gmake[1]: Leaving directory '/wrkdirs/usr/ports/print/texinfo/work/texinfo-6.1'
>>> ===> Compilation failed unexpectedly.
>>> Try to set MAKE_JOBS_UNSAFE=yes and rebuild before reporting the failure to
>>> the maintainer.
>>> *** Error code 1
>>>
>>> Stop.
>>> make: stopped in /usr/ports/print/texinfo
>>> ====>> Cleaning up wrkdir
>>> ===> Cleaning for texinfo-6.1.20160425,1
>>> build of print/texinfo ended at Wed Jan 25 20:08:32 PST 2017
>>> build time: 00:06:57
>>> !!! build failure encountered !!!
>>>
>>>
>>> ===
>>> Mark Millard
>>> markmi at dsl-only.net
>>>
>>> On 01/15/17 07:09, Mark Millard wrote:
>>>> On 2017-Jan-14, at 10:53 PM, Mark Millard wrote:
>>>>
>>>>> [Context: head (12) -r312009 and ports head -r431413.]
>>>>>
>>>>> I've been experimenting on amd64 with poudriere-devel with -x
>>>>> for -a arm.armv6 and I ran into:
>>>>>
>>>>>> TCG temporary leak before 00021826
>>>>>> qemu: uncaught target signal 4 (Illegal instruction) - core dumped
>>>>>
>>>>> in 3 of the 31 ports for the build, but 4 skipped so 3 of 27
>>>>> attempted. The 00021826 is the same number in all the examples
>>>>> so far (whatever its base).
>>>>>
>>>>> These seem to be the only TCG messages and each failure starts with
>>>>> one and then reports the qemu message. (Also true for the below.)
>>>>> As far as I can tell the TCG notice is the report of an internal
>>>>> qemu problem that is then translated into an Illegal instruction.
>>>>>
>>>>> This was with ALLOW_MAKE_JOBS=yes but -J 1 for poudriere.
>>>>>
>>>>> For 2 of the problem ports retries worked, still using
>>>>> ALLOW_MAKE_JOBS=yes and -J 1 .
>>>>>
>>>>> But the 3rd port failed each time tried with ALLOW_MAKE_JOBS=yes
>>>>> --but in a different step each time.
>>>>>
>>>>> In all failure cases it was gmake that got the "illegal instruction".
>>>>>
>>>>> But disabling ALLOW_MAKE_JOBS=yes appears (so far) to avoid the
>>>>> issue. For example, that 3rd failing port built fine. (I've
>>>>> been doing more ports since, with ALLOW_MAKE_JOBS=yes repeatedly
>>>>> failing and lack of it working.)
>>>>>
>>>>> My guess is SIGCHLD delivery sometimes touches something (or a timing)
>>>>> that is not handled well in qemu-arm-static. I've had no problems
>>>>> on an rpi2 or bpim3 in the past.
>>>>>
>>>>> (I have seen some analogous "sometimes" issues on powerpc under
>>>>> a version of clang that mishandled the stack part of the ABI
>>>>> FreeBSD uses, SIGCHLD sometimes getting on the stack at a bad time
>>>>> for the messed up code generation, leading to stack corruption. Code
>>>>> not getting signals had no problems.)
>>>>>
>>>>> Note: The amd64 context is FreeBSD under VirtualBox under macOS
>>>>> and it has had no problem for native builds of world, kernel,
>>>>> or ports.
>>>>
>>>> Avoiding ALLOW_MAKE_JOBS=yes is not sufficient to guarantee builds
>>>> will work. Here is one that got near the end before failing the
>>>> same way:
>>>>
>>>> . . .
>>>> install -m 0644 /wrkdirs/usr/ports/devel/arm-none-eabi-gcc/work/gcc-6.3.0/gcc/cp/type-utils.h /wrkdirs/usr/ports/devel/arm-none-eabi-gcc/work/stage/usr/local/lib/gcc/arm-none-eabi/6.3.0/plugin/include/cp/type-utils.h
>>>> install: DONTSTRIP set - will not strip installed binaries
>>>> TCG temporary leak before 00021826
>>>> qemu: uncaught target signal 4 (Illegal instruction) - core dumped
>>>> gmake[1]: *** [Makefile:4176: install-gcc] Illegal instruction
>>>> gmake[1]: Leaving directory '/wrkdirs/usr/ports/devel/arm-none-eabi-gcc/work/.build'
>>>> *** Error code 2
>>>>
>>>> Stop.
>>>> make: stopped in /usr/ports/devel/arm-none-eabi-gcc
>>>> ====>> Cleaning up wrkdir
>>>> ===> Cleaning for arm-none-eabi-gcc-6.3.0
>>>> build of devel/arm-none-eabi-gcc ended at Sun Jan 15 00:04:02 PST 2017
>>>> build time: 02:52:28
>>>> !!! build failure encountered !!!
>>>>
>>>>
>>>> Going back to the earlier initial problem (that I happen to have the
>>>> material for handy): expanding the .tbz of the failed build and finding
>>>> the core showed:
>>>>
>>>> # find . -name "*.core" -exec file {} \;
>>>> ./work/binutils-2.27/ld/qemu_gmake.core: ELF 32-bit LSB core file ARM, version 1 (FreeBSD), FreeBSD-style, from 'ke'
>>>>
>>>> [I've not figured out what I can do with that --or how.]
>>>>
>>>>
>>>> One thing unusual on my part is that I use -mcpu=cortex-a7 . That
>>>> matches how I historically buildworld buildkernel for installation
>>>> on the rpi2 and bpim3. I've never had problems like this with
>>>> builds on the rpi2 or the bpim3 (buildworld, buildkernel, port
>>>> builds). It might be that qemu-arm-static has a problem with
>>>> -mcpu=cortex-a7 code that is generated --but not always.
>>>>
>>>> Using the make.conf as an example:
>>>>
>>>> # more /usr/local/etc/poudriere.d/head-cortex-a7-make.conf
>>>> WANT_QT_VERBOSE_CONFIGURE=1
>>>> #
>>>> DEFAULT_VERSIONS+=perl5=5.24
>>>> WITH_DEBUG=
>>>> WITH_DEBUG_FILES=
>>>> MALLOC_PRODUCTION=
>>>> #
>>>> #system clang 3.8+ (gcc6 rejects -march=armv7a):
>>>> #CFLAGS+= -march=armv7-a -mcpu=cortex-a7
>>>> #CXXFLAGS+= -march=armv7-a -mcpu=cortex-a7
>>>> #CPPFLAGS+= -march=armv7-a -mcpu=cortex-a7
>>>> #
>>>> #lang/gcc6's xgcc stage considers the above conflicting so use just:
>>>> CFLAGS+= -mcpu=cortex-a7
>>>> CXXFLAGS+= -mcpu=cortex-a7
>>>> CPPFLAGS+= -mcpu=cortex-a7
>>>>
>>>>
>>>> For my context poudriere with -x for -a arm.armv6 and the use of
>>>> qemu-arm-static does not look reliable enough to depend on. It is
>>>> not obvious that the -x use contributes to the problem: it may well
>>>> not.
>>>>
>>>> ===
>>>> Mark Millard
>>>> markmi at dsl-only.net