Date: Tue, 31 Jan 2017 18:39:05 -0800
From: Mark Millard <markmigm@gmail.com>
To: Tom Vijlbrief <tvijlbrief@gmail.com>, freebsd-arm <freebsd-arm@freebsd.org>
Subject: Re: Arm64 stack issues (was Re: FreeBSD status for/on ODroid-C2?)
Message-ID: <EB1D79C2-CF5E-4C21-BA1B-EC9F34BB737E@gmail.com>
In-Reply-To: <54642E5C-D5D6-45B7-BB74-2407CFB351C2@dsl-only.net>
References: <CAOQrpVfK-Dw_rSo_YVY5MT1wbc6Ah-Pj%2BWv8UGjeiUQ1b3%2B-mg@mail.gmail.com> <20170124191357.0ec0abfd@zapp> <20170128010138.iublazyrhhqycn37@mutt-hardenedbsd> <20170128010223.tjivldnh7pyenbg6@mutt-hardenedbsd> <CAOQrpVfxKvSR5PoahnqEsYspHhjjOGJ8iCBUetKxRV57oX_aUg@mail.gmail.com> <009857E3-35BB-4DE4-B3BB-5EC5DDBB5B06@dsl-only.net> <CAOQrpVdKyP2T0V77sfpuKbNP3ARoD1EcwtH6E9o7p5KF%2B=A56A@mail.gmail.com> <CB36F13F-85E9-41D2-A7F3-DA183BE5985A@dsl-only.net> <890B7D8A-27FF-41AC-8291-1858393EC7B1@gmail.com> <54642E5C-D5D6-45B7-BB74-2407CFB351C2@dsl-only.net>

[Show .core file creation times instead.]

On 2017-Jan-31, at 6:30 PM, Mark Millard <markmi at dsl-only.net> wrote:

> [Just adding more accurate/precise times for the .core files.]
> [The original was accidentally sent from the "wrong" E-mail account
> but I've adjusted that here.]
>
> On 2017-Jan-31, at 12:35 PM, Mark Millard <markmi at dsl-only.net> wrote:
>
>> [More notes on what I observe on a pine64 from head -r312982 .]
>>
>> On 2017-Jan-28, at 2:17 PM, Tom Vijlbrief <tvijlbrief at gmail.com> wrote:
>>
>>> Note that on the pine64 the network interface hangs from time to time and I get a core dump with very low frequency from long running processes, eg the shell that invokes "make world".
>>
>> I got sh crashes (multiple processes in the same time frame) from
>> just trying to build pkg:
>>
>> make[5]: stopped in /usr/obj/portswork/usr/ports/ports-mgmt/pkg/work/pkg-1.9.4/libpkg
>> *** [all-recursive] Error code 1
>>
>> # ls -lt /var/crash/
>> total 41764
>> -rw-------  1 root  wheel  4702208 Jan 31 03:15 sh.13676.core
>> -rw-------  1 root  wheel  4702208 Jan 31 03:15 sh.13511.core
>> -rw-------  1 root  wheel  4702208 Jan 31 03:15 sh.13499.core
>> -rw-------  1 root  wheel  4702208 Jan 31 03:15 sh.12095.core
>> -rw-r--r--  1 root  wheel        5 Nov  3 10:18 minfree
>>
>> In all the crashes lldb on the .core shows that the pc was no longer
>> pointing at memory with code in it. It is interesting that all
>> 4 sh instances died at about the same time.
>
> More time detail (using -T):
>
> -rw-------  1 root  wheel  4702208 Jan 31 03:15:44 2017 sh.13676.core
> -rw-------  1 root  wheel  4702208 Jan 31 03:15:43 2017 sh.13511.core
> -rw-------  1 root  wheel  4702208 Jan 31 03:15:42 2017 sh.13499.core
> -rw-------  1 root  wheel  4702208 Jan 31 03:15:32 2017 sh.12095.core

I should have used creation times:

# ls -UTlt /var/crash/
. . .
-rw-------  1 root  wheel  4702208 Jan 31 03:15:42 2017 sh.13676.core
-rw-------  1 root  wheel  4702208 Jan 31 03:15:41 2017 sh.13511.core
-rw-------  1 root  wheel  4702208 Jan 31 03:15:41 2017 sh.13499.core
-rw-------  1 root  wheel  4702208 Jan 31 03:15:30 2017 sh.12095.core

>> SIGILL, SIGSEGV, SIGBUS, and SIGILL (again) from the non-code
>> consequences.
>>
>> The two SIGILL's have some interesting similarities to each other.
>> So I list them first below. x0-x3, x8-x9, x13, x17, x27, and cpsr
>> all match in these two. x1=ld-elf.so.1`_rtld_tlsdesc,
>> x17=libc.so.7`__free at jemalloc_jemalloc.c:2007,
>> x23=ld-elf.so.1`symlook_global + 124 at rtld.c:3916,
>> x27=sh..bss + 6336.
>>
>> The other two have the following in common:
>> x10-x12, x16-x17. x17=libc.so.7`close at close.c:48 .
>>
>> x18 = 0xaaaaaaaaaaaaaaab is common between one SIGILL and one not.
>>
>> Only one does not have x27=sh..bss + 6336. It instead has:
>> x28=sh..bss + 6336 .
>>
>> (lldb) bt
>> * thread #1: tid = 100142, 0x000000004044f800, name = 'sh', stop reason = signal SIGILL
>>   * frame #0: 0x000000004044f800
>> (lldb) register read
>> General Purpose Registers:
>>   x0 = 0x0000000000000000
>>   x1 = 0x00000000404346e8 ld-elf.so.1`_rtld_tlsdesc
>>   x2 = 0x0000000040a00000
>>   x3 = 0x0000000000000002
>>   x4 = 0x0000000000000050
>>   x5 = 0x0000000040a4c9c0
>>   x6 = 0x2e2e2f2e2e2f2e2e
>>   x7 = 0x6c6f6f7462696c2f
>>   x8 = 0x0000000000000001
>>   x9 = 0x0000000000000000
>>   x10 = 0x00000000000000df
>>   x11 = 0x000000000000002f
>>   x12 = 0x0000000040a0e690
>>   x13 = 0x0000000000000427
>>   x14 = 0x0000000000000001
>>   x15 = 0x0000000000000000
>>   x16 = 0x0000000000432340
>>   x17 = 0x000000004054cd00 libc.so.7`__free at jemalloc_jemalloc.c:2007
>>   x18 = 0x0000000000000000
>>   x19 = 0x000000004044e330
>>   x20 = 0x000000001c93deed
>>   x21 = 0x0000000007ab9b5c
>>   x22 = 0x00000000404ba7b0
>>   x23 = 0x000000004043c4b0 ld-elf.so.1`symlook_global + 124 at rtld.c:3916
>>   x24 = 0x0000ffffffffd2d0
>>   x25 = 0x0000ffffffffd370
>>   x26 = 0x0000ffffffffd340
>>   x27 = 0x0000000000434000 sh..bss + 6336
>>   x28 = 0x0000000040a4c1b0
>>   fp = 0x0000ffff00000001
>>   lr = 0x000000004044f800
>>   sp = 0x0000ffffffffd2a0
>>   pc = 0x000000004044f800
>>   cpsr = 0x60000000
>> (lldb) disass
>> -> 0x4044f800: .long 0xd550b87a ; unknown opcode
>>    0x4044f804: .long 0x00000000 ; unknown opcode
>>    0x4044f808: .long 0x00000001 ; unknown opcode
>>    0x4044f80c: .long 0x00000000 ; unknown opcode
>>    0x4044f810: .long 0x4044fc00 ; unknown opcode
>>    0x4044f814: .long 0x00000000 ; unknown opcode
>>    0x4044f818: .long 0x4044f410 ; unknown opcode
>>    0x4044f81c: .long 0x00000000 ; unknown opcode
>>
>> (lldb) thread list
>> Process 0 stopped
>> * thread #1: tid = 100161, 0x0000ffffffffee68, name = 'sh', stop reason = signal SIGILL
>> (lldb) register read
>> General Purpose Registers:
>>   x0 = 0x0000000000000000
>>   x1 = 0x00000000404346e8 ld-elf.so.1`_rtld_tlsdesc
>>   x2 = 0x0000000040a00000
>>   x3 = 0x0000000000000002
>>   x4 = 0x0000000000000017
>>   x5 = 0x00080002a0290a00
>>   x6 = 0x0000000000434c28 sh..bss + 9448
>>   x7 = 0x000000000005e1cd
>>   x8 = 0x0000000000000001
>>   x9 = 0x0000000000000000
>>   x10 = 0x0000000000000000
>>   x11 = 0x0000000040a5c000
>>   x12 = 0x0000000040a0e670
>>   x13 = 0x0000000000000427
>>   x14 = 0x000000000000000d
>>   x15 = 0x0000000000432740 sh..bss + 0
>>   x16 = 0x0000000000432340
>>   x17 = 0x000000004054cd00 libc.so.7`__free at jemalloc_jemalloc.c:2007
>>   x18 = 0xaaaaaaaaaaaaaaab
>>   x19 = 0x0000ffffffffee18
>>   x20 = 0x0000ffffffffedb4
>>   x21 = 0x0000ffffffffed80
>>   x22 = 0x0000ffffffffed59
>>   x23 = 0x0000ffffffffed47
>>   x24 = 0x0000ffffffffed38
>>   x25 = 0x0000ffffffffed28
>>   x26 = 0x0000ffffffffed20
>>   x27 = 0x0000000000434000 sh..bss + 6336
>>   x28 = 0x0000000040a803a0
>>   fp = 0x0000ffffffffee59
>>   lr = 0x0000ffffffffee68
>>   sp = 0x0000ffffffffe1a0
>>   pc = 0x0000ffffffffee68
>>   cpsr = 0x60000000
>> (lldb) disass
>> -> 0xffffffffee68: .long 0x44504d54 ; unknown opcode
>>    0xffffffffee6c: .long 0x2f3d5249 ; unknown opcode
>>    0xffffffffee70: .long 0x00706d74 ; unknown opcode
>>    0xffffffffee74: .long 0x4c454853 ; unknown opcode
>>    0xffffffffee78: .long 0x622f3d4c ; unknown opcode
>>    0xffffffffee7c: .long 0x732f6e69 ; unknown opcode
>>    0xffffffffee80: .long 0x4f430068 ; unknown opcode
>>    0xffffffffee84: .long 0x4749464e ; unknown opcode
>>
>> (lldb) bt
>> * thread #1: tid = 100088, 0x356c7265702f676e, name = 'sh', stop reason = signal SIGBUS
>>   * frame #0: 0x356c7265702f676e
>> (lldb) register read
>> General Purpose Registers:
>>   x0 = 0x0000000000000000
>>   x1 = 0x0000000000000000
>>   x2 = 0x0000000040a00000
>>   x3 = 0x0000000000000005
>>   x4 = 0x0000000000000038
>>   x5 = 0x0000000040a754e5
>>   x6 = 0x584946455250442d
>>   x7 = 0x6c2f7273752f223d
>>   x8 = 0x0000000000000000
>>   x9 = 0x0000000000000000
>>   x10 = 0x0000000000434000 sh..bss + 6336
>>   x11 = 0x0000000000000000
>>   x12 = 0x0000000000434217 sh..bss + 6871
>>   x13 = 0x0000000000434000 sh..bss + 6336
>>   x14 = 0x0000000000432000 sh`__frame_dummy_init_array_entry
>>   x15 = 0x000000000000003d
>>   x16 = 0x00000000004322b0
>>   x17 = 0x000000004050d090 libc.so.7`close at close.c:48
>>   x18 = 0xaaaaaaaaaaaaaaab
>>   x19 = 0x766564206f666e69
>>   x20 = 0x7865646e692f746e
>>   x21 = 0x69727020676b702f
>>   x22 = 0x746d676d2d737472
>>   x23 = 0x6f7020656d69746e
>>   x24 = 0x75722d7478657474
>>   x25 = 0x65672f6c65766564
>>   x26 = 0x206e6f7369622f6c
>>   x27 = 0x0000000040a53716
>>   x28 = 0x0000000000434000 sh..bss + 6336
>>   fp = 0x616c20346d2f6c65
>>   lr = 0x356c7265702f676e
>>   sp = 0x0000ffffffffe740
>>   pc = 0x356c7265702f676e
>>   cpsr = 0x20000000
>>
>> (lldb) disass
>> error: core file does not contain 0x356c7265702f676e
>> error: Failed to disassemble memory at 0xffffffffffffffff.
>>
>>
>>
>> (lldb) bt
>> * thread #1: tid = 100186, 0x0000000000000000, name = 'sh', stop reason = signal SIGSEGV
>>   * frame #0: 0x0000000000000000
>> (lldb) disass
>> error: core file does not contain 0x0
>> error: Failed to disassemble memory at 0xffffffffffffffff.
>> (lldb) register read
>> General Purpose Registers:
>>   x0 = 0x0000000000000000
>>   x1 = 0x0000000000000000
>>   x2 = 0x0000000000000002
>>   x3 = 0x0000000000006c6f
>>   x4 = 0x0000000040a50bb3
>>   x5 = 0x0000000040a499ba
>>   x6 = 0x6f7462696c2f2e2e
>>   x7 = 0x6c6f6f7462696c2f
>>   x8 = 0x0000000000000000
>>   x9 = 0x0000000000000000
>>   x10 = 0x0000000000434000 sh..bss + 6336
>>   x11 = 0x0000000000000000
>>   x12 = 0x0000000040a499f8
>>   x13 = 0x0000000000434000 sh..bss + 6336
>>   x14 = 0x0000000000000001
>>   x15 = 0x0000000000000000
>>   x16 = 0x00000000004322b0
>>   x17 = 0x000000004050d090 libc.so.7`close at close.c:48
>>   x18 = 0x0000000000000000
>>   x19 = 0x0000000000000065
>>   x20 = 0x0000000000000065
>>   x21 = 0x00000000004168f0 sh`readtoken1 + 5212 at parser.c:1602
>>   x22 = 0x0000ffffffffda90
>>   x23 = 0x0000000040a498c0
>>   x24 = 0x000000000000000a
>>   x25 = 0x0000000000000000
>>   x26 = 0x0000000000000000
>>   x27 = 0x0000000040a49258
>>   x28 = 0x0000000000434000 sh..bss + 6336
>>   fp = 0x0000ffffffffda08
>>   lr = 0x0000000000000000
>>   sp = 0x0000ffffffffd970
>>   pc = 0x0000000000000000
>>   cpsr = 0x20000000
>>
>>
>> Looks to me like something major is wrong.

===
Mark Millard
markmi at dsl-only.net

On 2017-Jan-30, at 11:57 PM, Mark Millard <markmi at dsl-only.net> wrote:

> I updated to head -r312982 on the pine64 that I have access to:
>
> # uname -apKU
> FreeBSD pine64 12.0-CURRENT FreeBSD 12.0-CURRENT r312982M arm64 aarch64 1200020 1200020
>
> after several months of not using the pine64.
> ( -mcpu=cortex-a53 used for buildworld buildkernel;
> non-debug variant of GENERIC [GENERIC included
> then overridden]; usb SSD root file system)
>
> I find that any time some of the cores are busy I get thousands
> of the gic0 spurious interrupt messages in fairly short order.
> (This is not new: it is unchanged.)
>
> For example during either of:
>
> openssl speed
>
> or:
>
> cp /dev/zero /dev/null
> (similarly for copying actual files around,
> local or nfs involved)
>
> Once the cores are no longer busy the gic0 messages stop.
>
> The "on CPU<?>" varies. The "last irq: <?>" varies.
> (But 27 is the most common by far.)

===
Mark Millard
markmi at dsl-only.net

On 2017-Jan-28, at 2:17 PM, Tom Vijlbrief <tvijlbrief at gmail.com> wrote:

Note that on the pine64 the network interface hangs from time to time and I get a core dump with very low frequency from long running processes, eg the shell that invokes "make world". Note that I had similar issues on the ODroid-C2.

Currently rebuilding world without MALLOC_PRODUCTION.

The arm64 port is getting close to working 100%, just a last few glitches.

On Sat 28 Jan 2017 at 22:03, Mark Millard <markmi at dsl-only.net> wrote:

[About: "gic0: Spurious interrupt detected" on armv6 as well.]

On 2017-Jan-28, at 6:43 AM, Tom Vijlbrief <tvijlbrief at gmail.com> wrote:

> Did a build/install world/kernel with r312916 and MALLOC_PRODUCTION=YES on
> a pine64, removed /etc/malloc.conf, rebooted
>
> and I am now rebuilding the python2 port without problems so far (except
> the "gic0: Spurious interrupt detected" messages which reappeared shortly
> after my previous post)

While very rare, I have seen the gic0 notices on armv6 (e.g., a bpim3)
during large builds (with -j 4).
Recently I got a:

gic0: Spurious interrupt detected: last irq: 29 on CPU1

on:

# uname -apKU
FreeBSD bpim3 12.0-CURRENT FreeBSD 12.0-CURRENT #0 r312726M: Tue Jan 24 20:57:48 PST 2017 markmi@FreeBSDx64:/usr/obj/bpim3_clang/arm.armv6/usr/src/sys/BPIM3-NODBG arm armv6 1200020 1200020

while building devel/gcc6 (via a full bootstrap) via -j 4 .

This is from a non-debug buildworld buildkernel context and has
MALLOC_PRODUCTION= in /etc/make.conf . No /etc/malloc.conf present.
I do use -mcpu=cortex-a7 .

Details if you care:

# more /usr/src/sys/arm/conf/BPIM3-NODBG
#
# BPIM3 -- Custom configuration for the Banana Pi M3
#

include "GENERIC"

ident BPIM3-NODBG

makeoptions DEBUG=-g # Build kernel with gdb(1) debug symbols

options ALT_BREAK_TO_DEBUGGER

options KDB # Enable kernel debugger support

# For minimum debugger support (stable branch) use:
options KDB_TRACE # Print a stack trace for a panic
options DDB # Enable the kernel debugger

# Extra stuff:
#options VERBOSE_SYSINIT # Enable verbose sysinit messages
#options BOOTVERBOSE=1
#options BOOTHOWTO=RB_VERBOSE
#options KTR
#options KTR_MASK=KTR_TRAP
##options KTR_CPUMASK=0xF
#options KTR_VERBOSE

# Disable any extra checking for. . .
nooptions DEADLKRES # Enable the deadlock resolver
nooptions INVARIANTS # Enable calls of extra sanity checking
nooptions INVARIANT_SUPPORT # Extra sanity checks of internal structures, required by INVARIANTS
nooptions WITNESS # Enable checks to detect deadlocks and cycles
nooptions WITNESS_SKIPSPIN # Don't run witness on spinlocks for speed
nooptions DIAGNOSTIC

It was from a cross build for buildworld buildkernel :
(I've not checked on lldb builds linking recently.)

# more ~/src.configs/src.conf.bpim3-clang-bootstrap.amd64-host
TO_TYPE=armv6
#
KERNCONF=BPIM3-NODBG
TARGET=arm
.if ${.MAKE.LEVEL} == 0
TARGET_ARCH=${TO_TYPE}
.export TARGET_ARCH
.endif
#
WITH_CROSS_COMPILER=
WITHOUT_SYSTEM_COMPILER=
#
#CPUTYPE=soft
WITH_LIBCPLUSPLUS=
WITH_BINUTILS_BOOTSTRAP=
WITH_CLANG_BOOTSTRAP=
WITH_CLANG=
WITH_CLANG_IS_CC=
WITH_CLANG_FULL=
WITH_CLANG_EXTRAS=
WITH_LLD=
#
# Linking lldb fails for armv6(/v7)
WITHOUT_LLDB=
#
WITH_BOOT=
WITHOUT_LIB32=
WITHOUT_LIBSOFT=
#
WITHOUT_ELFTOOLCHAIN_BOOTSTRAP=
WITHOUT_GCC_BOOTSTRAP=
WITHOUT_GCC=
WITHOUT_GCC_IS_CC=
WITHOUT_GNUCXX=
#
NO_WERROR=
#WERROR=
MALLOC_PRODUCTION=
#
WITH_REPRODUCIBLE_BUILD=
WITH_DEBUG_FILES=
#
XCFLAGS+= -mcpu=cortex-a7
XCXXFLAGS+= -mcpu=cortex-a7
# There is no XCPPFLAGS but XCPP gets XCFLAGS content.

Used for buildworld buildkernel :

# more ~/src.configs/make.conf
#MALLOC_PRODUCTION=
#NO_WERROR=
#WERROR=
CFLAGS.gcc+= -v

Used for port builds:

# more /etc/make.conf
WANT_QT_VERBOSE_CONFIGURE=1
#
DEFAULT_VERSIONS+=perl5=5.24
WRKDIRPREFIX=/usr/obj/portswork
WITH_DEBUG=
WITH_DEBUG_FILES=
MALLOC_PRODUCTION=

# svnlite status /usr/src/ | sort
? /usr/src/sys/amd64/conf/GENERIC-DBG
? /usr/src/sys/amd64/conf/GENERIC-NODBG
? /usr/src/sys/arm/conf/BPIM3-DBG
? /usr/src/sys/arm/conf/BPIM3-NODBG
? /usr/src/sys/arm/conf/RPI2-DBG
? /usr/src/sys/arm/conf/RPI2-NODBG
? /usr/src/sys/arm64/conf/GENERIC-DBG
? /usr/src/sys/arm64/conf/GENERIC-NODBG
? /usr/src/sys/powerpc/conf/GENERIC64vtsc-DBG
? /usr/src/sys/powerpc/conf/GENERIC64vtsc-NODBG
? /usr/src/sys/powerpc/conf/GENERICvtsc-DBG
? /usr/src/sys/powerpc/conf/GENERICvtsc-NODBG
M /usr/src/contrib/llvm/lib/Target/PowerPC/PPCInstrInfo.td
M /usr/src/contrib/llvm/tools/lld/ELF/Target.cpp
M /usr/src/lib/csu/powerpc64/Makefile
M /usr/src/libexec/rtld-elf/Makefile
M /usr/src/sys/boot/ofw/Makefile.inc
M /usr/src/sys/boot/powerpc/Makefile.inc
M /usr/src/sys/boot/powerpc/kboot/Makefile
M /usr/src/sys/boot/uboot/Makefile.inc
M /usr/src/sys/conf/kern.mk
M /usr/src/sys/conf/kmod.mk
M /usr/src/sys/ddb/db_main.c
M /usr/src/sys/ddb/db_script.c
M /usr/src/sys/modules/zfs/Makefile
M /usr/src/sys/powerpc/ofw/ofw_machdep.c

The M's are generally tied to powerpc64 and powerpc explorations.
I tend to use the same source for all the TARGET_ARCH's that I build.

===
Mark Millard
markmi at dsl-only.net

_______________________________________________
freebsd-arm@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-arm
To unsubscribe, send any mail to "freebsd-arm-unsubscribe@freebsd.org"
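
A possible way to check the "pc was no longer pointing at code" observation from the lldb output quoted earlier in this message: decode the words that disass printed at pc = 0x0000ffffffffee68 (second SIGILL core), and the pc value 0x356c7265702f676e from the SIGBUS core, as little-endian bytes. This is only a minimal sketch. It assumes little-endian byte order (the usual case for FreeBSD/arm64), and the names WORDS_AT_PC, SIGBUS_PC, words_to_bytes and printable are hypothetical helpers, not anything from lldb.

```python
import struct

# 32-bit words copied from the "(lldb) disass" output at pc = 0x0000ffffffffee68.
WORDS_AT_PC = [
    0x44504d54, 0x2f3d5249, 0x00706d74, 0x4c454853,
    0x622f3d4c, 0x732f6e69, 0x4f430068, 0x4749464e,
]

# pc (and lr) value copied from the SIGBUS register dump.
SIGBUS_PC = 0x356c7265702f676e


def words_to_bytes(words):
    """Pack 32-bit words in little-endian order (assumed byte order)."""
    return b"".join(struct.pack("<I", w) for w in words)


def printable(data):
    """Render bytes as text: NUL as \\0, other non-printable bytes as '.'."""
    out = []
    for b in data:
        if b == 0:
            out.append("\\0")
        elif 32 <= b < 127:
            out.append(chr(b))
        else:
            out.append(".")
    return "".join(out)


raw_at_pc = words_to_bytes(WORDS_AT_PC)
raw_sigbus_pc = struct.pack("<Q", SIGBUS_PC)

print(printable(raw_at_pc))
print(printable(raw_sigbus_pc))
```

Under those assumptions the first line decodes to text that looks like environment strings (TMPDIR=/tmp, SHELL=/bin/sh, then the start of another name), and the SIGBUS pc decodes to the ASCII fragment "ng/perl5". Both are consistent with pc (and lr) having been loaded from string data rather than from a return address.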