Date: Tue, 31 Jan 2017 18:30:39 -0800
From: Mark Millard <markmi@dsl-only.net>
To: Tom Vijlbrief <tvijlbrief@gmail.com>
Cc: freebsd-arm <freebsd-arm@freebsd.org>
Subject: Re: Arm64 stack issues (was Re: FreeBSD status for/on ODroid-C2?)
Message-ID: <54642E5C-D5D6-45B7-BB74-2407CFB351C2@dsl-only.net>
In-Reply-To: <890B7D8A-27FF-41AC-8291-1858393EC7B1@gmail.com>
References: <CAOQrpVfK-Dw_rSo_YVY5MT1wbc6Ah-Pj%2BWv8UGjeiUQ1b3%2B-mg@mail.gmail.com>
 <20170124191357.0ec0abfd@zapp>
 <20170128010138.iublazyrhhqycn37@mutt-hardenedbsd>
 <20170128010223.tjivldnh7pyenbg6@mutt-hardenedbsd>
 <CAOQrpVfxKvSR5PoahnqEsYspHhjjOGJ8iCBUetKxRV57oX_aUg@mail.gmail.com>
 <009857E3-35BB-4DE4-B3BB-5EC5DDBB5B06@dsl-only.net>
 <CAOQrpVdKyP2T0V77sfpuKbNP3ARoD1EcwtH6E9o7p5KF%2B=A56A@mail.gmail.com>
 <CB36F13F-85E9-41D2-A7F3-DA183BE5985A@dsl-only.net>
 <890B7D8A-27FF-41AC-8291-1858393EC7B1@gmail.com>
[Just adding more accurate/precise times for the .core files.]

[The original was accidentally sent from the "wrong" E-mail account but I've adjusted that here.]

On 2017-Jan-31, at 12:35 PM, Mark Millard <markmi at dsl-only.net> wrote:

> [More notes on what I observe on a pine64 from head -r312982 .]
>
> On 2017-Jan-28, at 2:17 PM, Tom Vijlbrief <tvijlbrief at gmail.com> wrote:
>
>> Note that on the pine64 the network interface hangs from time to time and I get a core dump with very low frequency from long running processes, eg the shell that invokes "make world".
>
> I got sh crashes (multiple processes in the same time frame) from
> just trying to build pkg:
>
> make[5]: stopped in /usr/obj/portswork/usr/ports/ports-mgmt/pkg/work/pkg-1.9.4/libpkg
> *** [all-recursive] Error code 1
>
> # ls -lt /var/crash/
> total 41764
> -rw------- 1 root wheel 4702208 Jan 31 03:15 sh.13676.core
> -rw------- 1 root wheel 4702208 Jan 31 03:15 sh.13511.core
> -rw------- 1 root wheel 4702208 Jan 31 03:15 sh.13499.core
> -rw------- 1 root wheel 4702208 Jan 31 03:15 sh.12095.core
> -rw-r--r-- 1 root wheel 5 Nov 3 10:18 minfree
>
> In all the crashes lldb on the .core shows that the pc was no longer
> pointing at memory with code in it. It is interesting that all
> 4 sh instances died at about the same time.

More time detail (using -T):

-rw------- 1 root wheel 4702208 Jan 31 03:15:44 2017 sh.13676.core
-rw------- 1 root wheel 4702208 Jan 31 03:15:43 2017 sh.13511.core
-rw------- 1 root wheel 4702208 Jan 31 03:15:42 2017 sh.13499.core
-rw------- 1 root wheel 4702208 Jan 31 03:15:32 2017 sh.12095.core

> SIGILL, SIGSEGV, SIGBUS, and SIGILL (again) from the non-code
> consequences.
>
> The two SIGILL's have some interesting similarities to each other.
> So I list them first below. x0-x3, x8-x9, x13, x17, x27, and cpsr
> all match in these two.
> x1=ld-elf.so.1`_rtld_tlsdesc,
> x17=libc.so.7`__free at jemalloc_jemalloc.c:2007,
> x23=ld-elf.so.1`symlook_global + 124 at rtld.c:3916,
> x27=sh..bss + 6336.
>
> The other two have the following in common:
> x10-x12, x16-x17. x17=libc.so.7`close at close.c:48 .
>
> x18 = 0xaaaaaaaaaaaaaaab is common between one SIGILL and one not.
>
> Only one does not have x27=sh..bss + 6336. It instead has:
> x28=sh..bss + 6336 .
>
> (lldb) bt
> * thread #1: tid = 100142, 0x000000004044f800, name = 'sh', stop reason = signal SIGILL
> * frame #0: 0x000000004044f800
> (lldb) register read
> General Purpose Registers:
> x0 = 0x0000000000000000
> x1 = 0x00000000404346e8 ld-elf.so.1`_rtld_tlsdesc
> x2 = 0x0000000040a00000
> x3 = 0x0000000000000002
> x4 = 0x0000000000000050
> x5 = 0x0000000040a4c9c0
> x6 = 0x2e2e2f2e2e2f2e2e
> x7 = 0x6c6f6f7462696c2f
> x8 = 0x0000000000000001
> x9 = 0x0000000000000000
> x10 = 0x00000000000000df
> x11 = 0x000000000000002f
> x12 = 0x0000000040a0e690
> x13 = 0x0000000000000427
> x14 = 0x0000000000000001
> x15 = 0x0000000000000000
> x16 = 0x0000000000432340
> x17 = 0x000000004054cd00 libc.so.7`__free at jemalloc_jemalloc.c:2007
> x18 = 0x0000000000000000
> x19 = 0x000000004044e330
> x20 = 0x000000001c93deed
> x21 = 0x0000000007ab9b5c
> x22 = 0x00000000404ba7b0
> x23 = 0x000000004043c4b0 ld-elf.so.1`symlook_global + 124 at rtld.c:3916
> x24 = 0x0000ffffffffd2d0
> x25 = 0x0000ffffffffd370
> x26 = 0x0000ffffffffd340
> x27 = 0x0000000000434000 sh..bss + 6336
> x28 = 0x0000000040a4c1b0
> fp = 0x0000ffff00000001
> lr = 0x000000004044f800
> sp = 0x0000ffffffffd2a0
> pc = 0x000000004044f800
> cpsr = 0x60000000
> (lldb) disass
> -> 0x4044f800: .long 0xd550b87a ; unknown opcode
> 0x4044f804: .long 0x00000000 ; unknown opcode
> 0x4044f808: .long 0x00000001 ; unknown opcode
> 0x4044f80c: .long 0x00000000 ; unknown opcode
> 0x4044f810: .long 0x4044fc00 ; unknown opcode
> 0x4044f814: .long 0x00000000 ; unknown opcode
> 0x4044f818: .long 0x4044f410 ; unknown opcode
> 0x4044f81c: .long 0x00000000 ; unknown opcode
>
> (lldb) thread list
> Process 0 stopped
> * thread #1: tid = 100161, 0x0000ffffffffee68, name = 'sh', stop reason = signal SIGILL
> (lldb) register read
> General Purpose Registers:
> x0 = 0x0000000000000000
> x1 = 0x00000000404346e8 ld-elf.so.1`_rtld_tlsdesc
> x2 = 0x0000000040a00000
> x3 = 0x0000000000000002
> x4 = 0x0000000000000017
> x5 = 0x00080002a0290a00
> x6 = 0x0000000000434c28 sh..bss + 9448
> x7 = 0x000000000005e1cd
> x8 = 0x0000000000000001
> x9 = 0x0000000000000000
> x10 = 0x0000000000000000
> x11 = 0x0000000040a5c000
> x12 = 0x0000000040a0e670
> x13 = 0x0000000000000427
> x14 = 0x000000000000000d
> x15 = 0x0000000000432740 sh..bss + 0
> x16 = 0x0000000000432340
> x17 = 0x000000004054cd00 libc.so.7`__free at jemalloc_jemalloc.c:2007
> x18 = 0xaaaaaaaaaaaaaaab
> x19 = 0x0000ffffffffee18
> x20 = 0x0000ffffffffedb4
> x21 = 0x0000ffffffffed80
> x22 = 0x0000ffffffffed59
> x23 = 0x0000ffffffffed47
> x24 = 0x0000ffffffffed38
> x25 = 0x0000ffffffffed28
> x26 = 0x0000ffffffffed20
> x27 = 0x0000000000434000 sh..bss + 6336
> x28 = 0x0000000040a803a0
> fp = 0x0000ffffffffee59
> lr = 0x0000ffffffffee68
> sp = 0x0000ffffffffe1a0
> pc = 0x0000ffffffffee68
> cpsr = 0x60000000
> (lldb) disass
> -> 0xffffffffee68: .long 0x44504d54 ; unknown opcode
> 0xffffffffee6c: .long 0x2f3d5249 ; unknown opcode
> 0xffffffffee70: .long 0x00706d74 ; unknown opcode
> 0xffffffffee74: .long 0x4c454853 ; unknown opcode
> 0xffffffffee78: .long 0x622f3d4c ; unknown opcode
> 0xffffffffee7c: .long 0x732f6e69 ; unknown opcode
> 0xffffffffee80: .long 0x4f430068 ; unknown opcode
> 0xffffffffee84: .long 0x4749464e ; unknown opcode
>
> (lldb) bt
> * thread #1: tid = 100088, 0x356c7265702f676e, name = 'sh', stop reason = signal SIGBUS
> * frame #0: 0x356c7265702f676e
> (lldb) register read
> General Purpose Registers:
> x0 = 0x0000000000000000
> x1 = 0x0000000000000000
> x2 = 0x0000000040a00000
> x3 = 0x0000000000000005
> x4 = 0x0000000000000038
> x5 = 0x0000000040a754e5
> x6 = 0x584946455250442d
> x7 = 0x6c2f7273752f223d
> x8 = 0x0000000000000000
> x9 = 0x0000000000000000
> x10 = 0x0000000000434000 sh..bss + 6336
> x11 = 0x0000000000000000
> x12 = 0x0000000000434217 sh..bss + 6871
> x13 = 0x0000000000434000 sh..bss + 6336
> x14 = 0x0000000000432000 sh`__frame_dummy_init_array_entry
> x15 = 0x000000000000003d
> x16 = 0x00000000004322b0
> x17 = 0x000000004050d090 libc.so.7`close at close.c:48
> x18 = 0xaaaaaaaaaaaaaaab
> x19 = 0x766564206f666e69
> x20 = 0x7865646e692f746e
> x21 = 0x69727020676b702f
> x22 = 0x746d676d2d737472
> x23 = 0x6f7020656d69746e
> x24 = 0x75722d7478657474
> x25 = 0x65672f6c65766564
> x26 = 0x206e6f7369622f6c
> x27 = 0x0000000040a53716
> x28 = 0x0000000000434000 sh..bss + 6336
> fp = 0x616c20346d2f6c65
> lr = 0x356c7265702f676e
> sp = 0x0000ffffffffe740
> pc = 0x356c7265702f676e
> cpsr = 0x20000000
>
> (lldb) disass
> error: core file does not contain 0x356c7265702f676e
> error: Failed to disassemble memory at 0xffffffffffffffff.
>
> (lldb) bt
> * thread #1: tid = 100186, 0x0000000000000000, name = 'sh', stop reason = signal SIGSEGV
> * frame #0: 0x0000000000000000
> (lldb) disass
> error: core file does not contain 0x0
> error: Failed to disassemble memory at 0xffffffffffffffff.
> (lldb) register read
> General Purpose Registers:
> x0 = 0x0000000000000000
> x1 = 0x0000000000000000
> x2 = 0x0000000000000002
> x3 = 0x0000000000006c6f
> x4 = 0x0000000040a50bb3
> x5 = 0x0000000040a499ba
> x6 = 0x6f7462696c2f2e2e
> x7 = 0x6c6f6f7462696c2f
> x8 = 0x0000000000000000
> x9 = 0x0000000000000000
> x10 = 0x0000000000434000 sh..bss + 6336
> x11 = 0x0000000000000000
> x12 = 0x0000000040a499f8
> x13 = 0x0000000000434000 sh..bss + 6336
> x14 = 0x0000000000000001
> x15 = 0x0000000000000000
> x16 = 0x00000000004322b0
> x17 = 0x000000004050d090 libc.so.7`close at close.c:48
> x18 = 0x0000000000000000
> x19 = 0x0000000000000065
> x20 = 0x0000000000000065
> x21 = 0x00000000004168f0 sh`readtoken1 + 5212 at parser.c:1602
> x22 = 0x0000ffffffffda90
> x23 = 0x0000000040a498c0
> x24 = 0x000000000000000a
> x25 = 0x0000000000000000
> x26 = 0x0000000000000000
> x27 = 0x0000000040a49258
> x28 = 0x0000000000434000 sh..bss + 6336
> fp = 0x0000ffffffffda08
> lr = 0x0000000000000000
> sp = 0x0000ffffffffd970
> pc = 0x0000000000000000
> cpsr = 0x20000000
>
> Looks to me like something major is wrong.

===
Mark Millard
markmi at dsl-only.net

On 2017-Jan-30, at 11:57 PM, Mark Millard <markmi at dsl-only.net> wrote:

> I updated to head -r312982 on the pine64 that I have access to:
>
> # uname -apKU
> FreeBSD pine64 12.0-CURRENT FreeBSD 12.0-CURRENT r312982M arm64 aarch64 1200020 1200020
>
> after several months of not using the pine64.
> ( -mcpu=cortex-a53 used for buildworld buildkernel;
> non-debug variant of GENERIC [GENERIC included
> then overridden]; usb SSD root file system)
>
> I find that any time some of the cores are busy I get thousands
> of the gic0 spurious interrupt messages in fairly short order.
> (This is not new: it is unchanged.)
>
> For example during either of:
>
> openssl speed
>
> or:
>
> cp /dev/zero /dev/null
> (similarly for copying actual files around,
> local or nfs involved)
>
> Once the cores are no longer busy the gic0 messages stop.
>
> The "on CPU<?>" varies. The "last irq: <?>" varies.
> (But 27 is the most common by far.)

===
Mark Millard
markmi at dsl-only.net

On 2017-Jan-28, at 2:17 PM, Tom Vijlbrief <tvijlbrief at gmail.com> wrote:

Note that on the pine64 the network interface hangs from time to time and I get a core dump with very low frequency from long running processes, eg the shell that invokes "make world".

Note that I had similar issues on the ODroid-C2.

Currently rebuilding world without MALLOC_PRODUCTION.

The arm64 port is getting close to working 100%, just a last few glitches.

On Sat 28 Jan 2017 22:03, Mark Millard <markmi at dsl-only.net> wrote:

[About: "gic0: Spurious interrupt detected" on armv6 as well.]

On 2017-Jan-28, at 6:43 AM, Tom Vijlbrief <tvijlbrief at gmail.com> wrote:

> Did a build/install world/kernel with r312916 and MALLOC_PRODUCTION=YES on
> a pine64, removed /etc/malloc.conf, rebooted
>
> and I am now rebuilding the python2 port without problems so far (except
> the "gic0: Spurious interrupt detected" messages which reappeared shortly
> after my previous post)

While very rare, I have seen the gic0 notices on armv6 (e.g., a bpim3) during large builds (with -j 4). Recently I got a:

gic0: Spurious interrupt detected: last irq: 29 on CPU1

on:

# uname -apKU
FreeBSD bpim3 12.0-CURRENT FreeBSD 12.0-CURRENT #0 r312726M: Tue Jan 24 20:57:48 PST 2017 markmi@FreeBSDx64:/usr/obj/bpim3_clang/arm.armv6/usr/src/sys/BPIM3-NODBG arm armv6 1200020 1200020

while building devel/gcc6 (via a full bootstrap) via -j 4 .

This is from a non-debug buildworld buildkernel context and has MALLOC_PRODUCTION= in /etc/make.conf . No /etc/malloc.conf present. I do use -mcpu=cortex-a7 .
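The pine64 notes above say the "on CPU" and "last irq" values vary, with 27 by far the most common irq. Below is a minimal Python sketch for tallying that from a saved log. It assumes the notices look exactly like the one quoted above (`gic0: Spurious interrupt detected: last irq: 29 on CPU1`). `tally_spurious` is a hypothetical helper, not an existing FreeBSD tool.

```python
import re
from collections import Counter

# Pattern assumed from the notice text quoted in this thread, e.g.
# "gic0: Spurious interrupt detected: last irq: 29 on CPU1".
GIC0_SPURIOUS = re.compile(
    r"gic0: Spurious interrupt detected: last irq: (\d+) on CPU(\d+)"
)


def tally_spurious(lines):
    """Count spurious-interrupt notices by last irq and by CPU (hypothetical helper)."""
    by_irq = Counter()
    by_cpu = Counter()
    for line in lines:
        match = GIC0_SPURIOUS.search(line)
        if match:
            by_irq[int(match.group(1))] += 1
            by_cpu[int(match.group(2))] += 1
    return by_irq, by_cpu


# Example use (not run here); the log path is an assumption about where
# the kernel messages were recorded:
#     with open("/var/log/messages") as log:
#         by_irq, by_cpu = tally_spurious(log)
#         print(by_irq.most_common(), by_cpu.most_common())
```

`Counter.most_common()` lists irq numbers (or CPUs) from most to least frequent. If the log went through syslogd, identical consecutive notices may have been folded into a "last message repeated N times" line. This sketch does not expand those lines, so treat its counts as lower bounds.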
Details if you care:

# more /usr/src/sys/arm/conf/BPIM3-NODBG
#
# BPIM3 -- Custom configuration for the Banana Pi M3
#
include "GENERIC"
ident BPIM3-NODBG
makeoptions DEBUG=-g # Build kernel with gdb(1) debug symbols
options ALT_BREAK_TO_DEBUGGER
options KDB # Enable kernel debugger support
# For minimum debugger support (stable branch) use:
options KDB_TRACE # Print a stack trace for a panic
options DDB # Enable the kernel debugger
# Extra stuff:
#options VERBOSE_SYSINIT # Enable verbose sysinit messages
#options BOOTVERBOSE=1
#options BOOTHOWTO=RB_VERBOSE
#options KTR
#options KTR_MASK=KTR_TRAP
##options KTR_CPUMASK=0xF
#options KTR_VERBOSE
# Disable any extra checking for. . .
nooptions DEADLKRES # Enable the deadlock resolver
nooptions INVARIANTS # Enable calls of extra sanity checking
nooptions INVARIANT_SUPPORT # Extra sanity checks of internal structures, required by INVARIANTS
nooptions WITNESS # Enable checks to detect deadlocks and cycles
nooptions WITNESS_SKIPSPIN # Don't run witness on spinlocks for speed
nooptions DIAGNOSTIC

It was from a cross build for buildworld buildkernel :
(I've not checked on lldb builds linking recently.)
# more ~/src.configs/src.conf.bpim3-clang-bootstrap.amd64-host
TO_TYPE=armv6
#
KERNCONF=BPIM3-NODBG
TARGET=arm
.if ${.MAKE.LEVEL} == 0
TARGET_ARCH=${TO_TYPE}
.export TARGET_ARCH
.endif
#
WITH_CROSS_COMPILER=
WITHOUT_SYSTEM_COMPILER=
#
#CPUTYPE=soft
WITH_LIBCPLUSPLUS=
WITH_BINUTILS_BOOTSTRAP=
WITH_CLANG_BOOTSTRAP=
WITH_CLANG=
WITH_CLANG_IS_CC=
WITH_CLANG_FULL=
WITH_CLANG_EXTRAS=
WITH_LLD=
#
# Linking lldb fails for armv6(/v7)
WITHOUT_LLDB=
#
WITH_BOOT=
WITHOUT_LIB32=
WITHOUT_LIBSOFT=
#
WITHOUT_ELFTOOLCHAIN_BOOTSTRAP=
WITHOUT_GCC_BOOTSTRAP=
WITHOUT_GCC=
WITHOUT_GCC_IS_CC=
WITHOUT_GNUCXX=
#
NO_WERROR=
#WERROR=
MALLOC_PRODUCTION=
#
WITH_REPRODUCIBLE_BUILD=
WITH_DEBUG_FILES=
#
XCFLAGS+= -mcpu=cortex-a7
XCXXFLAGS+= -mcpu=cortex-a7
# There is no XCPPFLAGS but XCPP gets XCFLAGS content.

Used for buildworld buildkernel :

# more ~/src.configs/make.conf
#MALLOC_PRODUCTION=
#NO_WERROR=
#WERROR=
CFLAGS.gcc+= -v

Used for port builds:

# more /etc/make.conf
WANT_QT_VERBOSE_CONFIGURE=1
#
DEFAULT_VERSIONS+=perl5=5.24
WRKDIRPREFIX=/usr/obj/portswork
WITH_DEBUG=
WITH_DEBUG_FILES=
MALLOC_PRODUCTION=

# svnlite status /usr/src/ | sort
? /usr/src/sys/amd64/conf/GENERIC-DBG
? /usr/src/sys/amd64/conf/GENERIC-NODBG
? /usr/src/sys/arm/conf/BPIM3-DBG
? /usr/src/sys/arm/conf/BPIM3-NODBG
? /usr/src/sys/arm/conf/RPI2-DBG
? /usr/src/sys/arm/conf/RPI2-NODBG
? /usr/src/sys/arm64/conf/GENERIC-DBG
? /usr/src/sys/arm64/conf/GENERIC-NODBG
? /usr/src/sys/powerpc/conf/GENERIC64vtsc-DBG
? /usr/src/sys/powerpc/conf/GENERIC64vtsc-NODBG
? /usr/src/sys/powerpc/conf/GENERICvtsc-DBG
? /usr/src/sys/powerpc/conf/GENERICvtsc-NODBG
M /usr/src/contrib/llvm/lib/Target/PowerPC/PPCInstrInfo.td
M /usr/src/contrib/llvm/tools/lld/ELF/Target.cpp
M /usr/src/lib/csu/powerpc64/Makefile
M /usr/src/libexec/rtld-elf/Makefile
M /usr/src/sys/boot/ofw/Makefile.inc
M /usr/src/sys/boot/powerpc/Makefile.inc
M /usr/src/sys/boot/powerpc/kboot/Makefile
M /usr/src/sys/boot/uboot/Makefile.inc
M /usr/src/sys/conf/kern.mk
M /usr/src/sys/conf/kmod.mk
M /usr/src/sys/ddb/db_main.c
M /usr/src/sys/ddb/db_script.c
M /usr/src/sys/modules/zfs/Makefile
M /usr/src/sys/powerpc/ofw/ofw_machdep.c

The M's are generally tied to powerpc64 and powerpc explorations.
I tend to use the same source for all the TARGET_ARCH's that I build.

===
Mark Millard
markmi at dsl-only.net
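Several values in the 2017-Jan-31 register dumps read as ASCII text once their bytes are laid out in little-endian order. That fits the observation that the pc was no longer pointing at code. Below is a minimal Python sketch for checking this, assuming little-endian byte order (which FreeBSD/arm64 uses). The helper names `le_ascii` and `looks_like_text` are hypothetical. The "every byte NUL or printable, at least half printable" rule is an arbitrary heuristic, not something lldb provides.

```python
import struct

# struct formats for little-endian unsigned values: 32-bit ".long" words
# from lldb's disassembly, and 64-bit general purpose registers.
FORMATS = {4: "<I", 8: "<Q"}


def le_bytes(value, width=8):
    """Return the bytes `value` occupies in little-endian memory."""
    return struct.pack(FORMATS[width], value)


def le_ascii(value, width=8):
    """Render `value` as text: printable ASCII as-is, NUL as \\0, other bytes as '.'."""
    chars = []
    for b in le_bytes(value, width):
        if b == 0:
            chars.append("\\0")  # likely a C string terminator
        elif 0x20 <= b < 0x7F:
            chars.append(chr(b))
        else:
            chars.append(".")
    return "".join(chars)


def looks_like_text(value, width=8):
    """Heuristic: every byte is NUL or printable, and at least half are printable."""
    raw = le_bytes(value, width)
    printable = sum(1 for b in raw if 0x20 <= b < 0x7F)
    only_text = all(b == 0 or 0x20 <= b < 0x7F for b in raw)
    return only_text and 2 * printable >= width


# The eight "unknown opcode" words at pc=0xffffffffee68 (second SIGILL core).
sigill_words = [0x44504D54, 0x2F3D5249, 0x00706D74, 0x4C454853,
                0x622F3D4C, 0x732F6E69, 0x4F430068, 0x4749464E]
print("".join(le_ascii(w, 4) for w in sigill_words))
# prints: TMPDIR=/tmp\0SHELL=/bin/sh\0CONFIG

# A few registers from the SIGBUS core.
sigbus_regs = {
    "x19": 0x766564206F666E69,
    "x25": 0x65672F6C65766564,
    "x28": 0x0000000000434000,  # sh..bss + 6336: an address, not text
    "fp": 0x616C20346D2F6C65,
    "lr": 0x356C7265702F676E,
}
for name, value in sigbus_regs.items():
    if looks_like_text(value):
        print(name, hex(value), le_ascii(value))
```

With these inputs the SIGILL words decode to environment-style strings. The SIGBUS core's lr and pc (both 0x356c7265702f676e) decode to `ng/perl5`, while x19, x25 and fp decode to fragments such as `info dev` and `devel/ge`. Text in scratch registers such as x6/x7 can be ordinary during string handling, but text in fp, lr and pc is not. The pattern is consistent with saved fp/lr values being reloaded from stack memory that had been overwritten by string data, though it does not by itself prove that.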