Date: Tue, 31 Jan 2017 12:35:37 -0800
From: Mark Millard <markmigm@gmail.com>
To: Tom Vijlbrief <tvijlbrief@gmail.com>, freebsd-arm <freebsd-arm@freebsd.org>
Subject: Re: Arm64 stack issues (was Re: FreeBSD status for/on ODroid-C2?)
Message-ID: <890B7D8A-27FF-41AC-8291-1858393EC7B1@gmail.com>
In-Reply-To: <CB36F13F-85E9-41D2-A7F3-DA183BE5985A@dsl-only.net>
References: <CAOQrpVfK-Dw_rSo_YVY5MT1wbc6Ah-Pj%2BWv8UGjeiUQ1b3%2B-mg@mail.gmail.com>
 <20170124191357.0ec0abfd@zapp>
 <20170128010138.iublazyrhhqycn37@mutt-hardenedbsd>
 <20170128010223.tjivldnh7pyenbg6@mutt-hardenedbsd>
 <CAOQrpVfxKvSR5PoahnqEsYspHhjjOGJ8iCBUetKxRV57oX_aUg@mail.gmail.com>
 <009857E3-35BB-4DE4-B3BB-5EC5DDBB5B06@dsl-only.net>
 <CAOQrpVdKyP2T0V77sfpuKbNP3ARoD1EcwtH6E9o7p5KF%2B=A56A@mail.gmail.com>
 <CB36F13F-85E9-41D2-A7F3-DA183BE5985A@dsl-only.net>
[More notes on what I observe on a pine64 from head -r312982 .]

On 2017-Jan-28, at 2:17 PM, Tom Vijlbrief <tvijlbrief at gmail.com> wrote:

> Note that on the pine64 the network interface hangs from time to time and I get a core dump with very low frequency from long running processes, eg the shell that invokes "make world".

I got sh crashes (multiple processes in the same time frame)
from just trying to build pkg:

make[5]: stopped in /usr/obj/portswork/usr/ports/ports-mgmt/pkg/work/pkg-1.9.4/libpkg
*** [all-recursive] Error code 1

# ls -lt /var/crash/
total 41764
-rw-------  1 root  wheel  4702208 Jan 31 03:15 sh.13676.core
-rw-------  1 root  wheel  4702208 Jan 31 03:15 sh.13511.core
-rw-------  1 root  wheel  4702208 Jan 31 03:15 sh.13499.core
-rw-------  1 root  wheel  4702208 Jan 31 03:15 sh.12095.core
-rw-r--r--  1 root  wheel        5 Nov  3 10:18 minfree

In all the crashes lldb on the .core shows that the pc was no longer
pointing at memory with code in it.

It is interesting that all 4 sh instances died at about the same time.

SIGILL, SIGSEGV, SIGBUS, and SIGILL (again) from the non-code
consequences.

The two SIGILL's have some interesting similarities to each other.
So I list them first below.

x0-x3, x8-x9, x13, x17, x27, and cpsr all match in these two.
x1=ld-elf.so.1`_rtld_tlsdesc,
x17=libc.so.7`__free at jemalloc_jemalloc.c:2007,
x23=ld-elf.so.1`symlook_global + 124 at rtld.c:3916,
x27=sh..bss + 6336.

The other two have the following in common: x10-x12, x16-x17.
x17=libc.so.7`close at close.c:48 .

x18 = 0xaaaaaaaaaaaaaaab is common between one SIGILL and one not.

Only one does not have x27=sh..bss + 6336. It instead has:
x28=sh..bss + 6336 .
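
Register lists like the ones above can be cross-checked mechanically rather than by eye. The following is only a minimal sketch in Python, assuming the `register read` text has been saved as plain text with the quoted-printable `=3D` sequences already decoded to `=`. The names `parse_registers` and `common_registers` are hypothetical helpers, not part of lldb.

```python
import re

# Matches entries such as "x17 = 0x000000004054cd00" in `register read` text.
# Assumption: "=3D" has already been decoded to "=" before parsing.
REG_ENTRY = re.compile(r"\b(x\d+|fp|lr|sp|pc|cpsr)\s*=\s*(0x[0-9a-fA-F]+)")


def parse_registers(dump):
    """Return {register name: integer value} for one register dump."""
    return {name: int(value, 16) for name, value in REG_ENTRY.findall(dump)}


def common_registers(dump_a, dump_b):
    """Return the sorted names of registers that hold the same value in both dumps."""
    a = parse_registers(dump_a)
    b = parse_registers(dump_b)
    return sorted(name for name in a.keys() & b.keys() if a[name] == b[name])


if __name__ == "__main__":
    # Three registers taken from each of the two SIGILL dumps, as a small demo.
    first = "x0 = 0x0000000000000000 x4 = 0x0000000000000050 x13 = 0x0000000000000427"
    second = "x0 = 0x0000000000000000 x4 = 0x0000000000000017 x13 = 0x0000000000000427"
    print(common_registers(first, second))  # prints ['x0', 'x13']
```

Feeding the complete dumps through such a helper gives the full set of matching registers for any pair of cores.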
(lldb) bt
* thread #1: tid = 100142, 0x000000004044f800, name = 'sh', stop reason = signal SIGILL
  * frame #0: 0x000000004044f800
(lldb) register read
General Purpose Registers:
        x0 = 0x0000000000000000
        x1 = 0x00000000404346e8  ld-elf.so.1`_rtld_tlsdesc
        x2 = 0x0000000040a00000
        x3 = 0x0000000000000002
        x4 = 0x0000000000000050
        x5 = 0x0000000040a4c9c0
        x6 = 0x2e2e2f2e2e2f2e2e
        x7 = 0x6c6f6f7462696c2f
        x8 = 0x0000000000000001
        x9 = 0x0000000000000000
       x10 = 0x00000000000000df
       x11 = 0x000000000000002f
       x12 = 0x0000000040a0e690
       x13 = 0x0000000000000427
       x14 = 0x0000000000000001
       x15 = 0x0000000000000000
       x16 = 0x0000000000432340
       x17 = 0x000000004054cd00  libc.so.7`__free at jemalloc_jemalloc.c:2007
       x18 = 0x0000000000000000
       x19 = 0x000000004044e330
       x20 = 0x000000001c93deed
       x21 = 0x0000000007ab9b5c
       x22 = 0x00000000404ba7b0
       x23 = 0x000000004043c4b0  ld-elf.so.1`symlook_global + 124 at rtld.c:3916
       x24 = 0x0000ffffffffd2d0
       x25 = 0x0000ffffffffd370
       x26 = 0x0000ffffffffd340
       x27 = 0x0000000000434000  sh..bss + 6336
       x28 = 0x0000000040a4c1b0
        fp = 0x0000ffff00000001
        lr = 0x000000004044f800
        sp = 0x0000ffffffffd2a0
        pc = 0x000000004044f800
      cpsr = 0x60000000

(lldb) disass
->  0x4044f800: .long  0xd550b87a  ; unknown opcode
    0x4044f804: .long  0x00000000  ; unknown opcode
    0x4044f808: .long  0x00000001  ; unknown opcode
    0x4044f80c: .long  0x00000000  ; unknown opcode
    0x4044f810: .long  0x4044fc00  ; unknown opcode
    0x4044f814: .long  0x00000000  ; unknown opcode
    0x4044f818: .long  0x4044f410  ; unknown opcode
    0x4044f81c: .long  0x00000000  ; unknown opcode


(lldb) thread list
Process 0 stopped
* thread #1: tid = 100161, 0x0000ffffffffee68, name = 'sh', stop reason = signal SIGILL
(lldb) register read
General Purpose Registers:
        x0 = 0x0000000000000000
        x1 = 0x00000000404346e8  ld-elf.so.1`_rtld_tlsdesc
        x2 = 0x0000000040a00000
        x3 = 0x0000000000000002
        x4 = 0x0000000000000017
        x5 = 0x00080002a0290a00
        x6 = 0x0000000000434c28  sh..bss + 9448
        x7 = 0x000000000005e1cd
        x8 = 0x0000000000000001
        x9 = 0x0000000000000000
       x10 = 0x0000000000000000
       x11 = 0x0000000040a5c000
       x12 = 0x0000000040a0e670
       x13 = 0x0000000000000427
       x14 = 0x000000000000000d
       x15 = 0x0000000000432740  sh..bss + 0
       x16 = 0x0000000000432340
       x17 = 0x000000004054cd00  libc.so.7`__free at jemalloc_jemalloc.c:2007
       x18 = 0xaaaaaaaaaaaaaaab
       x19 = 0x0000ffffffffee18
       x20 = 0x0000ffffffffedb4
       x21 = 0x0000ffffffffed80
       x22 = 0x0000ffffffffed59
       x23 = 0x0000ffffffffed47
       x24 = 0x0000ffffffffed38
       x25 = 0x0000ffffffffed28
       x26 = 0x0000ffffffffed20
       x27 = 0x0000000000434000  sh..bss + 6336
       x28 = 0x0000000040a803a0
        fp = 0x0000ffffffffee59
        lr = 0x0000ffffffffee68
        sp = 0x0000ffffffffe1a0
        pc = 0x0000ffffffffee68
      cpsr = 0x60000000

(lldb) disass
->  0xffffffffee68: .long  0x44504d54  ; unknown opcode
    0xffffffffee6c: .long  0x2f3d5249  ; unknown opcode
    0xffffffffee70: .long  0x00706d74  ; unknown opcode
    0xffffffffee74: .long  0x4c454853  ; unknown opcode
    0xffffffffee78: .long  0x622f3d4c  ; unknown opcode
    0xffffffffee7c: .long  0x732f6e69  ; unknown opcode
    0xffffffffee80: .long  0x4f430068  ; unknown opcode
    0xffffffffee84: .long  0x4749464e  ; unknown opcode


(lldb) bt
* thread #1: tid = 100088, 0x356c7265702f676e, name = 'sh', stop reason = signal SIGBUS
  * frame #0: 0x356c7265702f676e
(lldb) register read
General Purpose Registers:
        x0 = 0x0000000000000000
        x1 = 0x0000000000000000
        x2 = 0x0000000040a00000
        x3 = 0x0000000000000005
        x4 = 0x0000000000000038
        x5 = 0x0000000040a754e5
        x6 = 0x584946455250442d
        x7 = 0x6c2f7273752f223d
        x8 = 0x0000000000000000
        x9 = 0x0000000000000000
       x10 = 0x0000000000434000  sh..bss + 6336
       x11 = 0x0000000000000000
       x12 = 0x0000000000434217  sh..bss + 6871
       x13 = 0x0000000000434000  sh..bss + 6336
       x14 = 0x0000000000432000  sh`__frame_dummy_init_array_entry
       x15 = 0x000000000000003d
       x16 = 0x00000000004322b0
       x17 = 0x000000004050d090  libc.so.7`close at close.c:48
       x18 = 0xaaaaaaaaaaaaaaab
       x19 = 0x766564206f666e69
       x20 = 0x7865646e692f746e
       x21 = 0x69727020676b702f
       x22 = 0x746d676d2d737472
       x23 = 0x6f7020656d69746e
       x24 = 0x75722d7478657474
       x25 = 0x65672f6c65766564
       x26 = 0x206e6f7369622f6c
       x27 = 0x0000000040a53716
       x28 = 0x0000000000434000  sh..bss + 6336
        fp = 0x616c20346d2f6c65
        lr = 0x356c7265702f676e
        sp = 0x0000ffffffffe740
        pc = 0x356c7265702f676e
      cpsr = 0x20000000

(lldb) disass
error: core file does not contain 0x356c7265702f676e
error: Failed to disassemble memory at 0xffffffffffffffff.


(lldb) bt
* thread #1: tid = 100186, 0x0000000000000000, name = 'sh', stop reason = signal SIGSEGV
  * frame #0: 0x0000000000000000
(lldb) disass
error: core file does not contain 0x0
error: Failed to disassemble memory at 0xffffffffffffffff.
(lldb) register read
General Purpose Registers:
        x0 = 0x0000000000000000
        x1 = 0x0000000000000000
        x2 = 0x0000000000000002
        x3 = 0x0000000000006c6f
        x4 = 0x0000000040a50bb3
        x5 = 0x0000000040a499ba
        x6 = 0x6f7462696c2f2e2e
        x7 = 0x6c6f6f7462696c2f
        x8 = 0x0000000000000000
        x9 = 0x0000000000000000
       x10 = 0x0000000000434000  sh..bss + 6336
       x11 = 0x0000000000000000
       x12 = 0x0000000040a499f8
       x13 = 0x0000000000434000  sh..bss + 6336
       x14 = 0x0000000000000001
       x15 = 0x0000000000000000
       x16 = 0x00000000004322b0
       x17 = 0x000000004050d090  libc.so.7`close at close.c:48
       x18 = 0x0000000000000000
       x19 = 0x0000000000000065
       x20 = 0x0000000000000065
       x21 = 0x00000000004168f0  sh`readtoken1 + 5212 at parser.c:1602
       x22 = 0x0000ffffffffda90
       x23 = 0x0000000040a498c0
       x24 = 0x000000000000000a
       x25 = 0x0000000000000000
       x26 = 0x0000000000000000
       x27 = 0x0000000040a49258
       x28 = 0x0000000000434000  sh..bss + 6336
        fp = 0x0000ffffffffda08
        lr = 0x0000000000000000
        sp = 0x0000ffffffffd970
        pc = 0x0000000000000000
      cpsr = 0x20000000

Looks to me like something major is wrong.
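
Several of the values in the dumps above look like text rather than addresses or instructions. One way to check is to decode them as little-endian ASCII. The following is a minimal sketch in Python, not something taken from lldb; the helper name `le_ascii` is hypothetical, and the input values are copied from the second SIGILL disassembly and from the SIGBUS `lr`.

```python
def le_ascii(value, width):
    """Decode an integer as `width` little-endian bytes; show non-printables as '.'."""
    raw = value.to_bytes(width, "little")
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


# The eight 32-bit words lldb listed at pc = 0x0000ffffffffee68 (second SIGILL).
words = [0x44504d54, 0x2f3d5249, 0x00706d74, 0x4c454853,
         0x622f3d4c, 0x732f6e69, 0x4f430068, 0x4749464e]
print("".join(le_ascii(w, 4) for w in words))
# prints: TMPDIR=/tmp.SHELL=/bin/sh.CONFIG

# The 64-bit lr/pc value from the SIGBUS core.
print(le_ascii(0x356c7265702f676e, 8))
# prints: ng/perl5
```

Read this way, the second SIGILL's pc appears to have pointed into environment-string text on the stack, and the SIGBUS lr appears to hold string bytes instead of a return address. That would be consistent with saved registers having been overwritten by string data, though the decoding by itself does not show how that happened.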

On 2017-Jan-30, at 11:57 PM, Mark Millard <markmi at dsl-only.net> wrote:

> I updated to head -r312982 on the pine64 that I have access to:
>
> # uname -apKU
> FreeBSD pine64 12.0-CURRENT FreeBSD 12.0-CURRENT r312982M arm64 aarch64 1200020 1200020
>
> after several months of not using the pine64.
> ( -mcpu=cortex-a53 used for buildworld buildkernel;
> non-debug variant of GENERIC [GENERIC included
> then overridden]; usb SSD root file system)
>
> I find that any time some of the cores are busy I get thousands
> of the gic0 spurious interrupt messages in fairly short order.
> (This is not new: it is unchanged.)
>
> For example during either of:
>
> openssl speed
>
> or:
>
> cp /dev/zero /dev/null
> (similarly for copying actual files around,
> local or nfs involved)
>
> Once the cores are no longer busy the gic0 messages stop.
>
> The "on CPU<?>" varies. The "last irq: <?>" varies.
> (But 27 is the most common by far.)

===
Mark Millard
markmi at dsl-only.net

On 2017-Jan-28, at 2:17 PM, Tom Vijlbrief <tvijlbrief at gmail.com> wrote:

Note that on the pine64 the network interface hangs from time to time and I get a core dump with very low frequency from long running processes, eg the shell that invokes "make world". Note that I had similar issues on the ODroid-C2.

Currently rebuilding world without MALLOC_PRODUCTION.

The arm64 port is getting close to working 100%, just a last few glitches.

On Sat 28 Jan 2017 at 22:03, Mark Millard <markmi at dsl-only.net> wrote:

[About: "gic0: Spurious interrupt detected" on armv6 as well.]

On 2017-Jan-28, at 6:43 AM, Tom Vijlbrief <tvijlbrief at gmail.com> wrote:

> Did a build/install world/kernel with r312916 and MALLOC_PRODUCTION=YES on
> a pine64, removed /etc/malloc.conf, rebooted
>
> and I am now rebuilding the python2 port without problems so far (except
> the "gic0: Spurious interrupt detected" messages which reappeared shortly
> after my previous post)

While very rare, I have seen the gic0 notices on armv6 (e.g., a bpim3)
during large builds (with -j 4). Recently I got a:

gic0: Spurious interrupt detected: last irq: 29 on CPU1

on:

# uname -apKU
FreeBSD bpim3 12.0-CURRENT FreeBSD 12.0-CURRENT #0 r312726M: Tue Jan 24 20:57:48 PST 2017 markmi@FreeBSDx64:/usr/obj/bpim3_clang/arm.armv6/usr/src/sys/BPIM3-NODBG arm armv6 1200020 1200020

while building devel/gcc6 (via a full bootstrap) via -j 4 .

This is from a non-debug buildworld buildkernel context and has
MALLOC_PRODUCTION= in /etc/make.conf . No /etc/malloc.conf present.
I do use -mcpu=cortex-a7 .

Details if you care:

# more /usr/src/sys/arm/conf/BPIM3-NODBG
#
# BPIM3 -- Custom configuration for the Banana Pi M3
#

include "GENERIC"

ident BPIM3-NODBG

makeoptions DEBUG=-g # Build kernel with gdb(1) debug symbols

options ALT_BREAK_TO_DEBUGGER

options KDB # Enable kernel debugger support

# For minimum debugger support (stable branch) use:
options KDB_TRACE # Print a stack trace for a panic
options DDB # Enable the kernel debugger

# Extra stuff:
#options VERBOSE_SYSINIT # Enable verbose sysinit messages
#options BOOTVERBOSE=1
#options BOOTHOWTO=RB_VERBOSE
#options KTR
#options KTR_MASK=KTR_TRAP
##options KTR_CPUMASK=0xF
#options KTR_VERBOSE

# Disable any extra checking for. . .
nooptions DEADLKRES # Enable the deadlock resolver
nooptions INVARIANTS # Enable calls of extra sanity checking
nooptions INVARIANT_SUPPORT # Extra sanity checks of internal structures, required by INVARIANTS
nooptions WITNESS # Enable checks to detect deadlocks and cycles
nooptions WITNESS_SKIPSPIN # Don't run witness on spinlocks for speed
nooptions DIAGNOSTIC

It was from a cross build for buildworld buildkernel :
(I've not checked on lldb builds linking recently.)

# more ~/src.configs/src.conf.bpim3-clang-bootstrap.amd64-host
TO_TYPE=armv6
#
KERNCONF=BPIM3-NODBG
TARGET=arm
.if ${.MAKE.LEVEL} == 0
TARGET_ARCH=${TO_TYPE}
.export TARGET_ARCH
.endif
#
WITH_CROSS_COMPILER=
WITHOUT_SYSTEM_COMPILER=
#
#CPUTYPE=soft
WITH_LIBCPLUSPLUS=
WITH_BINUTILS_BOOTSTRAP=
WITH_CLANG_BOOTSTRAP=
WITH_CLANG=
WITH_CLANG_IS_CC=
WITH_CLANG_FULL=
WITH_CLANG_EXTRAS=
WITH_LLD=
#
# Linking lldb fails for armv6(/v7)
WITHOUT_LLDB=
#
WITH_BOOT=
WITHOUT_LIB32=
WITHOUT_LIBSOFT=
#
WITHOUT_ELFTOOLCHAIN_BOOTSTRAP=
WITHOUT_GCC_BOOTSTRAP=
WITHOUT_GCC=
WITHOUT_GCC_IS_CC=
WITHOUT_GNUCXX=
#
NO_WERROR=
#WERROR=
MALLOC_PRODUCTION=
#
WITH_REPRODUCIBLE_BUILD=
WITH_DEBUG_FILES=
#
XCFLAGS+= -mcpu=cortex-a7
XCXXFLAGS+= -mcpu=cortex-a7
# There is no XCPPFLAGS but XCPP gets XCFLAGS content.

Used for buildworld buildkernel :

# more ~/src.configs/make.conf
#MALLOC_PRODUCTION=
#NO_WERROR=
#WERROR=
CFLAGS.gcc+= -v

Used for port builds:

# more /etc/make.conf
WANT_QT_VERBOSE_CONFIGURE=1
#
DEFAULT_VERSIONS+=perl5=5.24
WRKDIRPREFIX=/usr/obj/portswork
WITH_DEBUG=
WITH_DEBUG_FILES=
MALLOC_PRODUCTION=

# svnlite status /usr/src/ | sort
?       /usr/src/sys/amd64/conf/GENERIC-DBG
?       /usr/src/sys/amd64/conf/GENERIC-NODBG
?       /usr/src/sys/arm/conf/BPIM3-DBG
?       /usr/src/sys/arm/conf/BPIM3-NODBG
?       /usr/src/sys/arm/conf/RPI2-DBG
?       /usr/src/sys/arm/conf/RPI2-NODBG
?       /usr/src/sys/arm64/conf/GENERIC-DBG
?       /usr/src/sys/arm64/conf/GENERIC-NODBG
?       /usr/src/sys/powerpc/conf/GENERIC64vtsc-DBG
?       /usr/src/sys/powerpc/conf/GENERIC64vtsc-NODBG
?       /usr/src/sys/powerpc/conf/GENERICvtsc-DBG
?       /usr/src/sys/powerpc/conf/GENERICvtsc-NODBG
M       /usr/src/contrib/llvm/lib/Target/PowerPC/PPCInstrInfo.td
M       /usr/src/contrib/llvm/tools/lld/ELF/Target.cpp
M       /usr/src/lib/csu/powerpc64/Makefile
M       /usr/src/libexec/rtld-elf/Makefile
M       /usr/src/sys/boot/ofw/Makefile.inc
M       /usr/src/sys/boot/powerpc/Makefile.inc
M       /usr/src/sys/boot/powerpc/kboot/Makefile
M       /usr/src/sys/boot/uboot/Makefile.inc
M       /usr/src/sys/conf/kern.mk
M       /usr/src/sys/conf/kmod.mk
M       /usr/src/sys/ddb/db_main.c
M       /usr/src/sys/ddb/db_script.c
M       /usr/src/sys/modules/zfs/Makefile
M       /usr/src/sys/powerpc/ofw/ofw_machdep.c

The M's are generally tied to powerpc64 and powerpc explorations.
I tend to use the same source for all the TARGET_ARCH's that I build.

===
Mark Millard
markmi at dsl-only.net

_______________________________________________
freebsd-arm@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-arm
To unsubscribe, send any mail to "freebsd-arm-unsubscribe@freebsd.org"