Date: Sun, 12 Feb 2017 18:46:50 -0800 From: Mark Millard <markmi@dsl-only.net> To: andrew@freebsd.org, Shawn Webb <shawn.webb@hardenedbsd.org> Cc: Tom Vijlbrief <tvijlbrief@gmail.com>, freebsd-arm <freebsd-arm@freebsd.org> Subject: Re: pine64 (an A64 Cortex-A53 context, e.g. -r312982): sh`forkshell child-process path after fork sometimes has a bad stack pointer value Message-ID: <2D04FF37-DEC8-42CE-961D-AE8CD58A0EAA@dsl-only.net> In-Reply-To: <DC3CC3BE-9D8C-41ED-ADD0-AFD4019B2E90@dsl-only.net> References: <DC3CC3BE-9D8C-41ED-ADD0-AFD4019B2E90@dsl-only.net>
next in thread | previous in thread | raw e-mail | index | archive | help
On 2017-Feb-12, at 2:52 AM, Mark Millard <markmi at dsl-only.net> wrote: > On pine64 (an A64 Cortex-A53 context) multiple people on the lists > including me have reported sh getting occasional core dumps. >=20 > I've analyzed a bunch of the sh core dumps and all failed in the > child-process path of forkshell when forkshell tried to return.=20 > I've since done experiments with code to detect some forms of > odd stack pointer values so that the adjusted code calls abort > for such a detection before such a return would happen. [This > gives a nicer context to look at in core dumps (before things > are very messed up if the sp is bad).] >=20 > In sh`forkshell, just after the fork returns, on the child-process > path there is sometimes a messed up sp value by what direction > it is from the prior frame-pointers on the stack --and on occasion > the value difference is very large, such as: > (from: lldb register read on the frame with the pc in sh`forkshell ) >=20 > fp =3D 0x0000ffffffffce90 > sp =3D 0x0000ffffffffe980 >=20 > This has the sp with a larger address than what sh`__start > stored as the frame-pointer back-link when it is put to use via > ld-elf.so.1`.rtld_start (more like 0x0000ffffffffde10 as I > remember): outside the active stack region. >=20 > [Note: my experiments so far would not establish if the sp > might sometimes have an unexpectedly large distance toward > lower memory addresses, specially if it was still in the > potential stack-region. It may be that both directions > happen.] >=20 > The distance when it fails is vary variable across examples. > I just picked an example were stack frames would be written > over the top of other material when sh`forkshell makes other > calls on the child-process path, material that would be > outside what should be the active stack region. >=20 > # uname -apKU > FreeBSD pine64 12.0-CURRENT FreeBSD 12.0-CURRENT r312982M arm64 = aarch64 1200020 1200020 >=20 > (I've frozen at that version for this exploration. > It has taken me a while.) >=20 > Looking around I see what might be a few possibilities. . . > (I'm no expert so some might be trivially eliminated.) >=20 >=20 > Possibility #0 (possibilities in no particular order): >=20 > sys/arm64/arm64/vm_machdep.c : >=20 > In cpu_fork what if the bcopy of td1-td_frame might not > always have access to the latest updated values, needing > some form of memory "fence" to be sure that such values are > accessible? : >=20 > tf =3D (struct trapframe *)STACKALIGN((struct trapframe *)pcb2 = - 1); > bcopy(td1->td_frame, tf, sizeof(*tf)); > tf->tf_x[0] =3D 0; > tf->tf_x[1] =3D 0; > tf->tf_spsr =3D 0; >=20 > td2->td_frame =3D tf; >=20 > /* Set the return value registers for fork() */ > td2->td_pcb->pcb_x[8] =3D (uintptr_t)fork_return; > td2->td_pcb->pcb_x[9] =3D (uintptr_t)td2; > td2->td_pcb->pcb_x[PCB_LR] =3D (uintptr_t)fork_trampoline; > td2->td_pcb->pcb_sp =3D (uintptr_t)td2->td_frame; > td2->td_pcb->pcb_fpusaved =3D &td2->td_pcb->pcb_fpustate; > td2->td_pcb->pcb_vfpcpu =3D UINT_MAX; >=20 > /* Setup to release spin count in fork_exit(). */ > td2->td_md.md_spinlock_count =3D 1; > td2->td_md.md_saved_daif =3D 0; >=20 >=20 > Possibility #1: >=20 > sys/arm64/arm64/swtch.S : >=20 > ENTRY(fork_trampoline) > . . . > /* Restore sp and lr */ > ldp x0, x1, [sp] > msr sp_el0, x0 > mov lr, x1 >=20 > Similar point to #0 but for the ldp memory accesses > shown. >=20 >=20 > Possibility #3: >=20 > sys/arm64/arm64/exception.S : >=20 > Both of: >=20 > handle_el0_sync > handle_el0_irq >=20 > also update sp_el0 and so if any such can happen > during any part of fork_trampoline after its > "msr sp_el0, x0" but before its "msr daifset, #2" > (disabling interrupts), then the wrong sp_el0 value > would be in place at fork_tramploine's eret . >=20 >=20 > It will be interesting to see what the problem actually > was once it has been fixed. I have updated to a test that should report any time fork changes the stack pointer from what it was before the fork, parent or child process: # svnlite diff /usr/src/bin/sh/miscbltin.c /usr/src/bin/sh/jobs.c Index: /usr/src/bin/sh/miscbltin.c =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- /usr/src/bin/sh/miscbltin.c (revision 312982) +++ /usr/src/bin/sh/miscbltin.c (working copy) @@ -64,6 +64,15 @@ =20 #undef eflag =20 + +/* JUST FOR TESTING */ +uintptr_t example_stack_address(void) +{ + volatile uintptr_t test =3D 0; + return (uintptr_t)(void*)&test; +} + + int readcmd(int, char **); int umaskcmd(int, char **); int ulimitcmd(int, char **); Index: /usr/src/bin/sh/jobs.c =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- /usr/src/bin/sh/jobs.c (revision 312982) +++ /usr/src/bin/sh/jobs.c (working copy) @@ -51,6 +51,9 @@ #include <stdlib.h> #include <unistd.h> =20 +/* JUST FOR TESTING */ +#include <stdint.h> + #include "shell.h" #if JOBS #include <termios.h> @@ -833,6 +836,11 @@ * in a pipeline). */ =20 +extern uintptr_t example_stack_address(void); + +uintptr_t stack_address_before_fork =3D 0; +uintptr_t stack_address_after_fork =3D 0; + pid_t forkshell(struct job *jp, union node *n, int mode) { @@ -845,7 +853,10 @@ if (mode =3D=3D FORK_BG && (jp =3D=3D NULL || jp->nprocs =3D=3D = 0)) checkzombies(); flushall(); + stack_address_before_fork =3D example_stack_address(); pid =3D fork(); + stack_address_after_fork =3D example_stack_address(); + if (stack_address_after_fork !=3D stack_address_before_fork) = abort(); if (pid =3D=3D -1) { TRACE(("Fork failed, errno=3D%d\n", errno)); INTON; @@ -946,7 +957,6 @@ return pid; } =20 - pid_t vforkexecshell(struct job *jp, char **argv, char **envp, const char = *path, int idx, int pip[2]) { I've been using repeated attempts to build devel/aarch64-none-elf-gcc for testing because the configure activity does lots of shell work that forks. (I first noticed the problem in such a context.) The first example core file from repeated attempts to build devel/aarch64-none-elf-gcc with this new test shows: (lldb) bt * thread #1: tid =3D 100188, 0x0000000040554e54 libc.so.7`_thr_kill + 8, = name =3D 'sh', stop reason =3D signal SIGABRT * frame #0: 0x0000000040554e54 libc.so.7`_thr_kill + 8 frame #1: 0x0000000040554e18 libc.so.7`__raise(s=3D6) + 64 at = raise.c:52 frame #2: 0x0000000040554d8c libc.so.7`abort + 84 at abort.c:65 frame #3: 0x000000000040f56c sh`forkshell(jp=3D<unavailable>, = n=3D<unavailable>, mode=3D<unavailable>) + 1080 at jobs.c:859 (lldb) up 3 frame #3: 0x000000000040f56c sh`forkshell(jp=3D<unavailable>, = n=3D<unavailable>, mode=3D<unavailable>) + 1080 at jobs.c:859 856 stack_address_before_fork =3D example_stack_address(); 857 pid =3D fork(); 858 stack_address_after_fork =3D example_stack_address(); -> 859 if (stack_address_after_fork !=3D = stack_address_before_fork) abort(); 860 if (pid =3D=3D -1) { 861 TRACE(("Fork failed, errno=3D%d\n", errno)); 862 INTON; (lldb) register read General Purpose Registers: x19 =3D 0x0000000000000000 x20 =3D 0x0000000040a350c0 x21 =3D 0x0000000000000002 x22 =3D 0x0000000040a49898 x23 =3D 0x0000000000434000 sh..bss + 6336 x24 =3D 0x0000000000434000 sh..bss + 6336 x25 =3D 0x00000000000000fb x26 =3D 0x0000000040a49b90 x27 =3D 0x0000000040a49918 x28 =3D 0x0000000000434000 sh..bss + 6336 fp =3D 0x0000ffffffffcee0 lr =3D 0x000000000040f56c sh`forkshell + 1080 at jobs.c:862 sp =3D 0x0000ffffffffd180 pc =3D 0x000000000040f56c sh`forkshell + 1080 at jobs.c:862 20 registers were unavailable. (lldb) print/x stack_address_before_fork (uintptr_t) $0 =3D 0x0000ffffffffce80 (lldb) print/x stack_address_after_fork (uintptr_t) $1 =3D 0x0000ffffffffd160 (lldb) print/x main_handler (jmploc) $2 =3D { loc =3D { [0] =3D { _jb =3D { [0] =3D 0x0000ffffffffdb60fb5d25837d7ff700 [1] =3D 0x00000000000000170000ffffffffdc08 [2] =3D 0x00000000004320380000000000434a10 [3] =3D 0x00000000000000000000000000000000 [4] =3D 0x00000000000000000000000000000000 [5] =3D 0x00000000000000000000000000000000 [6] =3D 0x0000000000410c800000ffffffffdbb0 . . . (main_handler is from setjmp in main.) In this example of the context for sh`forkshell after the fork: fp =3D 0x0000ffffffffcee0 < sp =3D 0x0000ffffffffd180 < = 0x0000ffffffffdbb0 That explains why the bt stopped with frame #3: part of the stack had been trashed by the calls to example_stack_address and to abort (and what its internal call chain involves), unlike the prior example that trashed memory outside the active stack region. Compared to the prior example's sp =3D 0x0000ffffffffe980 the distance from the frame pointer in sh`forkshell is much smaller this time --but it is still in the same (wrong) direction. This illustrates the variability in the bad sp value from bad-case to bad-case. I plan to leave the repeated builds running for a time to accumulate some more core files based on this new test. If any examples happen of sp decreasing instead of increasing, the code should core dump for those as well, unlike my prior testing. Context details: # more /usr/src/sys/arm64/conf/GENERIC-NODBG # # GENERIC -- Custom configuration for the arm64/aarch64 # include "GENERIC" ident GENERIC-NODBG makeoptions DEBUG=3D-g # Build kernel with gdb(1) = debug symbols options ALT_BREAK_TO_DEBUGGER options KDB # Enable kernel debugger support # For minimum debugger support (stable branch) use: #options KDB_TRACE # Print a stack trace for a = panic options DDB # Enable the kernel debugger # Extra stuff: #options VERBOSE_SYSINIT # Enable verbose sysinit = messages #options BOOTVERBOSE=3D1 #options BOOTHOWTO=3DRB_VERBOSE #options KTR #options KTR_MASK=3DKTR_TRAP ##options KTR_CPUMASK=3D0xF #options KTR_VERBOSE # Disable any extra checking for. . . nooptions DEADLKRES # Enable the deadlock resolver nooptions INVARIANTS # Enable calls of extra sanity = checking nooptions INVARIANT_SUPPORT # Extra sanity checks of = internal structures, required by INVARIANTS nooptions WITNESS # Enable checks to detect = deadlocks and cycles nooptions WITNESS_SKIPSPIN # Don't run witness on spinlocks = for speed nooptions DIAGNOSTIC nooptions MALLOC_DEBUG_MAXZONES # Separate malloc(9) zones The -r312982 variant is a cross build with -mcpu=3Dcortex-a53 specified: # more ~/src.configs/src.conf.pine64-clang-bootstrap.amd64-host=20 TO_TYPE=3Daarch64 TOOLS_TO_TYPE=3D${TO_TYPE} # KERNCONF=3DGENERIC-NODBG TARGET=3Darm64 .if ${.MAKE.LEVEL} =3D=3D 0 TARGET_ARCH=3D${TO_TYPE} .export TARGET_ARCH .endif # WITH_CROSS_COMPILER=3D WITHOUT_SYSTEM_COMPILER=3D # WITH_LIBCPLUSPLUS=3D WITHOUT_BINUTILS_BOOTSTRAP=3D WITHOUT_ELFTOOLCHAIN_BOOTSTRAP=3D WITH_CLANG_BOOTSTRAP=3D WITH_CLANG=3D WITH_CLANG_IS_CC=3D WITH_CLANG_FULL=3D WITH_CLANG_EXTRAS=3D WITH_LLD=3D WITH_LLDB=3D # WITH_BOOT=3D WITHOUT_LIB32=3D WITHOUT_LIBSOFT=3D # WITHOUT_GCC_BOOTSTRAP=3D WITHOUT_GCC=3D WITHOUT_GCC_IS_CC=3D WITHOUT_GNUCXX=3D # NO_WERROR=3D #WERROR=3D MALLOC_PRODUCTION=3D # WITH_REPRODUCIBLE_BUILD=3D WITH_DEBUG_FILES=3D # CROSS_BINUTILS_PREFIX=3D/usr/local/${TOOLS_TO_TYPE}-freebsd/bin/ XCFLAGS+=3D -B${CROSS_BINUTILS_PREFIX} XCXXFLAGS+=3D -B${CROSS_BINUTILS_PREFIX} # There is no XCPPFLAGS but XCPP gets XCFLAGS content. # XCFLAGS+=3D -mcpu=3Dcortex-a53 XCXXFLAGS+=3D -mcpu=3Dcortex-a53 # There is no XCPPFLAGS but XCPP gets XCFLAGS content. In order to cut down on the Spurious interrupt notices on the console I eventually did the below personal adjustment specific to my context (and it is in use above): # svnlite diff /usr/src/sys/arm/arm/gic.c Index: /usr/src/sys/arm/arm/gic.c =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- /usr/src/sys/arm/arm/gic.c (revision 312982) +++ /usr/src/sys/arm/arm/gic.c (working copy) @@ -672,9 +672,13 @@ =20 if (irq >=3D sc->nirqs) { #ifdef GIC_DEBUG_SPURIOUS +#define EXPECTED_SPURIOUS_IRQ 1023 + if (irq !=3D EXPECTED_SPURIOUS_IRQ) { device_printf(sc->gic_dev, - "Spurious interrupt detected: last irq: %d on = CPU%d\n", + "Spurious interrupt %d detected of %d: last irq: %d = on CPU%d\n", + irq, sc->nirqs, sc->last_irq[PCPU_GET(cpuid)], PCPU_GET(cpuid)); + } #endif return (FILTER_HANDLED); } @@ -720,6 +724,16 @@ if (irq < sc->nirqs) goto dispatch_irq; =20 + if (irq !=3D EXPECTED_SPURIOUS_IRQ) { +#undef EXPECTED_SPURIOUS_IRQ +#ifdef GIC_DEBUG_SPURIOUS + device_printf(sc->gic_dev, + "Spurious end interrupt %d detected of %d: last irq: = %d on CPU%d\n", + irq, sc->nirqs, + sc->last_irq[PCPU_GET(cpuid)], PCPU_GET(cpuid)); +#endif + } + return (FILTER_HANDLED); } The added message block was to report unexpected values for the exit path. None have been reported from there. The gic.c changes eliminated all the Spurious interrupt notices --which made the console more reasonable to use. But so far I've not come up with a way to discover what leads to the irq=3D=3D1023 reports for the original message context. [Interrupts handled by one core but sent to multiple cores?] Other notes: I switched to Andrew Turner's freebsd Email address. It might be more appropriate. I also fixed the mistyped "Contex-a53" in the Subject to say "Cortex-a53". =3D=3D=3D Mark Millard markmi at dsl-only.net
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?2D04FF37-DEC8-42CE-961D-AE8CD58A0EAA>