Date: Sun, 12 Feb 2017 02:52:46 -0800
From: Mark Millard <markmi@dsl-only.net>
To: Andrew Turner <andrew@fubar.geek.nz>, Shawn Webb <shawn.webb@hardenedbsd.org>, Tom Vijlbrief <tvijlbrief@gmail.com>
Cc: freebsd-arm <freebsd-arm@freebsd.org>
Subject: pine64 (an A64 Cortex-A53 context, e.g. -r312982): sh`forkshell child-process path after fork sometimes has a bad stack pointer value
Message-ID: <DC3CC3BE-9D8C-41ED-ADD0-AFD4019B2E90@dsl-only.net>
On pine64 (an A64 Cortex-A53 context) multiple people on the lists, including me, have reported sh getting occasional core dumps. I've analyzed a bunch of the sh core dumps and all failed in the child-process path of forkshell when forkshell tried to return.

I've since done experiments with code that detects some forms of odd stack pointer values, so that the adjusted code calls abort on such a detection before such a return would happen. [This gives a nicer context to look at in core dumps (before things are very messed up if the sp is bad).]

In sh`forkshell, just after the fork returns, on the child-process path the sp value is sometimes messed up, judging by which direction it lies from the prior frame-pointers on the stack, and on occasion the difference is very large, such as:

(from: lldb register read on the frame with the pc in sh`forkshell )

      fp = 0x0000ffffffffce90
      sp = 0x0000ffffffffe980

This has the sp at a larger address than what sh`__start stored as the frame-pointer back-link when it is put to use via ld-elf.so.1`.rtld_start (more like 0x0000ffffffffde10 as I remember): outside the active stack region.

[Note: my experiments so far would not establish whether the sp might sometimes be an unexpectedly large distance toward lower memory addresses, especially if it was still in the potential stack region. It may be that both directions happen.]

The distance when it fails is very variable across examples. I just picked an example where stack frames would be written over the top of other material when sh`forkshell makes other calls on the child-process path, material that would be outside what should be the active stack region.

# uname -apKU
FreeBSD pine64 12.0-CURRENT FreeBSD 12.0-CURRENT r312982M arm64 aarch64 1200020 1200020

(I've frozen at that version for this exploration. It has taken me a while.)

Looking around I see what might be a few possibilities. . . (I'm no expert so some might be trivially eliminated.)
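For readers who want to try the same kind of detection, here is a minimal sketch of a stack pointer sanity check of the sort described earlier in this message. It is not the code used in the actual experiments: the names sp_looks_sane and abort_if_sp_bad, the max_frame bound, and the way the caller obtains sp, fp and stack_top are all assumptions for illustration. It relies only on the stack growing toward lower addresses on aarch64, so that within a function the sp should not be above that function's frame pointer, and the frame pointer should be below the outermost frame-pointer back-link.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: returns 1 if sp looks plausible relative to fp,
 * 0 otherwise.  stack_top is the highest frame-pointer value expected
 * (for example the back-link stored by the outermost frame) and
 * max_frame is an assumed upper bound on one function's frame size.
 */
int
sp_looks_sane(uint64_t sp, uint64_t fp, uint64_t stack_top, uint64_t max_frame)
{
	if (sp > fp)			/* sp above the frame record: wrong direction */
		return (0);
	if (fp >= stack_top)		/* frame pointer outside the active stack */
		return (0);
	if (fp - sp > max_frame)	/* implausibly far toward lower addresses */
		return (0);
	return (1);
}

/*
 * Hypothetical wrapper: abort while the context is still intact, so the
 * core dump is taken before a return through a bad sp makes things worse.
 */
void
abort_if_sp_bad(uint64_t sp, uint64_t fp, uint64_t stack_top, uint64_t max_frame)
{
	if (!sp_looks_sane(sp, fp, stack_top, max_frame))
		abort();
}
```

With the register values quoted above (fp 0x0000ffffffffce90, sp 0x0000ffffffffe980) the first test already fails, since the sp is at a higher address than the fp. How the live sp and fp are read in the real sh code (inline assembly, compiler builtins, or otherwise) is left open here.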
Possibility #0 (possibilities in no particular order):

sys/arm64/arm64/vm_machdep.c :

In cpu_fork, what if the bcopy of td1->td_frame might not always have access to the latest updated values, needing some form of memory "fence" to be sure that such values are accessible? :

        tf = (struct trapframe *)STACKALIGN((struct trapframe *)pcb2 - 1);
        bcopy(td1->td_frame, tf, sizeof(*tf));
        tf->tf_x[0] = 0;
        tf->tf_x[1] = 0;
        tf->tf_spsr = 0;

        td2->td_frame = tf;

        /* Set the return value registers for fork() */
        td2->td_pcb->pcb_x[8] = (uintptr_t)fork_return;
        td2->td_pcb->pcb_x[9] = (uintptr_t)td2;
        td2->td_pcb->pcb_x[PCB_LR] = (uintptr_t)fork_trampoline;
        td2->td_pcb->pcb_sp = (uintptr_t)td2->td_frame;
        td2->td_pcb->pcb_fpusaved = &td2->td_pcb->pcb_fpustate;
        td2->td_pcb->pcb_vfpcpu = UINT_MAX;

        /* Setup to release spin count in fork_exit(). */
        td2->td_md.md_spinlock_count = 1;
        td2->td_md.md_saved_daif = 0;

Possibility #1:

sys/arm64/arm64/swtch.S :

ENTRY(fork_trampoline)
. . .
        /* Restore sp and lr */
        ldp     x0, x1, [sp]
        msr     sp_el0, x0
        mov     lr, x1

Similar point to #0 but for the ldp memory accesses shown.

Possibility #3:

sys/arm64/arm64/exception.S :

Both of:

handle_el0_sync
handle_el0_irq

also update sp_el0. So if either can happen during any part of fork_trampoline after its "msr sp_el0, x0" but before its "msr daifset, #2" (disabling interrupts), then the wrong sp_el0 value would be in place at fork_trampoline's eret.

It will be interesting to see what the problem actually was once it has been fixed.

===
Mark Millard
markmi at dsl-only.net