Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 12 Feb 2017 18:46:50 -0800
From:      Mark Millard <markmi@dsl-only.net>
To:        andrew@freebsd.org, Shawn Webb <shawn.webb@hardenedbsd.org>
Cc:        Tom Vijlbrief <tvijlbrief@gmail.com>, freebsd-arm <freebsd-arm@freebsd.org>
Subject:   Re: pine64 (an A64 Cortex-A53 context, e.g. -r312982): sh`forkshell child-process path after fork sometimes has a bad stack pointer value
Message-ID:  <2D04FF37-DEC8-42CE-961D-AE8CD58A0EAA@dsl-only.net>
In-Reply-To: <DC3CC3BE-9D8C-41ED-ADD0-AFD4019B2E90@dsl-only.net>
References:  <DC3CC3BE-9D8C-41ED-ADD0-AFD4019B2E90@dsl-only.net>

next in thread | previous in thread | raw e-mail | index | archive | help
On 2017-Feb-12, at 2:52 AM, Mark Millard <markmi at dsl-only.net> wrote:

> On pine64 (an A64 Cortex-A53 context) multiple people on the lists
> including me have reported sh getting occasional core dumps.
>=20
> I've analyzed a bunch of the sh core dumps and all failed in the
> child-process path of forkshell when forkshell tried to return.=20
> I've since done experiments with code to detect some forms of
> odd stack pointer values so that the adjusted code calls abort
> for such a detection before such a return would happen. [This
> gives a nicer context to look at in core dumps (before things
> are very messed up if the sp is bad).]
>=20
> In sh`forkshell, just after the fork returns, on the child-process
> path there is sometimes a messed up sp value by what direction
> it is from the prior frame-pointers on the stack --and on occasion
> the value difference is very large, such as:
> (from: lldb register read on the frame with the pc in sh`forkshell )
>=20
>        fp =3D 0x0000ffffffffce90
>        sp =3D 0x0000ffffffffe980
>=20
> This has the sp with a larger address than what sh`__start
> stored as the frame-pointer back-link when it is put to use via
> ld-elf.so.1`.rtld_start (more like 0x0000ffffffffde10 as I
> remember): outside the active stack region.
>=20
> [Note: my experiments so far would not establish if the sp
> might sometimes have an unexpectedly large distance toward
> lower memory addresses, specially if it was still in the
> potential stack-region. It may be that both directions
> happen.]
>=20
> The distance when it fails is vary variable across examples.
> I just picked an example were stack frames would be written
> over the top of other material when sh`forkshell makes other
> calls on the child-process path, material that would be
> outside what should be the active stack region.
>=20
> # uname -apKU
> FreeBSD pine64 12.0-CURRENT FreeBSD 12.0-CURRENT  r312982M  arm64 =
aarch64 1200020 1200020
>=20
> (I've frozen at that version for this exploration.
> It has taken me a while.)
>=20
> Looking around I see what might be a few possibilities. . .
> (I'm no expert so some might be trivially eliminated.)
>=20
>=20
> Possibility #0 (possibilities in no particular order):
>=20
> sys/arm64/arm64/vm_machdep.c :
>=20
> In cpu_fork what if the bcopy of td1-td_frame might not
> always have access to the latest updated values, needing
> some form of memory "fence" to be sure that such values are
> accessible? :
>=20
>        tf =3D (struct trapframe *)STACKALIGN((struct trapframe *)pcb2 =
- 1);
>        bcopy(td1->td_frame, tf, sizeof(*tf));
>        tf->tf_x[0] =3D 0;
>        tf->tf_x[1] =3D 0;
>        tf->tf_spsr =3D 0;
>=20
>        td2->td_frame =3D tf;
>=20
>        /* Set the return value registers for fork() */
>        td2->td_pcb->pcb_x[8] =3D (uintptr_t)fork_return;
>        td2->td_pcb->pcb_x[9] =3D (uintptr_t)td2;
>        td2->td_pcb->pcb_x[PCB_LR] =3D (uintptr_t)fork_trampoline;
>        td2->td_pcb->pcb_sp =3D (uintptr_t)td2->td_frame;
>        td2->td_pcb->pcb_fpusaved =3D &td2->td_pcb->pcb_fpustate;
>        td2->td_pcb->pcb_vfpcpu =3D UINT_MAX;
>=20
>        /* Setup to release spin count in fork_exit(). */
>        td2->td_md.md_spinlock_count =3D 1;
>        td2->td_md.md_saved_daif =3D 0;
>=20
>=20
> Possibility #1:
>=20
> sys/arm64/arm64/swtch.S :
>=20
> ENTRY(fork_trampoline)
> . . .
>        /* Restore sp and lr */
>        ldp     x0, x1, [sp]
>        msr     sp_el0, x0
>        mov     lr, x1
>=20
> Similar point to #0 but for the ldp memory accesses
> shown.
>=20
>=20
> Possibility #3:
>=20
> sys/arm64/arm64/exception.S :
>=20
> Both of:
>=20
> handle_el0_sync
> handle_el0_irq
>=20
> also update sp_el0 and so if any such can happen
> during any part of fork_trampoline after its
> "msr sp_el0, x0" but before its "msr daifset, #2"
> (disabling interrupts), then the wrong sp_el0 value
> would be in place at fork_tramploine's eret .
>=20
>=20
> It will be interesting to see what the problem actually
> was once it has been fixed.

I have updated to a test that should report any time
fork changes the stack pointer from what it was before
the fork, parent or child process:

# svnlite diff /usr/src/bin/sh/miscbltin.c /usr/src/bin/sh/jobs.c
Index: /usr/src/bin/sh/miscbltin.c
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
--- /usr/src/bin/sh/miscbltin.c (revision 312982)
+++ /usr/src/bin/sh/miscbltin.c (working copy)
@@ -64,6 +64,15 @@
=20
 #undef eflag
=20
+
+/* JUST FOR TESTING */
+uintptr_t example_stack_address(void)
+{
+    volatile uintptr_t test =3D 0;
+    return (uintptr_t)(void*)&test;
+}
+
+
 int readcmd(int, char **);
 int umaskcmd(int, char **);
 int ulimitcmd(int, char **);
Index: /usr/src/bin/sh/jobs.c
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
--- /usr/src/bin/sh/jobs.c      (revision 312982)
+++ /usr/src/bin/sh/jobs.c      (working copy)
@@ -51,6 +51,9 @@
 #include <stdlib.h>
 #include <unistd.h>
=20
+/* JUST FOR TESTING */
+#include <stdint.h>
+
 #include "shell.h"
 #if JOBS
 #include <termios.h>
@@ -833,6 +836,11 @@
  * in a pipeline).
  */
=20
+extern uintptr_t example_stack_address(void);
+
+uintptr_t stack_address_before_fork =3D 0;
+uintptr_t stack_address_after_fork =3D 0;
+
 pid_t
 forkshell(struct job *jp, union node *n, int mode)
 {
@@ -845,7 +853,10 @@
        if (mode =3D=3D FORK_BG && (jp =3D=3D NULL || jp->nprocs =3D=3D =
0))
                checkzombies();
        flushall();
+       stack_address_before_fork =3D example_stack_address();
        pid =3D fork();
+       stack_address_after_fork  =3D example_stack_address();
+       if (stack_address_after_fork !=3D stack_address_before_fork) =
abort();
        if (pid =3D=3D -1) {
                TRACE(("Fork failed, errno=3D%d\n", errno));
                INTON;
@@ -946,7 +957,6 @@
        return pid;
 }
=20
-
 pid_t
 vforkexecshell(struct job *jp, char **argv, char **envp, const char =
*path, int idx, int pip[2])
 {


I've been using repeated attempts to build devel/aarch64-none-elf-gcc
for testing because the configure activity does lots of shell work
that forks. (I first noticed the problem in such a context.)

The first example core file from repeated attempts to build
devel/aarch64-none-elf-gcc with this new test shows:

(lldb) bt
* thread #1: tid =3D 100188, 0x0000000040554e54 libc.so.7`_thr_kill + 8, =
name =3D 'sh', stop reason =3D signal SIGABRT
  * frame #0: 0x0000000040554e54 libc.so.7`_thr_kill + 8
    frame #1: 0x0000000040554e18 libc.so.7`__raise(s=3D6) + 64 at =
raise.c:52
    frame #2: 0x0000000040554d8c libc.so.7`abort + 84 at abort.c:65
    frame #3: 0x000000000040f56c sh`forkshell(jp=3D<unavailable>, =
n=3D<unavailable>, mode=3D<unavailable>) + 1080 at jobs.c:859
(lldb) up 3
frame #3: 0x000000000040f56c sh`forkshell(jp=3D<unavailable>, =
n=3D<unavailable>, mode=3D<unavailable>) + 1080 at jobs.c:859
   856 		stack_address_before_fork =3D example_stack_address();
   857 		pid =3D fork();
   858 		stack_address_after_fork  =3D example_stack_address();
-> 859 		if (stack_address_after_fork !=3D =
stack_address_before_fork) abort();
   860 		if (pid =3D=3D -1) {
   861 			TRACE(("Fork failed, errno=3D%d\n", errno));
   862 			INTON;

(lldb) register read
General Purpose Registers:
       x19 =3D 0x0000000000000000
       x20 =3D 0x0000000040a350c0
       x21 =3D 0x0000000000000002
       x22 =3D 0x0000000040a49898
       x23 =3D 0x0000000000434000  sh..bss + 6336
       x24 =3D 0x0000000000434000  sh..bss + 6336
       x25 =3D 0x00000000000000fb
       x26 =3D 0x0000000040a49b90
       x27 =3D 0x0000000040a49918
       x28 =3D 0x0000000000434000  sh..bss + 6336
        fp =3D 0x0000ffffffffcee0
        lr =3D 0x000000000040f56c  sh`forkshell + 1080 at jobs.c:862
        sp =3D 0x0000ffffffffd180
        pc =3D 0x000000000040f56c  sh`forkshell + 1080 at jobs.c:862
20 registers were unavailable.

(lldb) print/x stack_address_before_fork
(uintptr_t) $0 =3D 0x0000ffffffffce80

(lldb) print/x stack_address_after_fork
(uintptr_t) $1 =3D 0x0000ffffffffd160

(lldb) print/x main_handler
(jmploc) $2 =3D {
  loc =3D {
    [0] =3D {
      _jb =3D {
        [0] =3D 0x0000ffffffffdb60fb5d25837d7ff700
        [1] =3D 0x00000000000000170000ffffffffdc08
        [2] =3D 0x00000000004320380000000000434a10
        [3] =3D 0x00000000000000000000000000000000
        [4] =3D 0x00000000000000000000000000000000
        [5] =3D 0x00000000000000000000000000000000
        [6] =3D 0x0000000000410c800000ffffffffdbb0
. . .

(main_handler is from setjmp in main.)

In this example of the context for sh`forkshell after the fork:

fp =3D 0x0000ffffffffcee0 < sp =3D 0x0000ffffffffd180 < =
0x0000ffffffffdbb0

That explains why the bt stopped with frame #3: part of the stack
had been trashed by the calls to example_stack_address and to abort
(and what its internal call chain involves), unlike the prior example
that trashed memory outside the active stack region.

Compared to the prior example's sp =3D 0x0000ffffffffe980 the
distance from the frame pointer in sh`forkshell is much smaller this
time --but it is still in the same (wrong) direction.

This illustrates the variability in the bad sp value from bad-case
to bad-case.

I plan to leave the repeated builds running for a time to accumulate
some more core files based on this new test. If any examples happen
of sp decreasing instead of increasing, the code should core dump for
those as well, unlike my prior testing.



Context details:

# more /usr/src/sys/arm64/conf/GENERIC-NODBG
#
# GENERIC -- Custom configuration for the arm64/aarch64
#

include "GENERIC"

ident   GENERIC-NODBG

makeoptions     DEBUG=3D-g                # Build kernel with gdb(1) =
debug symbols

options         ALT_BREAK_TO_DEBUGGER

options         KDB                     # Enable kernel debugger support

# For minimum debugger support (stable branch) use:
#options        KDB_TRACE               # Print a stack trace for a =
panic
options         DDB                     # Enable the kernel debugger

# Extra stuff:
#options        VERBOSE_SYSINIT         # Enable verbose sysinit =
messages
#options        BOOTVERBOSE=3D1
#options        BOOTHOWTO=3DRB_VERBOSE
#options        KTR
#options        KTR_MASK=3DKTR_TRAP
##options       KTR_CPUMASK=3D0xF
#options        KTR_VERBOSE

# Disable any extra checking for. . .
nooptions       DEADLKRES               # Enable the deadlock resolver
nooptions       INVARIANTS              # Enable calls of extra sanity =
checking
nooptions       INVARIANT_SUPPORT       # Extra sanity checks of =
internal structures, required by INVARIANTS
nooptions       WITNESS                 # Enable checks to detect =
deadlocks and cycles
nooptions       WITNESS_SKIPSPIN        # Don't run witness on spinlocks =
for speed
nooptions       DIAGNOSTIC
nooptions       MALLOC_DEBUG_MAXZONES   # Separate malloc(9) zones


The -r312982 variant is a cross build with -mcpu=3Dcortex-a53
specified:

# more ~/src.configs/src.conf.pine64-clang-bootstrap.amd64-host=20
TO_TYPE=3Daarch64
TOOLS_TO_TYPE=3D${TO_TYPE}
#
KERNCONF=3DGENERIC-NODBG
TARGET=3Darm64
.if ${.MAKE.LEVEL} =3D=3D 0
TARGET_ARCH=3D${TO_TYPE}
.export TARGET_ARCH
.endif
#
WITH_CROSS_COMPILER=3D
WITHOUT_SYSTEM_COMPILER=3D
#
WITH_LIBCPLUSPLUS=3D
WITHOUT_BINUTILS_BOOTSTRAP=3D
WITHOUT_ELFTOOLCHAIN_BOOTSTRAP=3D
WITH_CLANG_BOOTSTRAP=3D
WITH_CLANG=3D
WITH_CLANG_IS_CC=3D
WITH_CLANG_FULL=3D
WITH_CLANG_EXTRAS=3D
WITH_LLD=3D
WITH_LLDB=3D
#
WITH_BOOT=3D
WITHOUT_LIB32=3D
WITHOUT_LIBSOFT=3D
#
WITHOUT_GCC_BOOTSTRAP=3D
WITHOUT_GCC=3D
WITHOUT_GCC_IS_CC=3D
WITHOUT_GNUCXX=3D
#
NO_WERROR=3D
#WERROR=3D
MALLOC_PRODUCTION=3D
#
WITH_REPRODUCIBLE_BUILD=3D
WITH_DEBUG_FILES=3D
#
CROSS_BINUTILS_PREFIX=3D/usr/local/${TOOLS_TO_TYPE}-freebsd/bin/
XCFLAGS+=3D -B${CROSS_BINUTILS_PREFIX}
XCXXFLAGS+=3D -B${CROSS_BINUTILS_PREFIX}
# There is no XCPPFLAGS but XCPP gets XCFLAGS content.
#
XCFLAGS+=3D -mcpu=3Dcortex-a53
XCXXFLAGS+=3D -mcpu=3Dcortex-a53
# There is no XCPPFLAGS but XCPP gets XCFLAGS content.


In order to cut down on the Spurious interrupt notices on the
console I eventually did the below personal adjustment specific
to my context (and it is in use above):

# svnlite diff /usr/src/sys/arm/arm/gic.c
Index: /usr/src/sys/arm/arm/gic.c
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
--- /usr/src/sys/arm/arm/gic.c  (revision 312982)
+++ /usr/src/sys/arm/arm/gic.c  (working copy)
@@ -672,9 +672,13 @@
=20
        if (irq >=3D sc->nirqs) {
 #ifdef GIC_DEBUG_SPURIOUS
+#define EXPECTED_SPURIOUS_IRQ 1023
+           if (irq !=3D EXPECTED_SPURIOUS_IRQ) {
                device_printf(sc->gic_dev,
-                   "Spurious interrupt detected: last irq: %d on =
CPU%d\n",
+                   "Spurious interrupt %d detected of %d: last irq: %d =
on CPU%d\n",
+                   irq, sc->nirqs,
                    sc->last_irq[PCPU_GET(cpuid)], PCPU_GET(cpuid));
+            }
 #endif
                return (FILTER_HANDLED);
        }
@@ -720,6 +724,16 @@
        if (irq < sc->nirqs)
                goto dispatch_irq;
=20
+       if (irq !=3D EXPECTED_SPURIOUS_IRQ) {
+#undef EXPECTED_SPURIOUS_IRQ
+#ifdef GIC_DEBUG_SPURIOUS
+               device_printf(sc->gic_dev,
+                   "Spurious end interrupt %d detected of %d: last irq: =
%d on CPU%d\n",
+                   irq, sc->nirqs,
+                   sc->last_irq[PCPU_GET(cpuid)], PCPU_GET(cpuid));
+#endif
+       }
+
        return (FILTER_HANDLED);
 }

The added message block was to report unexpected values for the
exit path. None have been reported from there.

The gic.c changes eliminated all the Spurious interrupt notices --which
made the console more reasonable to use.

But so far I've not come up with a way to discover what leads
to the irq=3D=3D1023 reports for the original message context.
[Interrupts handled by one core but sent to multiple cores?]


Other notes:

I switched to Andrew Turner's freebsd Email address. It might be more
appropriate. I also fixed the mistyped "Contex-a53" in the Subject to
say "Cortex-a53".


=3D=3D=3D
Mark Millard
markmi at dsl-only.net




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?2D04FF37-DEC8-42CE-961D-AE8CD58A0EAA>