Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 27 Feb 2017 19:55:57 -0800
From:      Mark Millard <markmi@dsl-only.net>
To:        freebsd-arm <freebsd-arm@freebsd.org>, FreeBSD Current <freebsd-current@freebsd.org>
Cc:        FreeBSD-STABLE Mailing List <freebsd-stable@freebsd.org>, jasone@FreeBSD.org, Shawn Webb <shawn.webb@hardenedbsd.org>
Subject:   Re: arm64 vs. jemalloc and swapping in and out, sh/su examples: being swapped out leads to later Failed assertion: "tsd_booted" after being swapped in
Message-ID:  <41A51B66-4290-48E0-A3E3-AEB809B2748D@dsl-only.net>
In-Reply-To: <6A5AE299-8594-45CC-A801-F0B1125BB042@dsl-only.net>
References:  <6A5AE299-8594-45CC-A801-F0B1125BB042@dsl-only.net>

next in thread | previous in thread | raw e-mail | index | archive | help
On 2017-Feb-27, at 7:21 PM, Mark Millard <markmi@dsl-only.net> wrote:

> [I've added a variant of this material to bugzilla 217138.]
>=20
> I've reduced the testing context to the following
> type of example (no longer involving buildworld
> buildkernel):
>=20
> # sh
> # sh
> # sh
> . . .
> # sh
>=20
> (So a bunch of nested sh's in an ssh session that
> will have most swapped out. I happened to have
> done this under 2 users for the example material
> below: in one I did an su first. Plus I had another
> session without such a nesting but with an su: this
> is the one were I ran stress.)
>=20
> After forcing these to mostly swap out (see below)
> I used ^D or exit exit a currently interactive one.
> Then the rest of the swapped out ones from the
> nesting get the tsd_booted failure.
>=20
> So, for example:
>=20
> # stress -m 1 --vm-bytes 1536M
>=20
> The context is a PINE64+ with 2GB of RAM. The above
> was enough in my context to cause the needed swapouts:
>=20
> # ps -aOuser,flags
>  PID USER          F TT  STAT    TIME COMMAND
>  688 root       4102 u0  IWs  0:00.00 login [pam] (login)
>  689 root   10004002 u0  I+   0:00.06 -sh (sh)
> 72611 markmi     4002  0  IWs  0:00.00 -sh (sh)
> 72613 markmi     4002  0  IW   0:00.00 sh
> 72614 markmi     4002  0  IW   0:00.00 sh
> 72615 markmi     4002  0  IW   0:00.00 sh
> 72616 markmi     4002  0  IW   0:00.00 sh
> 72617 markmi     4002  0  IW   0:00.00 sh
> 72618 markmi     4002  0  IW   0:00.00 sh
> 72619 markmi     4002  0  IW   0:00.00 sh
> 72620 markmi     4002  0  IW   0:00.00 sh
> 72621 markmi     4002  0  IW   0:00.00 sh
> 72622 markmi     4002  0  IW   0:00.00 sh
> 72623 markmi     4002  0  IW   0:00.00 sh
> 72624 markmi     4002  0  IW   0:00.00 sh
> 72625 markmi     4002  0  IW   0:00.00 sh
> 72626 markmi 10004002  0  I+   0:00.01 sh
>  167 markmi     4002  1  IWs  0:00.00 -sh (sh)
>  169 root       4102  1  IW   0:00.00 su
>  170 root       4002  1  IW   0:00.00 su (sh)
>  171 root       4002  1  IW   0:00.00 sh
>  172 root       4002  1  IW   0:00.00 sh
>  173 root       4002  1  IW   0:00.00 sh
>  174 root       4002  1  IW   0:00.00 sh
>  175 root       4002  1  IW   0:00.00 sh
>  176 root       4002  1  IW   0:00.00 sh
>  177 root       4002  1  IW   0:00.00 sh
>  178 root       4002  1  IW   0:00.00 sh
>  179 root   10004002  1  I+   0:00.01 sh
> 60961 root       4002  2  IW   0:00.00 stress -m 1 --vm-bytes 1536M
> 60962 root   10000002  2  R    0:29.41 stress -m 1 --vm-bytes 1536M
> 60964 root   10004002  2  R+   0:00.01 ps -aOuser,flags
> 82389 markmi     4002  2  IWs  0:00.00 -sh (sh)
> 82391 root       4102  2  IW   0:00.00 su
> 82392 root   10004002  2  S    0:00.22 su (sh)
>=20
> So with that swapped out context established:
>=20
> (The markmi user case of nested sh's:)
>=20
> $ ^D<jemalloc>: =
/usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed =
assertion: "tsd_booted"
> <jemalloc>: =
/usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed =
assertion: "tsd_booted"
> <jemalloc>: =
/usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed =
assertion: "tsd_booted"
> <jemalloc>: =
/usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed =
assertion: "tsd_booted"
> <jemalloc>: =
/usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed =
assertion: "tsd_booted"
> <jemalloc>: =
/usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed =
assertion: "tsd_booted"
> <jemalloc>: =
/usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed =
assertion: "tsd_booted"
> <jemalloc>: =
/usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed =
assertion: "tsd_booted"
> <jemalloc>: =
/usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed =
assertion: "tsd_booted"
> <jemalloc>: =
/usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed =
assertion: "tsd_booted"
> <jemalloc>: =
/usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed =
assertion: "tsd_booted"
> <jemalloc>: =
/usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed =
assertion: "tsd_booted"
> <jemalloc>: =
/usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed =
assertion: "tsd_booted"
> <jemalloc>: =
/usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed =
assertion: "tsd_booted"
>=20
>=20
> (The su'd user case of nested sh's:)
>=20
> # ^D<jemalloc>: =
/usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed =
assertion: "tsd_booted"
> <jemalloc>: =
/usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed =
assertion: "tsd_booted"
> <jemalloc>: =
/usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed =
assertion: "tsd_booted"
> <jemalloc>: =
/usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed =
assertion: "tsd_booted"
> <jemalloc>: =
/usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed =
assertion: "tsd_booted"
> <jemalloc>: =
/usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed =
assertion: "tsd_booted"
> <jemalloc>: =
/usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed =
assertion: "tsd_booted"
> <jemalloc>: =
/usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed =
assertion: "tsd_booted"
> <jemalloc>: =
/usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed =
assertion: "tsd_booted"
> <jemalloc>: =
/usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed =
assertion: "tsd_booted"
>=20
> The one without the deeply nested sh's (but an su):
>=20
> # exit
> <jemalloc>: =
/usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed =
assertion: "tsd_booted"
>=20
> So far I've not gotten such without having had the failing
> processes having been swapped-out at some point before failure.
>=20
>=20
> # ls -ltTU /var/crash/
> total 589220
> -rw-------  1 markmi  wheel  4718592 Feb 27 18:31:00 2017 =
sh.82389.core
> -rw-------  1 root    wheel  4812800 Feb 27 18:30:59 2017 =
su.82391.core
> -rw-------  1 markmi  wheel  4718592 Feb 27 18:20:35 2017 =
sh.72611.core
> -rw-------  1 markmi  wheel  4718592 Feb 27 18:20:34 2017 =
sh.72613.core
> -rw-------  1 markmi  wheel  4718592 Feb 27 18:20:34 2017 =
sh.72614.core
> -rw-------  1 markmi  wheel  4718592 Feb 27 18:20:33 2017 =
sh.72615.core
> -rw-------  1 markmi  wheel  4718592 Feb 27 18:20:32 2017 =
sh.72616.core
> -rw-------  1 markmi  wheel  4718592 Feb 27 18:20:31 2017 =
sh.72617.core
> -rw-------  1 markmi  wheel  4718592 Feb 27 18:20:30 2017 =
sh.72618.core
> -rw-------  1 markmi  wheel  4718592 Feb 27 18:20:29 2017 =
sh.72619.core
> -rw-------  1 markmi  wheel  4718592 Feb 27 18:20:28 2017 =
sh.72620.core
> -rw-------  1 markmi  wheel  4718592 Feb 27 18:20:28 2017 =
sh.72621.core
> -rw-------  1 markmi  wheel  4718592 Feb 27 18:20:27 2017 =
sh.72622.core
> -rw-------  1 markmi  wheel  4718592 Feb 27 18:20:26 2017 =
sh.72623.core
> -rw-------  1 markmi  wheel  4718592 Feb 27 18:20:25 2017 =
sh.72624.core
> -rw-------  1 markmi  wheel  4718592 Feb 27 18:20:24 2017 =
sh.72625.core
> -rw-------  1 markmi  wheel  4718592 Feb 27 18:20:20 2017 sh.167.core
> -rw-------  1 root    wheel  4718592 Feb 27 18:20:19 2017 sh.170.core
> -rw-------  1 root    wheel  4718592 Feb 27 18:20:18 2017 sh.171.core
> -rw-------  1 root    wheel  4718592 Feb 27 18:20:17 2017 sh.172.core
> -rw-------  1 root    wheel  4718592 Feb 27 18:20:16 2017 sh.173.core
> -rw-------  1 root    wheel  4718592 Feb 27 18:20:16 2017 sh.174.core
> -rw-------  1 root    wheel  4718592 Feb 27 18:20:15 2017 sh.175.core
> -rw-------  1 root    wheel  4718592 Feb 27 18:20:14 2017 sh.176.core
> -rw-------  1 root    wheel  4718592 Feb 27 18:20:13 2017 sh.177.core
> -rw-------  1 root    wheel  4718592 Feb 27 18:20:12 2017 sh.178.core
> . . . (from prior investigative activity) . . .
>=20
> Note it was about 10 minutes after the two nested examples before I
> existed the one that only had the su --at which point it detected
> the Failed assertion: "tsd_booted" as well (su and sh, su first).
>=20
>=20
> In other experiments I found that it was when buildworld activity
> caused swapping out of the failing processes that there were the
> later failures.
>=20
>=20
>=20
> I can not claim swapping out and back in is the only context
> with issues but it has been true of every such failure that
> I've looked at since I made this discovery.
>=20
> My context has a USB SSD for the root filesystem and the swap
> parition.

I was careless and did not list context details for the FreeBSD on
the PINE64+ for the examples used:

# uname -apKU
FreeBSD pine64 12.0-CURRENT FreeBSD 12.0-CURRENT  r313999M  arm64 =
aarch64 1200021 1200021

-r313999's buildworld was built without MALLOC_PRODUCTION. I've done no =
tailoring
of jemalloc behavior.

It has a non-debug kernel build in use.

I've not been using/testing stable/11 but from what I understand it
likely has the same status as head (12) for the Failed assertion:
"tsd_booted" notices/aborts.

The evidence is that process-memory is trashed and so likely continued
operation of any previously swapped-out processes is unreliable.

=3D=3D=3D
Mark Millard
markmi at dsl-only.net





Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?41A51B66-4290-48E0-A3E3-AEB809B2748D>