Date: Mon, 27 Feb 2017 19:55:57 -0800 From: Mark Millard <markmi@dsl-only.net> To: freebsd-arm <freebsd-arm@freebsd.org>, FreeBSD Current <freebsd-current@freebsd.org> Cc: FreeBSD-STABLE Mailing List <freebsd-stable@freebsd.org>, jasone@FreeBSD.org, Shawn Webb <shawn.webb@hardenedbsd.org> Subject: Re: arm64 vs. jemalloc and swapping in and out, sh/su examples: being swapped out leads to later Failed assertion: "tsd_booted" after being swapped in Message-ID: <41A51B66-4290-48E0-A3E3-AEB809B2748D@dsl-only.net> In-Reply-To: <6A5AE299-8594-45CC-A801-F0B1125BB042@dsl-only.net> References: <6A5AE299-8594-45CC-A801-F0B1125BB042@dsl-only.net>
next in thread | previous in thread | raw e-mail | index | archive | help
On 2017-Feb-27, at 7:21 PM, Mark Millard <markmi@dsl-only.net> wrote: > [I've added a variant of this material to bugzilla 217138.] >=20 > I've reduced the testing context to the following > type of example (no longer involving buildworld > buildkernel): >=20 > # sh > # sh > # sh > . . . > # sh >=20 > (So a bunch of nested sh's in an ssh session that > will have most swapped out. I happened to have > done this under 2 users for the example material > below: in one I did an su first. Plus I had another > session without such a nesting but with an su: this > is the one were I ran stress.) >=20 > After forcing these to mostly swap out (see below) > I used ^D or exit exit a currently interactive one. > Then the rest of the swapped out ones from the > nesting get the tsd_booted failure. >=20 > So, for example: >=20 > # stress -m 1 --vm-bytes 1536M >=20 > The context is a PINE64+ with 2GB of RAM. The above > was enough in my context to cause the needed swapouts: >=20 > # ps -aOuser,flags > PID USER F TT STAT TIME COMMAND > 688 root 4102 u0 IWs 0:00.00 login [pam] (login) > 689 root 10004002 u0 I+ 0:00.06 -sh (sh) > 72611 markmi 4002 0 IWs 0:00.00 -sh (sh) > 72613 markmi 4002 0 IW 0:00.00 sh > 72614 markmi 4002 0 IW 0:00.00 sh > 72615 markmi 4002 0 IW 0:00.00 sh > 72616 markmi 4002 0 IW 0:00.00 sh > 72617 markmi 4002 0 IW 0:00.00 sh > 72618 markmi 4002 0 IW 0:00.00 sh > 72619 markmi 4002 0 IW 0:00.00 sh > 72620 markmi 4002 0 IW 0:00.00 sh > 72621 markmi 4002 0 IW 0:00.00 sh > 72622 markmi 4002 0 IW 0:00.00 sh > 72623 markmi 4002 0 IW 0:00.00 sh > 72624 markmi 4002 0 IW 0:00.00 sh > 72625 markmi 4002 0 IW 0:00.00 sh > 72626 markmi 10004002 0 I+ 0:00.01 sh > 167 markmi 4002 1 IWs 0:00.00 -sh (sh) > 169 root 4102 1 IW 0:00.00 su > 170 root 4002 1 IW 0:00.00 su (sh) > 171 root 4002 1 IW 0:00.00 sh > 172 root 4002 1 IW 0:00.00 sh > 173 root 4002 1 IW 0:00.00 sh > 174 root 4002 1 IW 0:00.00 sh > 175 root 4002 1 IW 0:00.00 sh > 176 root 4002 1 IW 0:00.00 sh > 177 root 4002 1 IW 0:00.00 sh > 178 root 4002 1 IW 0:00.00 sh > 179 root 10004002 1 I+ 0:00.01 sh > 60961 root 4002 2 IW 0:00.00 stress -m 1 --vm-bytes 1536M > 60962 root 10000002 2 R 0:29.41 stress -m 1 --vm-bytes 1536M > 60964 root 10004002 2 R+ 0:00.01 ps -aOuser,flags > 82389 markmi 4002 2 IWs 0:00.00 -sh (sh) > 82391 root 4102 2 IW 0:00.00 su > 82392 root 10004002 2 S 0:00.22 su (sh) >=20 > So with that swapped out context established: >=20 > (The markmi user case of nested sh's:) >=20 > $ ^D<jemalloc>: = /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed = assertion: "tsd_booted" > <jemalloc>: = /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed = assertion: "tsd_booted" > <jemalloc>: = /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed = assertion: "tsd_booted" > <jemalloc>: = /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed = assertion: "tsd_booted" > <jemalloc>: = /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed = assertion: "tsd_booted" > <jemalloc>: = /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed = assertion: "tsd_booted" > <jemalloc>: = /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed = assertion: "tsd_booted" > <jemalloc>: = /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed = assertion: "tsd_booted" > <jemalloc>: = /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed = assertion: "tsd_booted" > <jemalloc>: = /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed = assertion: "tsd_booted" > <jemalloc>: = /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed = assertion: "tsd_booted" > <jemalloc>: = /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed = assertion: "tsd_booted" > <jemalloc>: = /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed = assertion: "tsd_booted" > <jemalloc>: = /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed = assertion: "tsd_booted" >=20 >=20 > (The su'd user case of nested sh's:) >=20 > # ^D<jemalloc>: = /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed = assertion: "tsd_booted" > <jemalloc>: = /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed = assertion: "tsd_booted" > <jemalloc>: = /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed = assertion: "tsd_booted" > <jemalloc>: = /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed = assertion: "tsd_booted" > <jemalloc>: = /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed = assertion: "tsd_booted" > <jemalloc>: = /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed = assertion: "tsd_booted" > <jemalloc>: = /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed = assertion: "tsd_booted" > <jemalloc>: = /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed = assertion: "tsd_booted" > <jemalloc>: = /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed = assertion: "tsd_booted" > <jemalloc>: = /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed = assertion: "tsd_booted" >=20 > The one without the deeply nested sh's (but an su): >=20 > # exit > <jemalloc>: = /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed = assertion: "tsd_booted" >=20 > So far I've not gotten such without having had the failing > processes having been swapped-out at some point before failure. >=20 >=20 > # ls -ltTU /var/crash/ > total 589220 > -rw------- 1 markmi wheel 4718592 Feb 27 18:31:00 2017 = sh.82389.core > -rw------- 1 root wheel 4812800 Feb 27 18:30:59 2017 = su.82391.core > -rw------- 1 markmi wheel 4718592 Feb 27 18:20:35 2017 = sh.72611.core > -rw------- 1 markmi wheel 4718592 Feb 27 18:20:34 2017 = sh.72613.core > -rw------- 1 markmi wheel 4718592 Feb 27 18:20:34 2017 = sh.72614.core > -rw------- 1 markmi wheel 4718592 Feb 27 18:20:33 2017 = sh.72615.core > -rw------- 1 markmi wheel 4718592 Feb 27 18:20:32 2017 = sh.72616.core > -rw------- 1 markmi wheel 4718592 Feb 27 18:20:31 2017 = sh.72617.core > -rw------- 1 markmi wheel 4718592 Feb 27 18:20:30 2017 = sh.72618.core > -rw------- 1 markmi wheel 4718592 Feb 27 18:20:29 2017 = sh.72619.core > -rw------- 1 markmi wheel 4718592 Feb 27 18:20:28 2017 = sh.72620.core > -rw------- 1 markmi wheel 4718592 Feb 27 18:20:28 2017 = sh.72621.core > -rw------- 1 markmi wheel 4718592 Feb 27 18:20:27 2017 = sh.72622.core > -rw------- 1 markmi wheel 4718592 Feb 27 18:20:26 2017 = sh.72623.core > -rw------- 1 markmi wheel 4718592 Feb 27 18:20:25 2017 = sh.72624.core > -rw------- 1 markmi wheel 4718592 Feb 27 18:20:24 2017 = sh.72625.core > -rw------- 1 markmi wheel 4718592 Feb 27 18:20:20 2017 sh.167.core > -rw------- 1 root wheel 4718592 Feb 27 18:20:19 2017 sh.170.core > -rw------- 1 root wheel 4718592 Feb 27 18:20:18 2017 sh.171.core > -rw------- 1 root wheel 4718592 Feb 27 18:20:17 2017 sh.172.core > -rw------- 1 root wheel 4718592 Feb 27 18:20:16 2017 sh.173.core > -rw------- 1 root wheel 4718592 Feb 27 18:20:16 2017 sh.174.core > -rw------- 1 root wheel 4718592 Feb 27 18:20:15 2017 sh.175.core > -rw------- 1 root wheel 4718592 Feb 27 18:20:14 2017 sh.176.core > -rw------- 1 root wheel 4718592 Feb 27 18:20:13 2017 sh.177.core > -rw------- 1 root wheel 4718592 Feb 27 18:20:12 2017 sh.178.core > . . . (from prior investigative activity) . . . >=20 > Note it was about 10 minutes after the two nested examples before I > existed the one that only had the su --at which point it detected > the Failed assertion: "tsd_booted" as well (su and sh, su first). >=20 >=20 > In other experiments I found that it was when buildworld activity > caused swapping out of the failing processes that there were the > later failures. >=20 >=20 >=20 > I can not claim swapping out and back in is the only context > with issues but it has been true of every such failure that > I've looked at since I made this discovery. >=20 > My context has a USB SSD for the root filesystem and the swap > parition. I was careless and did not list context details for the FreeBSD on the PINE64+ for the examples used: # uname -apKU FreeBSD pine64 12.0-CURRENT FreeBSD 12.0-CURRENT r313999M arm64 = aarch64 1200021 1200021 -r313999's buildworld was built without MALLOC_PRODUCTION. I've done no = tailoring of jemalloc behavior. It has a non-debug kernel build in use. I've not been using/testing stable/11 but from what I understand it likely has the same status as head (12) for the Failed assertion: "tsd_booted" notices/aborts. The evidence is that process-memory is trashed and so likely continued operation of any previously swapped-out processes is unreliable. =3D=3D=3D Mark Millard markmi at dsl-only.net
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?41A51B66-4290-48E0-A3E3-AEB809B2748D>