From owner-freebsd-stable@freebsd.org  Tue Feb 28 03:21:48 2017
Return-Path: <owner-freebsd-stable@freebsd.org>
Delivered-To: freebsd-stable@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id DCB95CF08B9
 for <freebsd-stable@mailman.ysv.freebsd.org>;
 Tue, 28 Feb 2017 03:21:48 +0000 (UTC)
 (envelope-from markmi@dsl-only.net)
Received: from asp.reflexion.net (outbound-mail-210-89.reflexion.net
 [208.70.210.89])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 9DF5A797
 for <freebsd-stable@freebsd.org>; Tue, 28 Feb 2017 03:21:48 +0000 (UTC)
 (envelope-from markmi@dsl-only.net)
Received: (qmail 29151 invoked from network); 28 Feb 2017 03:21:40 -0000
Received: from unknown (HELO mail-cs-01.app.dca.reflexion.local) (10.81.19.1)
 by 0 (rfx-qmail) with SMTP; 28 Feb 2017 03:21:40 -0000
Received: by mail-cs-01.app.dca.reflexion.local
 (Reflexion email security v8.30.0) with SMTP;
 Mon, 27 Feb 2017 22:21:41 -0500 (EST)
Received: (qmail 1440 invoked from network); 28 Feb 2017 03:21:40 -0000
Received: from unknown (HELO iron2.pdx.net) (69.64.224.71)
 by 0 (rfx-qmail) with (AES256-SHA encrypted) SMTP; 28 Feb 2017 03:21:40 -0000
Received: from [192.168.1.111] (c-67-170-167-181.hsd1.or.comcast.net
 [67.170.167.181])
 by iron2.pdx.net (Postfix) with ESMTPSA id 06F4AEC7822;
 Mon, 27 Feb 2017 19:21:39 -0800 (PST)
From: Mark Millard <markmi@dsl-only.net>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: quoted-printable
Mime-Version: 1.0 (Mac OS X Mail 10.2 \(3259\))
Subject: arm64 vs. jemalloc and swapping in and out, sh/su examples: being
 swapped out leads to later Failed assertion: "tsd_booted" after being swapped
 in
Message-Id: <6A5AE299-8594-45CC-A801-F0B1125BB042@dsl-only.net>
Date: Mon, 27 Feb 2017 19:21:39 -0800
Cc: FreeBSD-STABLE Mailing List <freebsd-stable@freebsd.org>,
 jasone@FreeBSD.org, Shawn Webb <shawn.webb@hardenedbsd.org>
To: freebsd-arm <freebsd-arm@freebsd.org>,
 FreeBSD Current <freebsd-current@freebsd.org>
X-Mailer: Apple Mail (2.3259)
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-stable>, 
 <mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable/>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
 <mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 Feb 2017 03:21:49 -0000

[I've added a variant of this material to bugzilla 217138.]

I've reduced the testing context to the following
type of example (no longer involving buildworld
buildkernel):

# sh
# sh
# sh
. . .
# sh

(So a bunch of nested sh's in an ssh session that
will have most swapped out. I happened to have
done this under 2 users for the example material
below: in one I did an su first. Plus I had another
session without such a nesting but with an su: this
is the one were I ran stress.)

After forcing these to mostly swap out (see below)
I used ^D or exit exit a currently interactive one.
Then the rest of the swapped out ones from the
nesting get the tsd_booted failure.

So, for example:

# stress -m 1 --vm-bytes 1536M

The context is a PINE64+ with 2GB of RAM. The above
was enough in my context to cause the needed swapouts:

# ps -aOuser,flags
  PID USER          F TT  STAT    TIME COMMAND
  688 root       4102 u0  IWs  0:00.00 login [pam] (login)
  689 root   10004002 u0  I+   0:00.06 -sh (sh)
72611 markmi     4002  0  IWs  0:00.00 -sh (sh)
72613 markmi     4002  0  IW   0:00.00 sh
72614 markmi     4002  0  IW   0:00.00 sh
72615 markmi     4002  0  IW   0:00.00 sh
72616 markmi     4002  0  IW   0:00.00 sh
72617 markmi     4002  0  IW   0:00.00 sh
72618 markmi     4002  0  IW   0:00.00 sh
72619 markmi     4002  0  IW   0:00.00 sh
72620 markmi     4002  0  IW   0:00.00 sh
72621 markmi     4002  0  IW   0:00.00 sh
72622 markmi     4002  0  IW   0:00.00 sh
72623 markmi     4002  0  IW   0:00.00 sh
72624 markmi     4002  0  IW   0:00.00 sh
72625 markmi     4002  0  IW   0:00.00 sh
72626 markmi 10004002  0  I+   0:00.01 sh
  167 markmi     4002  1  IWs  0:00.00 -sh (sh)
  169 root       4102  1  IW   0:00.00 su
  170 root       4002  1  IW   0:00.00 su (sh)
  171 root       4002  1  IW   0:00.00 sh
  172 root       4002  1  IW   0:00.00 sh
  173 root       4002  1  IW   0:00.00 sh
  174 root       4002  1  IW   0:00.00 sh
  175 root       4002  1  IW   0:00.00 sh
  176 root       4002  1  IW   0:00.00 sh
  177 root       4002  1  IW   0:00.00 sh
  178 root       4002  1  IW   0:00.00 sh
  179 root   10004002  1  I+   0:00.01 sh
60961 root       4002  2  IW   0:00.00 stress -m 1 --vm-bytes 1536M
60962 root   10000002  2  R    0:29.41 stress -m 1 --vm-bytes 1536M
60964 root   10004002  2  R+   0:00.01 ps -aOuser,flags
82389 markmi     4002  2  IWs  0:00.00 -sh (sh)
82391 root       4102  2  IW   0:00.00 su
82392 root   10004002  2  S    0:00.22 su (sh)

So with that swapped out context established:

(The markmi user case of nested sh's:)

$ ^D<jemalloc>: =
/usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed =
assertion: "tsd_booted"
<jemalloc>: =
/usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed =
assertion: "tsd_booted"
<jemalloc>: =
/usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed =
assertion: "tsd_booted"
<jemalloc>: =
/usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed =
assertion: "tsd_booted"
<jemalloc>: =
/usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed =
assertion: "tsd_booted"
<jemalloc>: =
/usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed =
assertion: "tsd_booted"
<jemalloc>: =
/usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed =
assertion: "tsd_booted"
<jemalloc>: =
/usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed =
assertion: "tsd_booted"
<jemalloc>: =
/usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed =
assertion: "tsd_booted"
<jemalloc>: =
/usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed =
assertion: "tsd_booted"
<jemalloc>: =
/usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed =
assertion: "tsd_booted"
<jemalloc>: =
/usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed =
assertion: "tsd_booted"
<jemalloc>: =
/usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed =
assertion: "tsd_booted"
<jemalloc>: =
/usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed =
assertion: "tsd_booted"


(The su'd user case of nested sh's:)

# ^D<jemalloc>: =
/usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed =
assertion: "tsd_booted"
<jemalloc>: =
/usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed =
assertion: "tsd_booted"
<jemalloc>: =
/usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed =
assertion: "tsd_booted"
<jemalloc>: =
/usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed =
assertion: "tsd_booted"
<jemalloc>: =
/usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed =
assertion: "tsd_booted"
<jemalloc>: =
/usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed =
assertion: "tsd_booted"
<jemalloc>: =
/usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed =
assertion: "tsd_booted"
<jemalloc>: =
/usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed =
assertion: "tsd_booted"
<jemalloc>: =
/usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed =
assertion: "tsd_booted"
<jemalloc>: =
/usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed =
assertion: "tsd_booted"

The one without the deeply nested sh's (but an su):

# exit
<jemalloc>: =
/usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed =
assertion: "tsd_booted"

So far I've not gotten such without having had the failing
processes having been swapped-out at some point before failure.


# ls -ltTU /var/crash/
total 589220
-rw-------  1 markmi  wheel  4718592 Feb 27 18:31:00 2017 sh.82389.core
-rw-------  1 root    wheel  4812800 Feb 27 18:30:59 2017 su.82391.core
-rw-------  1 markmi  wheel  4718592 Feb 27 18:20:35 2017 sh.72611.core
-rw-------  1 markmi  wheel  4718592 Feb 27 18:20:34 2017 sh.72613.core
-rw-------  1 markmi  wheel  4718592 Feb 27 18:20:34 2017 sh.72614.core
-rw-------  1 markmi  wheel  4718592 Feb 27 18:20:33 2017 sh.72615.core
-rw-------  1 markmi  wheel  4718592 Feb 27 18:20:32 2017 sh.72616.core
-rw-------  1 markmi  wheel  4718592 Feb 27 18:20:31 2017 sh.72617.core
-rw-------  1 markmi  wheel  4718592 Feb 27 18:20:30 2017 sh.72618.core
-rw-------  1 markmi  wheel  4718592 Feb 27 18:20:29 2017 sh.72619.core
-rw-------  1 markmi  wheel  4718592 Feb 27 18:20:28 2017 sh.72620.core
-rw-------  1 markmi  wheel  4718592 Feb 27 18:20:28 2017 sh.72621.core
-rw-------  1 markmi  wheel  4718592 Feb 27 18:20:27 2017 sh.72622.core
-rw-------  1 markmi  wheel  4718592 Feb 27 18:20:26 2017 sh.72623.core
-rw-------  1 markmi  wheel  4718592 Feb 27 18:20:25 2017 sh.72624.core
-rw-------  1 markmi  wheel  4718592 Feb 27 18:20:24 2017 sh.72625.core
-rw-------  1 markmi  wheel  4718592 Feb 27 18:20:20 2017 sh.167.core
-rw-------  1 root    wheel  4718592 Feb 27 18:20:19 2017 sh.170.core
-rw-------  1 root    wheel  4718592 Feb 27 18:20:18 2017 sh.171.core
-rw-------  1 root    wheel  4718592 Feb 27 18:20:17 2017 sh.172.core
-rw-------  1 root    wheel  4718592 Feb 27 18:20:16 2017 sh.173.core
-rw-------  1 root    wheel  4718592 Feb 27 18:20:16 2017 sh.174.core
-rw-------  1 root    wheel  4718592 Feb 27 18:20:15 2017 sh.175.core
-rw-------  1 root    wheel  4718592 Feb 27 18:20:14 2017 sh.176.core
-rw-------  1 root    wheel  4718592 Feb 27 18:20:13 2017 sh.177.core
-rw-------  1 root    wheel  4718592 Feb 27 18:20:12 2017 sh.178.core
. . . (from prior investigative activity) . . .

Note it was about 10 minutes after the two nested examples before I
existed the one that only had the su --at which point it detected
the Failed assertion: "tsd_booted" as well (su and sh, su first).


In other experiments I found that it was when buildworld activity
caused swapping out of the failing processes that there were the
later failures.



I can not claim swapping out and back in is the only context
with issues but it has been true of every such failure that
I've looked at since I made this discovery.

My context has a USB SSD for the root filesystem and the swap
parition.

=3D=3D=3D
Mark Millard
markmi at dsl-only.net