From owner-freebsd-stable@freebsd.org Tue Feb 28 03:21:48 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id DCB95CF08B9 for ; Tue, 28 Feb 2017 03:21:48 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: from asp.reflexion.net (outbound-mail-210-89.reflexion.net [208.70.210.89]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 9DF5A797 for ; Tue, 28 Feb 2017 03:21:48 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: (qmail 29151 invoked from network); 28 Feb 2017 03:21:40 -0000 Received: from unknown (HELO mail-cs-01.app.dca.reflexion.local) (10.81.19.1) by 0 (rfx-qmail) with SMTP; 28 Feb 2017 03:21:40 -0000 Received: by mail-cs-01.app.dca.reflexion.local (Reflexion email security v8.30.0) with SMTP; Mon, 27 Feb 2017 22:21:41 -0500 (EST) Received: (qmail 1440 invoked from network); 28 Feb 2017 03:21:40 -0000 Received: from unknown (HELO iron2.pdx.net) (69.64.224.71) by 0 (rfx-qmail) with (AES256-SHA encrypted) SMTP; 28 Feb 2017 03:21:40 -0000 Received: from [192.168.1.111] (c-67-170-167-181.hsd1.or.comcast.net [67.170.167.181]) by iron2.pdx.net (Postfix) with ESMTPSA id 06F4AEC7822; Mon, 27 Feb 2017 19:21:39 -0800 (PST) From: Mark Millard Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Mime-Version: 1.0 (Mac OS X Mail 10.2 \(3259\)) Subject: arm64 vs. jemalloc and swapping in and out, sh/su examples: being swapped out leads to later Failed assertion: "tsd_booted" after being swapped in Message-Id: <6A5AE299-8594-45CC-A801-F0B1125BB042@dsl-only.net> Date: Mon, 27 Feb 2017 19:21:39 -0800 Cc: FreeBSD-STABLE Mailing List , jasone@FreeBSD.org, Shawn Webb To: freebsd-arm , FreeBSD Current X-Mailer: Apple Mail (2.3259) X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Feb 2017 03:21:49 -0000 [I've added a variant of this material to bugzilla 217138.] I've reduced the testing context to the following type of example (no longer involving buildworld buildkernel): # sh # sh # sh . . . # sh (So a bunch of nested sh's in an ssh session that will have most swapped out. I happened to have done this under 2 users for the example material below: in one I did an su first. Plus I had another session without such a nesting but with an su: this is the one were I ran stress.) After forcing these to mostly swap out (see below) I used ^D or exit exit a currently interactive one. Then the rest of the swapped out ones from the nesting get the tsd_booted failure. So, for example: # stress -m 1 --vm-bytes 1536M The context is a PINE64+ with 2GB of RAM. The above was enough in my context to cause the needed swapouts: # ps -aOuser,flags PID USER F TT STAT TIME COMMAND 688 root 4102 u0 IWs 0:00.00 login [pam] (login) 689 root 10004002 u0 I+ 0:00.06 -sh (sh) 72611 markmi 4002 0 IWs 0:00.00 -sh (sh) 72613 markmi 4002 0 IW 0:00.00 sh 72614 markmi 4002 0 IW 0:00.00 sh 72615 markmi 4002 0 IW 0:00.00 sh 72616 markmi 4002 0 IW 0:00.00 sh 72617 markmi 4002 0 IW 0:00.00 sh 72618 markmi 4002 0 IW 0:00.00 sh 72619 markmi 4002 0 IW 0:00.00 sh 72620 markmi 4002 0 IW 0:00.00 sh 72621 markmi 4002 0 IW 0:00.00 sh 72622 markmi 4002 0 IW 0:00.00 sh 72623 markmi 4002 0 IW 0:00.00 sh 72624 markmi 4002 0 IW 0:00.00 sh 72625 markmi 4002 0 IW 0:00.00 sh 72626 markmi 10004002 0 I+ 0:00.01 sh 167 markmi 4002 1 IWs 0:00.00 -sh (sh) 169 root 4102 1 IW 0:00.00 su 170 root 4002 1 IW 0:00.00 su (sh) 171 root 4002 1 IW 0:00.00 sh 172 root 4002 1 IW 0:00.00 sh 173 root 4002 1 IW 0:00.00 sh 174 root 4002 1 IW 0:00.00 sh 175 root 4002 1 IW 0:00.00 sh 176 root 4002 1 IW 0:00.00 sh 177 root 4002 1 IW 0:00.00 sh 178 root 4002 1 IW 0:00.00 sh 179 root 10004002 1 I+ 0:00.01 sh 60961 root 4002 2 IW 0:00.00 stress -m 1 --vm-bytes 1536M 60962 root 10000002 2 R 0:29.41 stress -m 1 --vm-bytes 1536M 60964 root 10004002 2 R+ 0:00.01 ps -aOuser,flags 82389 markmi 4002 2 IWs 0:00.00 -sh (sh) 82391 root 4102 2 IW 0:00.00 su 82392 root 10004002 2 S 0:00.22 su (sh) So with that swapped out context established: (The markmi user case of nested sh's:) $ ^D: = /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed = assertion: "tsd_booted" : = /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed = assertion: "tsd_booted" : = /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed = assertion: "tsd_booted" : = /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed = assertion: "tsd_booted" : = /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed = assertion: "tsd_booted" : = /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed = assertion: "tsd_booted" : = /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed = assertion: "tsd_booted" : = /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed = assertion: "tsd_booted" : = /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed = assertion: "tsd_booted" : = /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed = assertion: "tsd_booted" : = /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed = assertion: "tsd_booted" : = /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed = assertion: "tsd_booted" : = /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed = assertion: "tsd_booted" : = /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed = assertion: "tsd_booted" (The su'd user case of nested sh's:) # ^D: = /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed = assertion: "tsd_booted" : = /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed = assertion: "tsd_booted" : = /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed = assertion: "tsd_booted" : = /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed = assertion: "tsd_booted" : = /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed = assertion: "tsd_booted" : = /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed = assertion: "tsd_booted" : = /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed = assertion: "tsd_booted" : = /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed = assertion: "tsd_booted" : = /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed = assertion: "tsd_booted" : = /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed = assertion: "tsd_booted" The one without the deeply nested sh's (but an su): # exit : = /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed = assertion: "tsd_booted" So far I've not gotten such without having had the failing processes having been swapped-out at some point before failure. # ls -ltTU /var/crash/ total 589220 -rw------- 1 markmi wheel 4718592 Feb 27 18:31:00 2017 sh.82389.core -rw------- 1 root wheel 4812800 Feb 27 18:30:59 2017 su.82391.core -rw------- 1 markmi wheel 4718592 Feb 27 18:20:35 2017 sh.72611.core -rw------- 1 markmi wheel 4718592 Feb 27 18:20:34 2017 sh.72613.core -rw------- 1 markmi wheel 4718592 Feb 27 18:20:34 2017 sh.72614.core -rw------- 1 markmi wheel 4718592 Feb 27 18:20:33 2017 sh.72615.core -rw------- 1 markmi wheel 4718592 Feb 27 18:20:32 2017 sh.72616.core -rw------- 1 markmi wheel 4718592 Feb 27 18:20:31 2017 sh.72617.core -rw------- 1 markmi wheel 4718592 Feb 27 18:20:30 2017 sh.72618.core -rw------- 1 markmi wheel 4718592 Feb 27 18:20:29 2017 sh.72619.core -rw------- 1 markmi wheel 4718592 Feb 27 18:20:28 2017 sh.72620.core -rw------- 1 markmi wheel 4718592 Feb 27 18:20:28 2017 sh.72621.core -rw------- 1 markmi wheel 4718592 Feb 27 18:20:27 2017 sh.72622.core -rw------- 1 markmi wheel 4718592 Feb 27 18:20:26 2017 sh.72623.core -rw------- 1 markmi wheel 4718592 Feb 27 18:20:25 2017 sh.72624.core -rw------- 1 markmi wheel 4718592 Feb 27 18:20:24 2017 sh.72625.core -rw------- 1 markmi wheel 4718592 Feb 27 18:20:20 2017 sh.167.core -rw------- 1 root wheel 4718592 Feb 27 18:20:19 2017 sh.170.core -rw------- 1 root wheel 4718592 Feb 27 18:20:18 2017 sh.171.core -rw------- 1 root wheel 4718592 Feb 27 18:20:17 2017 sh.172.core -rw------- 1 root wheel 4718592 Feb 27 18:20:16 2017 sh.173.core -rw------- 1 root wheel 4718592 Feb 27 18:20:16 2017 sh.174.core -rw------- 1 root wheel 4718592 Feb 27 18:20:15 2017 sh.175.core -rw------- 1 root wheel 4718592 Feb 27 18:20:14 2017 sh.176.core -rw------- 1 root wheel 4718592 Feb 27 18:20:13 2017 sh.177.core -rw------- 1 root wheel 4718592 Feb 27 18:20:12 2017 sh.178.core . . . (from prior investigative activity) . . . Note it was about 10 minutes after the two nested examples before I existed the one that only had the su --at which point it detected the Failed assertion: "tsd_booted" as well (su and sh, su first). In other experiments I found that it was when buildworld activity caused swapping out of the failing processes that there were the later failures. I can not claim swapping out and back in is the only context with issues but it has been true of every such failure that I've looked at since I made this discovery. My context has a USB SSD for the root filesystem and the swap parition. =3D=3D=3D Mark Millard markmi at dsl-only.net