From owner-freebsd-ppc@freebsd.org Thu May 7 19:06:38 2020 Return-Path: Delivered-To: freebsd-ppc@mailman.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.nyi.freebsd.org (Postfix) with ESMTP id C19152E1E9D for ; Thu, 7 May 2020 19:06:38 +0000 (UTC) (envelope-from marklmi@yahoo.com) Received: from sonic304-25.consmr.mail.gq1.yahoo.com (sonic304-25.consmr.mail.gq1.yahoo.com [98.137.68.206]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 49J2xn1xzKz4Zqh for ; Thu, 7 May 2020 19:06:37 +0000 (UTC) (envelope-from marklmi@yahoo.com) X-YMail-OSG: ymUHbHAVM1nxMvrnBetflgticeA9_e1fEHsRRHVMHn0m2qCUC.tTFaDDeMuvFXO JMGVC3bkuHxMQ3Gh2xCgvxW4jAGW6JsVsVXrx8HToAGqGgMVCwx1cYVzQ9BNdiu1NFjhrJMUFNX3 KOPexm9XqKb_62elZnxldMtv6kG67pexLkfYuhEK9GdrVmZoPQPmNmfeTxtTlAdcoHeStPre1w2O Ce0doYJbKpkR39sSZm7clKruyOcUtyvXTNQKFn7L63qLYm.IDcrwekGcccVJct0Ihcw2hnVctlXD VSEssj9Dyqr2B8XK9PhIyZSiXkN.kIiXSKma5esYqfAREwas0lVoLKmkQ1LJLvMFnI2a630GjD9L bUJeisuATRy2OHPRzFKAlAyymPTwMzg23QeWoBQKvLBwzp_JsDmBQOzRWZOdGWkXkJAQAJghNNjL aWdUNOc5seAEdCA7qBs7Em9miC5WHxmYlNanPibU9x0x2Y5Ih_9c1MdOB8edHM3MfA0BmDcw2e3A A4.xhSVnMHvFB2kl9IwAlP1jFpNEoDUbsMwYPJaoSvuzv2BSwPcGvGqHkX870PPgfrDj32d7O7hn ul_vXE4u42qUxd.KMVyI27S7uGVNDXmzyuT2d6ZGROYtvob__cL6xBmNhT7C_81b5aCwSyJOUbTs cx9f__cSs8bOGlC6sbtxRs8VqrLaxHyJKvoZvsbIfFkLs0M9ANzOXEzG5fllGBnUNFCI6bIS9WCx ZcllYQuVYqpLDvuhS3h00YppZc4LDLZN40bnsZj8VOafpMtgoC1fWEllCewltGw11jV.Had9Mfud f5LxYdj3CXef8em.11Zv.Pdt6qDYCD0har50cPmDgT13cSYL.O9GsbK5DeSxdNqciTJuv5LdcKob xgHlWPjb8dxR1fHYXgieiHprpzMBHpz_kL_K_kLqp0i6xs9QiMQ17mZ5mo5pVrRbyXro1TUn2Bvc pF2WP6I6vjM6G55BVesaEl.VAgS3mxTJVMps52zZfq2mw0BStsODtQZ3SAQKf7KEZlXoNx91xXLz WTkybn.q9c5UzDLu_pX1aJDr1fGQPp9asliXeImYT1l4DsySQU8fuGnx.k5RleIMVwH7cRVr1BOj .cPbQvbcsNyHLvizcNWwqzHzYpopO3_gkwVXcRYlerQgiUztzU.1zWxpYOa7HQXowhAs5bFoHsbl 7fMk0lceaJTOCuM1_z3Ab_FRuiMuIV6cheVf_Ai.QzEGOw3.XCr7gyDL1t61cLC8LPareeeH_x9E zl1CVhlvBMkpWhiJiXIpICBhd1uWUqLpd8oakTxub26b3x4Zzgk0gT5OlK6YeMs_vL58OepvSMNH nf9yMHViwKnvByXTPJAQsxNKB8TrDLI7Z5PwSOpjXusG59yUk6.nEwKCAlr3DT5uOpYO1ufMP5XF PZSUUB1BRwmXrx2to51w4MVcqCV.DLSwqh7fFh9IaiCi8sY5qZObGhxPpPGX_.w-- Received: from sonic.gate.mail.ne1.yahoo.com by sonic304.consmr.mail.gq1.yahoo.com with HTTP; Thu, 7 May 2020 19:06:35 +0000 Received: by smtp434.mail.bf1.yahoo.com (VZM Hermes SMTP Server) with ESMTPA ID f7716cd2133ed3096a69b5432fcac9e7; Thu, 07 May 2020 19:06:34 +0000 (UTC) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 13.4 \(3608.80.23.2.2\)) Subject: Re: svn commit: r360233 - in head: contrib/jemalloc . . . : This partially breaks a 2-socket 32-bit powerpc (old PowerMac G4) based on head -r360311 From: Mark Millard In-Reply-To: <18E62746-80DB-4195-977D-4FF32D0129EE@yahoo.com> Date: Thu, 7 May 2020 12:06:15 -0700 Cc: Brandon Bergren , Justin Hibbits Content-Transfer-Encoding: quoted-printable Message-Id: References: <8479DD58-44F6-446A-9CA5-D01F0F7C1B38@yahoo.com> <17ACDA02-D7EF-4F26-874A-BB3E935CD072@yahoo.com> <695E6836-F860-4557-B7DE-CC1EDB347F18@yahoo.com> <121B9B09-141B-4DC3-918B-1E7CFB99E779@yahoo.com> <8AAB0462-3FA8-490C-8D8D-7C15B1C9E2DE@yahoo.com> <18E62746-80DB-4195-977D-4FF32D0129EE@yahoo.com> To: "vangyzen@freebsd.org" , svn-src-head@freebsd.org, FreeBSD Current , FreeBSD Hackers , FreeBSD PowerPC ML X-Mailer: Apple Mail (2.3608.80.23.2.2) X-Rspamd-Queue-Id: 49J2xn1xzKz4Zqh X-Spamd-Bar: -- X-Spamd-Result: default: False [-2.49 / 15.00]; TO_DN_EQ_ADDR_SOME(0.00)[]; TO_DN_SOME(0.00)[]; R_SPF_ALLOW(-0.20)[+ptr:yahoo.com]; FREEMAIL_FROM(0.00)[yahoo.com]; MV_CASE(0.50)[]; DKIM_TRACE(0.00)[yahoo.com:+]; DMARC_POLICY_ALLOW(-0.50)[yahoo.com,reject]; RCPT_COUNT_SEVEN(0.00)[7]; FROM_EQ_ENVFROM(0.00)[]; MIME_TRACE(0.00)[0:+]; RCVD_TLS_LAST(0.00)[]; FREEMAIL_ENVFROM(0.00)[yahoo.com]; ASN(0.00)[asn:36647, ipnet:98.137.64.0/21, country:US]; MID_RHS_MATCH_FROM(0.00)[]; DWL_DNSWL_NONE(0.00)[yahoo.com.dwl.dnswl.org : 127.0.5.0]; ARC_NA(0.00)[]; NEURAL_HAM_MEDIUM(-0.99)[-0.993,0]; R_DKIM_ALLOW(-0.20)[yahoo.com:s=s2048]; FROM_HAS_DN(0.00)[]; NEURAL_HAM_LONG(-1.00)[-0.999,0]; MIME_GOOD(-0.10)[text/plain]; IP_SCORE(0.00)[ip: (-0.08), ipnet: 98.137.64.0/21(0.83), asn: 36647(0.66), country: US(-0.05)]; IP_SCORE_FREEMAIL(0.00)[]; TO_MATCH_ENVRCPT_SOME(0.00)[]; RCVD_IN_DNSWL_NONE(0.00)[206.68.137.98.list.dnswl.org : 127.0.5.0]; RWL_MAILSPIKE_POSSIBLE(0.00)[206.68.137.98.rep.mailspike.net : 127.0.0.17]; RCVD_COUNT_TWO(0.00)[2] X-BeenThere: freebsd-ppc@freebsd.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Porting FreeBSD to the PowerPC List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 May 2020 19:06:38 -0000 [mountd failure example: also at sz_size2index_lookup assert for the same zero'd memory problem.] > On 2020-May-7, at 00:46, Mark Millard wrote: >=20 > [__je_sz_size2index_tab seems messed up in 2 of the > asserting contexts: first 384 are zero in both. More > before that is also messed up (all zero). I show the > details later below.] >=20 > On 2020-May-6, at 16:57, Mark Millard wrote: >=20 >> [This explores process crashes that happen during system >> shutdown, in a context not having MALLOC_PRODUCTION=3D . >> So assert failures are reported as the stopping points.] >>=20 >> It looks like shutdown -p now, shutdown -r now, and the >> like can lead some processes to assert during their exit >> attempt, including a sshd failure (that I've not seen >> before), rpcbind, and nfsd. I show information about the >> observed asserts for those below. >>=20 >>=20 >> sshd hit an assert, failing slab =3D=3D extent_slab_get(extent) : >>=20 >> (gdb) bt=20 >> #0 thr_kill () at thr_kill.S:4 >> #1 0x50927170 in __raise (s=3D6) at /usr/src/lib/libc/gen/raise.c:52 >> #2 0x50886cc0 in abort () at /usr/src/lib/libc/stdlib/abort.c:67 >> #3 0x508834b0 in arena_dalloc (tsdn=3D, = ptr=3D, tcache=3D, alloc_ctx=3D, slow_path=3D) >> at = /usr/src/contrib/jemalloc/include/jemalloc/internal/arena_inlines_b.h:315 >> #4 idalloctm (tsdn=3D0x500dd040, ptr=3D0x5008a180, = tcache=3D0x500dd160, alloc_ctx=3D, is_internal=3D, slow_path=3D) >> at = /usr/src/contrib/jemalloc/include/jemalloc/internal/jemalloc_internal_inli= nes_c.h:118 >> #5 0x5087b0a4 in ifree (tsd=3D0x500dd040, ptr=3D0x5008a180, = tcache=3D0x500dd160, slow_path=3D) at = jemalloc_jemalloc.c:2590 >> #6 0x5087acac in __je_free_default (ptr=3D0x5008a180) at = jemalloc_jemalloc.c:2784 >> #7 0x5087b294 in __free (ptr=3D0x5008a180) at = jemalloc_jemalloc.c:2852 >> #8 0x10029464 in server_accept_loop (config_s=3D, = sock_in=3D, sock_out=3D, = newsock=3D) at /usr/src/crypto/openssh/sshd.c:1185 >> #9 main (ac=3D, av=3D0xffffde3c) at = /usr/src/crypto/openssh/sshd.c:2009 >>=20 >> . . . >> (gdb) up >> #2 0x50886cc0 in abort () at /usr/src/lib/libc/stdlib/abort.c:67 >> 67 (void)raise(SIGABRT); >> (gdb) up >> #3 0x508834b0 in arena_dalloc (tsdn=3D, = ptr=3D, tcache=3D, alloc_ctx=3D, slow_path=3D) >> at = /usr/src/contrib/jemalloc/include/jemalloc/internal/arena_inlines_b.h:315 >> 315 assert(slab =3D=3D extent_slab_get(extent)); >>=20 >> (gdb) list >> 310 rtree_ctx =3D tsd_rtree_ctx(tsdn_tsd(tsdn)); >> 311 extent_t *extent =3D rtree_extent_read(tsdn, = &extents_rtree, >> 312 rtree_ctx, (uintptr_t)ptr, true); >> 313 assert(szind =3D=3D extent_szind_get(extent)); >> 314 assert(szind < SC_NSIZES); >> 315 assert(slab =3D=3D extent_slab_get(extent)); >> 316 } >> 317=09 >> 318 if (likely(slab)) { >> 319 /* Small allocation. */ >>=20 >> More fully: >>=20 >> 285 JEMALLOC_ALWAYS_INLINE void >> 286 arena_dalloc(tsdn_t *tsdn, void *ptr, tcache_t *tcache, >> 287 alloc_ctx_t *alloc_ctx, bool slow_path) { >> 288 assert(!tsdn_null(tsdn) || tcache =3D=3D NULL); >> 289 assert(ptr !=3D NULL); >> 290=09 >> 291 if (unlikely(tcache =3D=3D NULL)) { >> 292 arena_dalloc_no_tcache(tsdn, ptr); >> 293 return; >> 294 } >> 295=09 >> 296 szind_t szind; >> 297 bool slab; >> 298 rtree_ctx_t *rtree_ctx; >> 299 if (alloc_ctx !=3D NULL) { >> 300 szind =3D alloc_ctx->szind; >> 301 slab =3D alloc_ctx->slab; >> 302 assert(szind !=3D SC_NSIZES); >> 303 } else { >> 304 rtree_ctx =3D tsd_rtree_ctx(tsdn_tsd(tsdn)); >> 305 rtree_szind_slab_read(tsdn, &extents_rtree, = rtree_ctx, >> 306 (uintptr_t)ptr, true, &szind, &slab); >> 307 } >> 308=09 >> 309 if (config_debug) { >> 310 rtree_ctx =3D tsd_rtree_ctx(tsdn_tsd(tsdn)); >> 311 extent_t *extent =3D rtree_extent_read(tsdn, = &extents_rtree, >> 312 rtree_ctx, (uintptr_t)ptr, true); >> 313 assert(szind =3D=3D extent_szind_get(extent)); >> 314 assert(szind < SC_NSIZES); >> 315 assert(slab =3D=3D extent_slab_get(extent)); >> 316 } >> 317=09 >> 318 if (likely(slab)) { >> 319 /* Small allocation. */ >> 320 tcache_dalloc_small(tsdn_tsd(tsdn), tcache, ptr, = szind, >> 321 slow_path); >> 322 } else { >> 323 arena_dalloc_large(tsdn, ptr, tcache, szind, = slow_path); >> 324 } >> 325 } >=20 > The following are only shown for contrast with the later > cases of lots of zeros showing up where below shows > non-zero values (taken from sshd.core, which failed > differently): >=20 > (gdb) x/4x __je_arenas+16368/4 > 0x50981ab0 <__je_arenas+16368>: 0x00000000 0x00000000 = 0x00000000 0x00000009 > (gdb) print/x __je_arenas_lock=20 > $1 =3D {{{prof_data =3D {tot_wait_time =3D {ns =3D 0x0}, max_wait_time = =3D {ns =3D 0x0}, n_wait_times =3D 0x0, n_spin_acquired =3D 0x0, = max_n_thds =3D 0x0, n_waiting_thds =3D {repr =3D 0x0}, n_owner_switches = =3D 0x0,=20 > prev_owner =3D 0x0, n_lock_ops =3D 0x0}, lock =3D 0x5008bf00, = postponed_next =3D 0x50974070, locked =3D {repr =3D 0x0}}}, witness =3D = {name =3D 0x50760b04, rank =3D 0x3, comp =3D 0x0, opaque =3D 0x0, link =3D= { > qre_next =3D 0x50981b10, qre_prev =3D 0x50981b10}}, lock_order =3D = 0x0} > (gdb) print/x __je_narenas_auto > $2 =3D 0x8 > (gdb) print/x malloc_conf > $3 =3D 0x0 > (gdb) print/x __je_ncpus > $4 =3D 0x2 > (gdb) print/x __je_manual_arena_base > $5 =3D 0x9 > (gdb) print/x __je_sz_pind2sz_tab=20 > $6 =3D {0x1000, 0x2000, 0x3000, 0x4000, 0x5000, 0x6000, 0x7000, = 0x8000, 0xa000, 0xc000, 0xe000, 0x10000, 0x14000, 0x18000, 0x1c000, = 0x20000, 0x28000, 0x30000, 0x38000, 0x40000, 0x50000, 0x60000,=20 > 0x70000, 0x80000, 0xa0000, 0xc0000, 0xe0000, 0x100000, 0x140000, = 0x180000, 0x1c0000, 0x200000, 0x280000, 0x300000, 0x380000, 0x400000, = 0x500000, 0x600000, 0x700000, 0x800000, 0xa00000, 0xc00000,=20 > 0xe00000, 0x1000000, 0x1400000, 0x1800000, 0x1c00000, 0x2000000, = 0x2800000, 0x3000000, 0x3800000, 0x4000000, 0x5000000, 0x6000000, = 0x7000000, 0x8000000, 0xa000000, 0xc000000, 0xe000000,=20 > 0x10000000, 0x14000000, 0x18000000, 0x1c000000, 0x20000000, = 0x28000000, 0x30000000, 0x38000000, 0x40000000, 0x50000000, 0x60000000, = 0x70000000, 0x70001000} > (gdb) print/x __je_sz_index2size_tab > $7 =3D {0x8, 0x10, 0x20, 0x30, 0x40, 0x50, 0x60, 0x70, 0x80, 0xa0, = 0xc0, 0xe0, 0x100, 0x140, 0x180, 0x1c0, 0x200, 0x280, 0x300, 0x380, = 0x400, 0x500, 0x600, 0x700, 0x800, 0xa00, 0xc00, 0xe00, 0x1000,=20 > 0x1400, 0x1800, 0x1c00, 0x2000, 0x2800, 0x3000, 0x3800, 0x4000, = 0x5000, 0x6000, 0x7000, 0x8000, 0xa000, 0xc000, 0xe000, 0x10000, = 0x14000, 0x18000, 0x1c000, 0x20000, 0x28000, 0x30000, 0x38000,=20 > 0x40000, 0x50000, 0x60000, 0x70000, 0x80000, 0xa0000, 0xc0000, = 0xe0000, 0x100000, 0x140000, 0x180000, 0x1c0000, 0x200000, 0x280000, = 0x300000, 0x380000, 0x400000, 0x500000, 0x600000, 0x700000,=20 > 0x800000, 0xa00000, 0xc00000, 0xe00000, 0x1000000, 0x1400000, = 0x1800000, 0x1c00000, 0x2000000, 0x2800000, 0x3000000, 0x3800000, = 0x4000000, 0x5000000, 0x6000000, 0x7000000, 0x8000000, 0xa000000,=20 > 0xc000000, 0xe000000, 0x10000000, 0x14000000, 0x18000000, 0x1c000000, = 0x20000000, 0x28000000, 0x30000000, 0x38000000, 0x40000000, 0x50000000, = 0x60000000, 0x70000000} > (gdb) print/x __je_sz_size2index_tab > $2 =3D {0x0, 0x0, 0x1, 0x2, 0x2, 0x3, 0x3, 0x4, 0x4, 0x5, 0x5, 0x6, = 0x6, 0x7, 0x7, 0x8, 0x8, 0x9, 0x9, 0x9, 0x9, 0xa, 0xa, 0xa, 0xa, 0xb, = 0xb, 0xb, 0xb, 0xc, 0xc, 0xc, 0xc, 0xd, 0xd, 0xd, 0xd, 0xd,=20 > 0xd, 0xd, 0xd, 0xe, 0xe, 0xe, 0xe, 0xe, 0xe, 0xe, 0xe, 0xf, 0xf, 0xf, = 0xf, 0xf, 0xf, 0xf, 0xf, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10, = 0x11 , 0x12 ,=20 > 0x13 , 0x14 , 0x15 , 0x16 , 0x17 , 0x18 , 0x19 ,=20 > 0x1a , 0x1b , 0x1c } >=20 >=20 >> rpcbind hit an assert, failing ret =3D=3D sz_size2index_compute(size) = : >>=20 >> (gdb) bt >> #0 thr_kill () at thr_kill.S:4 >> #1 0x502f2170 in __raise (s=3D6) at /usr/src/lib/libc/gen/raise.c:52 >> #2 0x50251d04 in abort () at /usr/src/lib/libc/stdlib/abort.c:79 >> #3 0x5024f260 in sz_size2index_lookup (size=3D) at = /usr/src/contrib/jemalloc/include/jemalloc/internal/sz.h:159 >> #4 sz_size2index (size=3D) at = /usr/src/contrib/jemalloc/include/jemalloc/internal/sz.h:166 >> #5 imalloc_body (sopts=3D0xffffb360, dopts=3D0xffffb340, = tsd=3D0x5009a018) at jemalloc_jemalloc.c:2066 >> #6 0x50244874 in imalloc (sopts=3D0xffffb360, dopts=3D0xffffb340) at = /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:331 >> #7 0x50244fe8 in __calloc (num=3D1, size=3D96) at = jemalloc_jemalloc.c:2498 >> #8 0x50265690 in svc_xprt_alloc () at = /usr/src/lib/libc/rpc/svc.c:541 >> #9 0x502635f4 in makefd_xprt (fd=3D14, sendsize=3D9000, = recvsize=3D9000) at /usr/src/lib/libc/rpc/svc_vc.c:250 >> #10 0x502644b4 in rendezvous_request (xprt=3D0x5004c000, = msg=3D) at /usr/src/lib/libc/rpc/svc_vc.c:315 >> #11 0x50265a98 in svc_getreq_common (fd=3D) at = /usr/src/lib/libc/rpc/svc.c:640 >> #12 0x50265d1c in svc_getreq_poll (pfdp=3D, = pollretval=3D1) at /usr/src/lib/libc/rpc/svc.c:739 >> #13 0x10018568 in my_svc_run () at = /usr/src/usr.sbin/rpcbind/rpcb_svc_com.c:1167 >> #14 0x10014ad8 in main (argc=3D, argv=3D) at /usr/src/usr.sbin/rpcbind/rpcbind.c:250 >> (gdb) up 3 >> #3 0x5024f260 in sz_size2index_lookup (size=3D) at = /usr/src/contrib/jemalloc/include/jemalloc/internal/sz.h:159 >> 159 assert(ret =3D=3D sz_size2index_compute(size)); >> (gdb) print ret >> $1 =3D 0 >>=20 >> 154 JEMALLOC_ALWAYS_INLINE szind_t >> 155 sz_size2index_lookup(size_t size) { >> 156 assert(size <=3D SC_LOOKUP_MAXCLASS); >> 157 szind_t ret =3D (sz_size2index_tab[(size + (ZU(1) << = SC_LG_TINY_MIN) - 1) >> 158 >> SC_LG_TINY_MIN]); >> 159 assert(ret =3D=3D sz_size2index_compute(size)); >> 160 return ret; >> 161 } >=20 > gdb reports for sz_size2index_tab (really = JEMALLOC_N(sz_size2index_tab), > i.e., __je_sz_size2index_tab): >=20 > (gdb) print/x __je_sz_size2index_tab > $1 =3D {0x0 , 0x1a, 0x1b , 0x1c = } >=20 > Also: >=20 > (gdb) x/4x __je_arenas+16368/4 > 0x5034cab0 <__je_arenas+16368>: 0x00000000 0x00000000 = 0x00000000 0x00000000 > (gdb) print/x __je_arenas_lock = =20= > $8 =3D {{{prof_data =3D {tot_wait_time =3D {ns =3D 0x0}, max_wait_time = =3D {ns =3D 0x0}, n_wait_times =3D 0x0, n_spin_acquired =3D 0x0, = max_n_thds =3D 0x0, n_waiting_thds =3D {repr =3D 0x0}, n_owner_switches = =3D 0x0,=20 > prev_owner =3D 0x0, n_lock_ops =3D 0x0}, lock =3D 0x0, = postponed_next =3D 0x0, locked =3D {repr =3D 0x0}}}, witness =3D {name =3D= 0x0, rank =3D 0x0, comp =3D 0x0, opaque =3D 0x0, link =3D {qre_next =3D = 0x0,=20 > qre_prev =3D 0x0}}, lock_order =3D 0x0} > (gdb) print/x __je_narenas_auto > $9 =3D 0x0 > (gdb) print/x malloc_conf =20 > $10 =3D 0x0 > (gdb) print/x __je_ncpus=20 > $11 =3D 0x0 > (gdb) print/x __je_manual_arena_base > $12 =3D 0x0 > (gdb) print/x __je_sz_pind2sz_tab =20 > $13 =3D {0x0 } > (gdb) print/x __je_sz_index2size_tab > $14 =3D {0x0 } >=20 >=20 >> nfsd hit an assert, failing ret =3D=3D sz_size2index_compute(size) >=20 > [Correction: That should have referenced sz_index2size_lookup(index).] >=20 >> (also, but a different caller of sz_size2index): >=20 > [Correction: The "also" comment should be ignored: > sz_index2size_lookup(index) is referenced below.] >=20 >>=20 >> (gdb) bt >> #0 thr_kill () at thr_kill.S:4 >> #1 0x502b2170 in __raise (s=3D6) at /usr/src/lib/libc/gen/raise.c:52 >> #2 0x50211cc0 in abort () at /usr/src/lib/libc/stdlib/abort.c:67 >> #3 0x50206104 in sz_index2size_lookup (index=3D) at = /usr/src/contrib/jemalloc/include/jemalloc/internal/sz.h:200 >> #4 sz_index2size (index=3D) at = /usr/src/contrib/jemalloc/include/jemalloc/internal/sz.h:207 >> #5 ifree (tsd=3D0x50094018, ptr=3D0x50041028, tcache=3D0x50094138, = slow_path=3D) at jemalloc_jemalloc.c:2583 >> #6 0x50205cac in __je_free_default (ptr=3D0x50041028) at = jemalloc_jemalloc.c:2784 >> #7 0x50206294 in __free (ptr=3D0x50041028) at = jemalloc_jemalloc.c:2852 >> #8 0x50287ec8 in ns_src_free (src=3D0x50329004, = srclistsize=3D) at /usr/src/lib/libc/net/nsdispatch.c:452 >> #9 ns_dbt_free (dbt=3D0x50329000) at = /usr/src/lib/libc/net/nsdispatch.c:436 >> #10 vector_free (vec=3D0x50329000, count=3D, esize=3D12,= free_elem=3D) at /usr/src/lib/libc/net/nsdispatch.c:253 >> #11 nss_atexit () at /usr/src/lib/libc/net/nsdispatch.c:578 >> #12 0x5028d958 in __cxa_finalize (dso=3D0x0) at = /usr/src/lib/libc/stdlib/atexit.c:240 >> #13 0x502117f8 in exit (status=3D0) at = /usr/src/lib/libc/stdlib/exit.c:74 >> #14 0x10013f9c in child_cleanup (signo=3D) at = /usr/src/usr.sbin/nfsd/nfsd.c:969 >> #15 >> #16 0x00000000 in ?? () >>=20 >> (gdb) up 3 >> #3 0x50206104 in sz_index2size_lookup (index=3D) at = /usr/src/contrib/jemalloc/include/jemalloc/internal/sz.h:200 >> 200 assert(ret =3D=3D sz_index2size_compute(index)); >>=20 >> (ret is optimized out.) >>=20 >> 197 JEMALLOC_ALWAYS_INLINE size_t >> 198 sz_index2size_lookup(szind_t index) { >> 199 size_t ret =3D (size_t)sz_index2size_tab[index]; >> 200 assert(ret =3D=3D sz_index2size_compute(index)); >> 201 return ret; >> 202 } >=20 > (gdb) print/x __je_sz_index2size_tab > $3 =3D {0x0 } >=20 > Also: >=20 > (gdb) x/4x __je_arenas+16368/4 > 0x5030cab0 <__je_arenas+16368>: 0x00000000 0x00000000 = 0x00000000 0x00000000 > (gdb) print/x __je_arenas_lock = =20= > $8 =3D {{{prof_data =3D {tot_wait_time =3D {ns =3D 0x0}, max_wait_time = =3D {ns =3D 0x0}, n_wait_times =3D 0x0, n_spin_acquired =3D 0x0, = max_n_thds =3D 0x0, n_waiting_thds =3D {repr =3D 0x0}, n_owner_switches = =3D 0x0,=20 > prev_owner =3D 0x0, n_lock_ops =3D 0x0}, lock =3D 0x0, = postponed_next =3D 0x0, locked =3D {repr =3D 0x0}}}, witness =3D {name =3D= 0x0, rank =3D 0x0, comp =3D 0x0, opaque =3D 0x0, link =3D {qre_next =3D = 0x0,=20 > qre_prev =3D 0x0}}, lock_order =3D 0x0} > (gdb) print/x __je_narenas_auto > $9 =3D 0x0 > (gdb) print/x malloc_conf =20 > $10 =3D 0x0 > (gdb) print/x __je_ncpus=20 > $11 =3D 0x0 > (gdb) print/x __je_manual_arena_base > $12 =3D 0x0 > (gdb) print/x __je_sz_pind2sz_tab =20 > $13 =3D {0x0 } > (gdb) print/x __je_sz_size2index_tab > $1 =3D {0x0 , 0x1a, 0x1b , 0x1c = } >=20 >> Booting and immediately trying something like: >>=20 >> service nfsd stop >>=20 >> did not lead to a failure. But may be after >> a while it would and be less drastic than a >> reboot or power down. >=20 > More detail: >=20 > So, for rpcbind and nfds at some point a large part of > __je_sz_size2index_tab is being stomped on, as is all of > __je_sz_index2size_tab and more. >=20 > For rpcbind, the following area is zero but in a > live process is not all-zero (I show the partially > non-zero live-process context instead of the all-zero > .core file content): >=20 > 0x5034cab0 <__je_arenas+16368>: 0x00000000 0x00000000 = 0x00000000 0x00000009 > 0x5034cac0 <__je_arenas_lock>: 0x00000000 0x00000000 = 0x00000000 0x00000000 > 0x5034cad0 <__je_arenas_lock+16>: 0x00000000 0x00000000 = 0x00000000 0x00000000 > 0x5034cae0 <__je_arenas_lock+32>: 0x00000000 0x00000000 = 0x00000000 0x00000000 > 0x5034caf0 <__je_arenas_lock+48>: 0x00000000 0x00000000 = 0x00000000 0x00000000 > 0x5034cb00 <__je_arenas_lock+64>: 0x00000000 0x5033f070 = 0x00000000 0x00000000 > 0x5034cb10 <__je_arenas_lock+80>: 0x5012bb04 0x00000003 = 0x00000000 0x00000000 > 0x5034cb20 <__je_arenas_lock+96>: 0x5034cb10 0x5034cb10 = 0x00000000 0x00000000 >=20 > Then the memory in the crash continues to be zero until: >=20 > 0x5034d000 <__je_sz_size2index_tab+384>: 0x1a1b1b1b = 0x1b1b1b1b 0x1b1b1b1b 0x1b1b1b1b >=20 > Notice the interesting page boundary for where non-zero > is first available again! >=20 > Between __je_arenas_lock and __je_sz_size2index_tab are: >=20 > 0x5034cb30 __je_narenas_auto > 0x5034cb38 malloc_conf > 0x5034cb3c __je_ncpus > 0x5034cb40 __je_manual_arena_base > 0x5034cb80 __je_sz_pind2sz_tab > 0x5034ccc0 __je_sz_index2size_tab > 0x5034ce80 __je_sz_size2index_tab >=20 > For nfsd, it is similar (again showing the partially > non-zero live process context instead of the all-zeros > from the .core file): >=20 > 0x5030cab0 <__je_arenas+16368>: 0x00000000 0x00000000 = 0x00000000 0x00000009 > 0x5030cac0 <__je_arenas_lock>: 0x00000000 0x00000000 = 0x00000000 0x00000000 > 0x5030cad0 <__je_arenas_lock+16>: 0x00000000 0x00000000 = 0x00000000 0x00000000 > 0x5030cae0 <__je_arenas_lock+32>: 0x00000000 0x00000000 = 0x00000000 0x00000000 > 0x5030caf0 <__je_arenas_lock+48>: 0x00000000 0x00000000 = 0x00000000 0x00000000 > 0x5030cb00 <__je_arenas_lock+64>: 0x00000000 0x502ff070 = 0x00000000 0x00000000 > 0x5030cb10 <__je_arenas_lock+80>: 0x500ebb04 0x00000003 = 0x00000000 0x00000000 > 0x5030cb20 <__je_arenas_lock+96>: 0x5030cb10 0x5030cb10 = 0x00000000 0x00000000 >=20 > Then the memory in the crash continues to be zero until: >=20 > 0x5030d000 <__je_sz_size2index_tab+384>: 0x1a1b1b1b = 0x1b1b1b1b 0x1b1b1b1b 0x1b1b1b1b >=20 > Notice the interesting page boundary for where non-zero > is first available again! >=20 > Between __je_arenas_lock and __je_sz_size2index_tab are: >=20 > 0x5030cb30 __je_narenas_auto > 0x5030cb38 malloc_conf > 0x5030cb3c __je_ncpus > 0x5030cb40 __je_manual_arena_base > 0x5030cb80 __je_sz_pind2sz_tab > 0x5030ccc0 __je_sz_index2size_tab > 0x5030ce80 __je_sz_size2index_tab >=20 >=20 > Note: because __je_arenas is normally > mostly zero for these contexts, I can > not tell where the memory trashing > started, only where it replaced non-zero > values with zeros. >=20 I got a mountd assert failure in sz_size2index_lookup while attempting __calloc during makefd_xprt. (gdb) bt #0 thr_kill () at thr_kill.S:4 #1 0x50301170 in __raise (s=3D6) at /usr/src/lib/libc/gen/raise.c:52 #2 0x50260cc0 in abort () at /usr/src/lib/libc/stdlib/abort.c:67 #3 0x5025e260 in sz_size2index_lookup (size=3D) at = /usr/src/contrib/jemalloc/include/jemalloc/internal/sz.h:159 #4 sz_size2index (size=3D) at = /usr/src/contrib/jemalloc/include/jemalloc/internal/sz.h:166 #5 imalloc_body (sopts=3D0xffffd000, dopts=3D0xffffcfe0, = tsd=3D0x50094018) at jemalloc_jemalloc.c:2066 #6 0x50253874 in imalloc (sopts=3D0xffffd000, dopts=3D0xffffcfe0) at = /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:331 #7 0x50253fe8 in __calloc (num=3D1, size=3D96) at = jemalloc_jemalloc.c:2498 #8 0x50274690 in svc_xprt_alloc () at /usr/src/lib/libc/rpc/svc.c:541 #9 0x502725f4 in makefd_xprt (fd=3D10, sendsize=3D9000, recvsize=3D9000) = at /usr/src/lib/libc/rpc/svc_vc.c:250 #10 0x502734b4 in rendezvous_request (xprt=3D0x5007b120, msg=3D) at /usr/src/lib/libc/rpc/svc_vc.c:315 #11 0x50274a98 in svc_getreq_common (fd=3D) at = /usr/src/lib/libc/rpc/svc.c:640 #12 0x502748e0 in svc_getreqset (readfds=3D) at = /usr/src/lib/libc/rpc/svc.c:611 #13 0x1001434c in main (argc=3D, argv=3D0xffffde3c) at = /usr/src/usr.sbin/mountd/mountd.c:683 (gdb) up 3 #3 0x5025e260 in sz_size2index_lookup (size=3D) at = /usr/src/contrib/jemalloc/include/jemalloc/internal/sz.h:159 159 assert(ret =3D=3D sz_size2index_compute(size)); Again there is that area of memory that has been zeroed, with the same stopping point in __je_sz_size2index_tab: (gdb) print/x __je_narenas_auto $2 =3D 0x0 (gdb) print/x malloc_conf=20 $3 =3D 0x0 (gdb) print/x __je_ncpus $4 =3D 0x0 (gdb) print/x __je_manual_arena_base $5 =3D 0x0 (gdb) print/x __je_sz_pind2sz_tab $6 =3D {0x0 } (gdb) print/x __je_sz_size2index_tab $7 =3D {0x0 , 0x1a, 0x1b , 0x1c = } Showing where the zeroing stopped: . . . (gdb) x/156x __je_sz_size2index_tab 0x5035be80 <__je_sz_size2index_tab>: 0x00000000 0x00000000 = 0x00000000 0x00000000 0x5035be90 <__je_sz_size2index_tab+16>: 0x00000000 0x00000000 = 0x00000000 0x00000000 0x5035bea0 <__je_sz_size2index_tab+32>: 0x00000000 0x00000000 = 0x00000000 0x00000000 0x5035beb0 <__je_sz_size2index_tab+48>: 0x00000000 0x00000000 = 0x00000000 0x00000000 0x5035bec0 <__je_sz_size2index_tab+64>: 0x00000000 0x00000000 = 0x00000000 0x00000000 0x5035bed0 <__je_sz_size2index_tab+80>: 0x00000000 0x00000000 = 0x00000000 0x00000000 0x5035bee0 <__je_sz_size2index_tab+96>: 0x00000000 0x00000000 = 0x00000000 0x00000000 0x5035bef0 <__je_sz_size2index_tab+112>: 0x00000000 = 0x00000000 0x00000000 0x00000000 0x5035bf00 <__je_sz_size2index_tab+128>: 0x00000000 = 0x00000000 0x00000000 0x00000000 0x5035bf10 <__je_sz_size2index_tab+144>: 0x00000000 = 0x00000000 0x00000000 0x00000000 0x5035bf20 <__je_sz_size2index_tab+160>: 0x00000000 = 0x00000000 0x00000000 0x00000000 0x5035bf30 <__je_sz_size2index_tab+176>: 0x00000000 = 0x00000000 0x00000000 0x00000000 0x5035bf40 <__je_sz_size2index_tab+192>: 0x00000000 = 0x00000000 0x00000000 0x00000000 0x5035bf50 <__je_sz_size2index_tab+208>: 0x00000000 = 0x00000000 0x00000000 0x00000000 0x5035bf60 <__je_sz_size2index_tab+224>: 0x00000000 = 0x00000000 0x00000000 0x00000000 0x5035bf70 <__je_sz_size2index_tab+240>: 0x00000000 = 0x00000000 0x00000000 0x00000000 0x5035bf80 <__je_sz_size2index_tab+256>: 0x00000000 = 0x00000000 0x00000000 0x00000000 0x5035bf90 <__je_sz_size2index_tab+272>: 0x00000000 = 0x00000000 0x00000000 0x00000000 0x5035bfa0 <__je_sz_size2index_tab+288>: 0x00000000 = 0x00000000 0x00000000 0x00000000 0x5035bfb0 <__je_sz_size2index_tab+304>: 0x00000000 = 0x00000000 0x00000000 0x00000000 0x5035bfc0 <__je_sz_size2index_tab+320>: 0x00000000 = 0x00000000 0x00000000 0x00000000 0x5035bfd0 <__je_sz_size2index_tab+336>: 0x00000000 = 0x00000000 0x00000000 0x00000000 0x5035bfe0 <__je_sz_size2index_tab+352>: 0x00000000 = 0x00000000 0x00000000 0x00000000 0x5035bff0 <__je_sz_size2index_tab+368>: 0x00000000 = 0x00000000 0x00000000 0x00000000 0x5035c000 <__je_sz_size2index_tab+384>: 0x1a1b1b1b = 0x1b1b1b1b 0x1b1b1b1b 0x1b1b1b1b 0x5035c010 <__je_sz_size2index_tab+400>: 0x1b1b1b1b = 0x1b1b1b1b 0x1b1b1b1b 0x1b1b1b1b 0x5035c020 <__je_sz_size2index_tab+416>: 0x1b1b1b1b = 0x1b1b1b1b 0x1b1b1b1b 0x1b1b1b1b 0x5035c030 <__je_sz_size2index_tab+432>: 0x1b1b1b1b = 0x1b1b1b1b 0x1b1b1b1b 0x1b1b1b1b 0x5035c040 <__je_sz_size2index_tab+448>: 0x1b1c1c1c = 0x1c1c1c1c 0x1c1c1c1c 0x1c1c1c1c 0x5035c050 <__je_sz_size2index_tab+464>: 0x1c1c1c1c = 0x1c1c1c1c 0x1c1c1c1c 0x1c1c1c1c 0x5035c060 <__je_sz_size2index_tab+480>: 0x1c1c1c1c = 0x1c1c1c1c 0x1c1c1c1c 0x1c1c1c1c 0x5035c070 <__je_sz_size2index_tab+496>: 0x1c1c1c1c = 0x1c1c1c1c 0x1c1c1c1c 0x1c1c1c1c 0x5035c080 <__je_sz_size2index_tab+512>: 0x1c000000 = 0x00000000 0x50303474 0x00000000 0x5035c090: 0x00000000 0x00000000 0x00000000 = 0x00000000 0x5035c0a0: 0x00000000 0x00000000 0x00000000 = 0x00000000 0x5035c0b0: 0x00000000 0x00000000 0x00000000 = 0x00000000 0x5035c0c0: 0x00000000 0x00000000 0x00000000 = 0x00000000 0x5035c0d0: 0x00000000 0x00000000 0x00000000 = 0x00000000 0x5035c0e0: 0x00000000 0x00000000 0x00000000 = 0x00000000 Again: a nice page boundary: 0x5035c000 for where the zeros stop. Note that, despite the address ranges shifting around between programs, the same global variables are being stomped on, stopping at the same index into __je_sz_size2index_tab in each of the 3 programs. It always is page aligned there in my context. (The sshd example is different. I've yet to explore it much.) =3D=3D=3D Mark Millard marklmi at yahoo.com ( dsl-only.net went away in early 2018-Mar)