Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 7 May 2020 12:06:15 -0700
From:      Mark Millard <marklmi@yahoo.com>
To:        "vangyzen@freebsd.org" <vangyzen@FreeBSD.org>, svn-src-head@freebsd.org, FreeBSD Current <freebsd-current@freebsd.org>, FreeBSD Hackers <freebsd-hackers@freebsd.org>, FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Cc:        Brandon Bergren <bdragon@FreeBSD.org>, Justin Hibbits <chmeeedalf@gmail.com>
Subject:   Re: svn commit: r360233 - in head: contrib/jemalloc . . . : This partially breaks a 2-socket 32-bit powerpc (old PowerMac G4) based on head -r360311
Message-ID:  <F5953A6B-56CE-4D1C-8C18-58D44B639881@yahoo.com>
In-Reply-To: <18E62746-80DB-4195-977D-4FF32D0129EE@yahoo.com>
References:  <C24EE1A1-FAED-42C2-8204-CA7B1D20A369@yahoo.com> <8479DD58-44F6-446A-9CA5-D01F0F7C1B38@yahoo.com> <17ACDA02-D7EF-4F26-874A-BB3E935CD072@yahoo.com> <695E6836-F860-4557-B7DE-CC1EDB347F18@yahoo.com> <DCABCD83-27B0-4F2D-9410-69102294A98E@yahoo.com> <121B9B09-141B-4DC3-918B-1E7CFB99E779@yahoo.com> <8AAB0462-3FA8-490C-8D8D-7C15B1C9E2DE@yahoo.com> <18E62746-80DB-4195-977D-4FF32D0129EE@yahoo.com>

next in thread | previous in thread | raw e-mail | index | archive | help
[mountd failure example: also at sz_size2index_lookup assert
for the same zero'd memory problem.]

> On 2020-May-7, at 00:46, Mark Millard <marklmi@yahoo.com> wrote:
>=20
> [__je_sz_size2index_tab seems messed up in 2 of the
> asserting contexts: first 384 are zero in both. More
> before that is also messed up (all zero). I show the
> details later below.]
>=20
> On 2020-May-6, at 16:57, Mark Millard <marklmi at yahoo.com> wrote:
>=20
>> [This explores process crashes that happen during system
>> shutdown, in a context not having MALLOC_PRODUCTION=3D .
>> So assert failures are reported as the stopping points.]
>>=20
>> It looks like shutdown -p now, shutdown -r now, and the
>> like can lead some processes to assert during their exit
>> attempt, including a sshd failure (that I've not seen
>> before), rpcbind, and nfsd. I show information about the
>> observed asserts for those below.
>>=20
>>=20
>> sshd hit an assert, failing slab =3D=3D extent_slab_get(extent) :
>>=20
>> (gdb) bt=20
>> #0  thr_kill () at thr_kill.S:4
>> #1  0x50927170 in __raise (s=3D6) at /usr/src/lib/libc/gen/raise.c:52
>> #2  0x50886cc0 in abort () at /usr/src/lib/libc/stdlib/abort.c:67
>> #3  0x508834b0 in arena_dalloc (tsdn=3D<optimized out>, =
ptr=3D<optimized out>, tcache=3D<optimized out>, alloc_ctx=3D<optimized =
out>, slow_path=3D<optimized out>)
>>   at =
/usr/src/contrib/jemalloc/include/jemalloc/internal/arena_inlines_b.h:315
>> #4  idalloctm (tsdn=3D0x500dd040, ptr=3D0x5008a180, =
tcache=3D0x500dd160, alloc_ctx=3D<optimized out>, is_internal=3D<optimized=
 out>, slow_path=3D<optimized out>)
>>   at =
/usr/src/contrib/jemalloc/include/jemalloc/internal/jemalloc_internal_inli=
nes_c.h:118
>> #5  0x5087b0a4 in ifree (tsd=3D0x500dd040, ptr=3D0x5008a180, =
tcache=3D0x500dd160, slow_path=3D<optimized out>) at =
jemalloc_jemalloc.c:2590
>> #6  0x5087acac in __je_free_default (ptr=3D0x5008a180) at =
jemalloc_jemalloc.c:2784
>> #7  0x5087b294 in __free (ptr=3D0x5008a180) at =
jemalloc_jemalloc.c:2852
>> #8  0x10029464 in server_accept_loop (config_s=3D<optimized out>, =
sock_in=3D<optimized out>, sock_out=3D<optimized out>, =
newsock=3D<optimized out>) at /usr/src/crypto/openssh/sshd.c:1185
>> #9  main (ac=3D<optimized out>, av=3D0xffffde3c) at =
/usr/src/crypto/openssh/sshd.c:2009
>>=20
>> . . .
>> (gdb) up
>> #2  0x50886cc0 in abort () at /usr/src/lib/libc/stdlib/abort.c:67
>> 67		(void)raise(SIGABRT);
>> (gdb) up
>> #3  0x508834b0 in arena_dalloc (tsdn=3D<optimized out>, =
ptr=3D<optimized out>, tcache=3D<optimized out>, alloc_ctx=3D<optimized =
out>, slow_path=3D<optimized out>)
>>   at =
/usr/src/contrib/jemalloc/include/jemalloc/internal/arena_inlines_b.h:315
>> 315			assert(slab =3D=3D extent_slab_get(extent));
>>=20
>> (gdb) list
>> 310			rtree_ctx =3D tsd_rtree_ctx(tsdn_tsd(tsdn));
>> 311			extent_t *extent =3D rtree_extent_read(tsdn, =
&extents_rtree,
>> 312			    rtree_ctx, (uintptr_t)ptr, true);
>> 313			assert(szind =3D=3D extent_szind_get(extent));
>> 314			assert(szind < SC_NSIZES);
>> 315			assert(slab =3D=3D extent_slab_get(extent));
>> 316		}
>> 317=09
>> 318		if (likely(slab)) {
>> 319			/* Small allocation. */
>>=20
>> More fully:
>>=20
>> 285	JEMALLOC_ALWAYS_INLINE void
>> 286	arena_dalloc(tsdn_t *tsdn, void *ptr, tcache_t *tcache,
>> 287	    alloc_ctx_t *alloc_ctx, bool slow_path) {
>> 288		assert(!tsdn_null(tsdn) || tcache =3D=3D NULL);
>> 289		assert(ptr !=3D NULL);
>> 290=09
>> 291		if (unlikely(tcache =3D=3D NULL)) {
>> 292			arena_dalloc_no_tcache(tsdn, ptr);
>> 293			return;
>> 294		}
>> 295=09
>> 296		szind_t szind;
>> 297		bool slab;
>> 298		rtree_ctx_t *rtree_ctx;
>> 299		if (alloc_ctx !=3D NULL) {
>> 300			szind =3D alloc_ctx->szind;
>> 301			slab =3D alloc_ctx->slab;
>> 302			assert(szind !=3D SC_NSIZES);
>> 303		} else {
>> 304			rtree_ctx =3D tsd_rtree_ctx(tsdn_tsd(tsdn));
>> 305			rtree_szind_slab_read(tsdn, &extents_rtree, =
rtree_ctx,
>> 306			    (uintptr_t)ptr, true, &szind, &slab);
>> 307		}
>> 308=09
>> 309		if (config_debug) {
>> 310			rtree_ctx =3D tsd_rtree_ctx(tsdn_tsd(tsdn));
>> 311			extent_t *extent =3D rtree_extent_read(tsdn, =
&extents_rtree,
>> 312			    rtree_ctx, (uintptr_t)ptr, true);
>> 313			assert(szind =3D=3D extent_szind_get(extent));
>> 314			assert(szind < SC_NSIZES);
>> 315			assert(slab =3D=3D extent_slab_get(extent));
>> 316		}
>> 317=09
>> 318		if (likely(slab)) {
>> 319			/* Small allocation. */
>> 320			tcache_dalloc_small(tsdn_tsd(tsdn), tcache, ptr, =
szind,
>> 321			    slow_path);
>> 322		} else {
>> 323			arena_dalloc_large(tsdn, ptr, tcache, szind, =
slow_path);
>> 324		}
>> 325	}
>=20
> The following are only shown for contrast with the later
> cases of lots of zeros showing up where below shows
> non-zero values (taken from sshd.core, which failed
> differently):
>=20
> (gdb) x/4x __je_arenas+16368/4
> 0x50981ab0 <__je_arenas+16368>:	0x00000000	0x00000000	=
0x00000000	0x00000009
> (gdb) print/x __je_arenas_lock=20
> $1 =3D {{{prof_data =3D {tot_wait_time =3D {ns =3D 0x0}, max_wait_time =
=3D {ns =3D 0x0}, n_wait_times =3D 0x0, n_spin_acquired =3D 0x0, =
max_n_thds =3D 0x0, n_waiting_thds =3D {repr =3D 0x0}, n_owner_switches =
=3D 0x0,=20
>        prev_owner =3D 0x0, n_lock_ops =3D 0x0}, lock =3D 0x5008bf00, =
postponed_next =3D 0x50974070, locked =3D {repr =3D 0x0}}}, witness =3D =
{name =3D 0x50760b04, rank =3D 0x3, comp =3D 0x0, opaque =3D 0x0, link =3D=
 {
>      qre_next =3D 0x50981b10, qre_prev =3D 0x50981b10}}, lock_order =3D =
0x0}
> (gdb) print/x __je_narenas_auto
> $2 =3D 0x8
> (gdb) print/x malloc_conf
> $3 =3D 0x0
> (gdb) print/x __je_ncpus
> $4 =3D 0x2
> (gdb) print/x __je_manual_arena_base
> $5 =3D 0x9
> (gdb) print/x __je_sz_pind2sz_tab=20
> $6 =3D {0x1000, 0x2000, 0x3000, 0x4000, 0x5000, 0x6000, 0x7000, =
0x8000, 0xa000, 0xc000, 0xe000, 0x10000, 0x14000, 0x18000, 0x1c000, =
0x20000, 0x28000, 0x30000, 0x38000, 0x40000, 0x50000, 0x60000,=20
>  0x70000, 0x80000, 0xa0000, 0xc0000, 0xe0000, 0x100000, 0x140000, =
0x180000, 0x1c0000, 0x200000, 0x280000, 0x300000, 0x380000, 0x400000, =
0x500000, 0x600000, 0x700000, 0x800000, 0xa00000, 0xc00000,=20
>  0xe00000, 0x1000000, 0x1400000, 0x1800000, 0x1c00000, 0x2000000, =
0x2800000, 0x3000000, 0x3800000, 0x4000000, 0x5000000, 0x6000000, =
0x7000000, 0x8000000, 0xa000000, 0xc000000, 0xe000000,=20
>  0x10000000, 0x14000000, 0x18000000, 0x1c000000, 0x20000000, =
0x28000000, 0x30000000, 0x38000000, 0x40000000, 0x50000000, 0x60000000, =
0x70000000, 0x70001000}
> (gdb) print/x __je_sz_index2size_tab
> $7 =3D {0x8, 0x10, 0x20, 0x30, 0x40, 0x50, 0x60, 0x70, 0x80, 0xa0, =
0xc0, 0xe0, 0x100, 0x140, 0x180, 0x1c0, 0x200, 0x280, 0x300, 0x380, =
0x400, 0x500, 0x600, 0x700, 0x800, 0xa00, 0xc00, 0xe00, 0x1000,=20
>  0x1400, 0x1800, 0x1c00, 0x2000, 0x2800, 0x3000, 0x3800, 0x4000, =
0x5000, 0x6000, 0x7000, 0x8000, 0xa000, 0xc000, 0xe000, 0x10000, =
0x14000, 0x18000, 0x1c000, 0x20000, 0x28000, 0x30000, 0x38000,=20
>  0x40000, 0x50000, 0x60000, 0x70000, 0x80000, 0xa0000, 0xc0000, =
0xe0000, 0x100000, 0x140000, 0x180000, 0x1c0000, 0x200000, 0x280000, =
0x300000, 0x380000, 0x400000, 0x500000, 0x600000, 0x700000,=20
>  0x800000, 0xa00000, 0xc00000, 0xe00000, 0x1000000, 0x1400000, =
0x1800000, 0x1c00000, 0x2000000, 0x2800000, 0x3000000, 0x3800000, =
0x4000000, 0x5000000, 0x6000000, 0x7000000, 0x8000000, 0xa000000,=20
>  0xc000000, 0xe000000, 0x10000000, 0x14000000, 0x18000000, 0x1c000000, =
0x20000000, 0x28000000, 0x30000000, 0x38000000, 0x40000000, 0x50000000, =
0x60000000, 0x70000000}
> (gdb) print/x __je_sz_size2index_tab
> $2 =3D {0x0, 0x0, 0x1, 0x2, 0x2, 0x3, 0x3, 0x4, 0x4, 0x5, 0x5, 0x6, =
0x6, 0x7, 0x7, 0x8, 0x8, 0x9, 0x9, 0x9, 0x9, 0xa, 0xa, 0xa, 0xa, 0xb, =
0xb, 0xb, 0xb, 0xc, 0xc, 0xc, 0xc, 0xd, 0xd, 0xd, 0xd, 0xd,=20
>  0xd, 0xd, 0xd, 0xe, 0xe, 0xe, 0xe, 0xe, 0xe, 0xe, 0xe, 0xf, 0xf, 0xf, =
0xf, 0xf, 0xf, 0xf, 0xf, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10, =
0x11 <repeats 16 times>, 0x12 <repeats 16 times>,=20
>  0x13 <repeats 16 times>, 0x14 <repeats 16 times>, 0x15 <repeats 32 =
times>, 0x16 <repeats 32 times>, 0x17 <repeats 32 times>, 0x18 <repeats =
32 times>, 0x19 <repeats 64 times>,=20
>  0x1a <repeats 64 times>, 0x1b <repeats 64 times>, 0x1c <repeats 64 =
times>}
>=20
>=20
>> rpcbind hit an assert, failing ret =3D=3D sz_size2index_compute(size) =
:
>>=20
>> (gdb) bt
>> #0  thr_kill () at thr_kill.S:4
>> #1  0x502f2170 in __raise (s=3D6) at /usr/src/lib/libc/gen/raise.c:52
>> #2  0x50251d04 in abort () at /usr/src/lib/libc/stdlib/abort.c:79
>> #3  0x5024f260 in sz_size2index_lookup (size=3D<optimized out>) at =
/usr/src/contrib/jemalloc/include/jemalloc/internal/sz.h:159
>> #4  sz_size2index (size=3D<optimized out>) at =
/usr/src/contrib/jemalloc/include/jemalloc/internal/sz.h:166
>> #5  imalloc_body (sopts=3D0xffffb360, dopts=3D0xffffb340, =
tsd=3D0x5009a018) at jemalloc_jemalloc.c:2066
>> #6  0x50244874 in imalloc (sopts=3D0xffffb360, dopts=3D0xffffb340) at =
/usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:331
>> #7  0x50244fe8 in __calloc (num=3D1, size=3D96) at =
jemalloc_jemalloc.c:2498
>> #8  0x50265690 in svc_xprt_alloc () at =
/usr/src/lib/libc/rpc/svc.c:541
>> #9  0x502635f4 in makefd_xprt (fd=3D14, sendsize=3D9000, =
recvsize=3D9000) at /usr/src/lib/libc/rpc/svc_vc.c:250
>> #10 0x502644b4 in rendezvous_request (xprt=3D0x5004c000, =
msg=3D<optimized out>) at /usr/src/lib/libc/rpc/svc_vc.c:315
>> #11 0x50265a98 in svc_getreq_common (fd=3D<optimized out>) at =
/usr/src/lib/libc/rpc/svc.c:640
>> #12 0x50265d1c in svc_getreq_poll (pfdp=3D<optimized out>, =
pollretval=3D1) at /usr/src/lib/libc/rpc/svc.c:739
>> #13 0x10018568 in my_svc_run () at =
/usr/src/usr.sbin/rpcbind/rpcb_svc_com.c:1167
>> #14 0x10014ad8 in main (argc=3D<optimized out>, argv=3D<optimized =
out>) at /usr/src/usr.sbin/rpcbind/rpcbind.c:250
>> (gdb) up 3
>> #3  0x5024f260 in sz_size2index_lookup (size=3D<optimized out>) at =
/usr/src/contrib/jemalloc/include/jemalloc/internal/sz.h:159
>> 159		assert(ret =3D=3D sz_size2index_compute(size));
>> (gdb) print ret
>> $1 =3D 0
>>=20
>> 154	JEMALLOC_ALWAYS_INLINE szind_t
>> 155	sz_size2index_lookup(size_t size) {
>> 156		assert(size <=3D SC_LOOKUP_MAXCLASS);
>> 157		szind_t ret =3D (sz_size2index_tab[(size + (ZU(1) << =
SC_LG_TINY_MIN) - 1)
>> 158						 >> SC_LG_TINY_MIN]);
>> 159		assert(ret =3D=3D sz_size2index_compute(size));
>> 160		return ret;
>> 161	}
>=20
> gdb reports for sz_size2index_tab (really =
JEMALLOC_N(sz_size2index_tab),
> i.e., __je_sz_size2index_tab):
>=20
> (gdb) print/x __je_sz_size2index_tab
> $1 =3D {0x0 <repeats 384 times>, 0x1a, 0x1b <repeats 64 times>, 0x1c =
<repeats 64 times>}
>=20
> Also:
>=20
> (gdb) x/4x __je_arenas+16368/4
> 0x5034cab0 <__je_arenas+16368>:	0x00000000	0x00000000	=
0x00000000	0x00000000
> (gdb) print/x __je_arenas_lock                                         =
                                                                       =20=

> $8 =3D {{{prof_data =3D {tot_wait_time =3D {ns =3D 0x0}, max_wait_time =
=3D {ns =3D 0x0}, n_wait_times =3D 0x0, n_spin_acquired =3D 0x0, =
max_n_thds =3D 0x0, n_waiting_thds =3D {repr =3D 0x0}, n_owner_switches =
=3D 0x0,=20
>        prev_owner =3D 0x0, n_lock_ops =3D 0x0}, lock =3D 0x0, =
postponed_next =3D 0x0, locked =3D {repr =3D 0x0}}}, witness =3D {name =3D=
 0x0, rank =3D 0x0, comp =3D 0x0, opaque =3D 0x0, link =3D {qre_next =3D =
0x0,=20
>      qre_prev =3D 0x0}}, lock_order =3D 0x0}
> (gdb) print/x __je_narenas_auto
> $9 =3D 0x0
> (gdb) print/x malloc_conf     =20
> $10 =3D 0x0
> (gdb) print/x __je_ncpus=20
> $11 =3D 0x0
> (gdb) print/x __je_manual_arena_base
> $12 =3D 0x0
> (gdb) print/x __je_sz_pind2sz_tab  =20
> $13 =3D {0x0 <repeats 72 times>}
> (gdb) print/x __je_sz_index2size_tab
> $14 =3D {0x0 <repeats 104 times>}
>=20
>=20
>> nfsd hit an assert, failing ret =3D=3D sz_size2index_compute(size)
>=20
> [Correction: That should have referenced sz_index2size_lookup(index).]
>=20
>> (also, but a different caller of sz_size2index):
>=20
> [Correction: The "also" comment should be ignored:
> sz_index2size_lookup(index) is referenced below.]
>=20
>>=20
>> (gdb) bt
>> #0  thr_kill () at thr_kill.S:4
>> #1  0x502b2170 in __raise (s=3D6) at /usr/src/lib/libc/gen/raise.c:52
>> #2  0x50211cc0 in abort () at /usr/src/lib/libc/stdlib/abort.c:67
>> #3  0x50206104 in sz_index2size_lookup (index=3D<optimized out>) at =
/usr/src/contrib/jemalloc/include/jemalloc/internal/sz.h:200
>> #4  sz_index2size (index=3D<optimized out>) at =
/usr/src/contrib/jemalloc/include/jemalloc/internal/sz.h:207
>> #5  ifree (tsd=3D0x50094018, ptr=3D0x50041028, tcache=3D0x50094138, =
slow_path=3D<optimized out>) at jemalloc_jemalloc.c:2583
>> #6  0x50205cac in __je_free_default (ptr=3D0x50041028) at =
jemalloc_jemalloc.c:2784
>> #7  0x50206294 in __free (ptr=3D0x50041028) at =
jemalloc_jemalloc.c:2852
>> #8  0x50287ec8 in ns_src_free (src=3D0x50329004, =
srclistsize=3D<optimized out>) at /usr/src/lib/libc/net/nsdispatch.c:452
>> #9  ns_dbt_free (dbt=3D0x50329000) at =
/usr/src/lib/libc/net/nsdispatch.c:436
>> #10 vector_free (vec=3D0x50329000, count=3D<optimized out>, esize=3D12,=
 free_elem=3D<optimized out>) at /usr/src/lib/libc/net/nsdispatch.c:253
>> #11 nss_atexit () at /usr/src/lib/libc/net/nsdispatch.c:578
>> #12 0x5028d958 in __cxa_finalize (dso=3D0x0) at =
/usr/src/lib/libc/stdlib/atexit.c:240
>> #13 0x502117f8 in exit (status=3D0) at =
/usr/src/lib/libc/stdlib/exit.c:74
>> #14 0x10013f9c in child_cleanup (signo=3D<optimized out>) at =
/usr/src/usr.sbin/nfsd/nfsd.c:969
>> #15 <signal handler called>
>> #16 0x00000000 in ?? ()
>>=20
>> (gdb) up 3
>> #3  0x50206104 in sz_index2size_lookup (index=3D<optimized out>) at =
/usr/src/contrib/jemalloc/include/jemalloc/internal/sz.h:200
>> 200		assert(ret =3D=3D sz_index2size_compute(index));
>>=20
>> (ret is optimized out.)
>>=20
>> 197	JEMALLOC_ALWAYS_INLINE size_t
>> 198	sz_index2size_lookup(szind_t index) {
>> 199		size_t ret =3D (size_t)sz_index2size_tab[index];
>> 200		assert(ret =3D=3D sz_index2size_compute(index));
>> 201		return ret;
>> 202	}
>=20
> (gdb) print/x __je_sz_index2size_tab
> $3 =3D {0x0 <repeats 104 times>}
>=20
> Also:
>=20
> (gdb) x/4x __je_arenas+16368/4
> 0x5030cab0 <__je_arenas+16368>:	0x00000000	0x00000000	=
0x00000000	0x00000000
> (gdb) print/x __je_arenas_lock                                         =
                                                                       =20=

> $8 =3D {{{prof_data =3D {tot_wait_time =3D {ns =3D 0x0}, max_wait_time =
=3D {ns =3D 0x0}, n_wait_times =3D 0x0, n_spin_acquired =3D 0x0, =
max_n_thds =3D 0x0, n_waiting_thds =3D {repr =3D 0x0}, n_owner_switches =
=3D 0x0,=20
>        prev_owner =3D 0x0, n_lock_ops =3D 0x0}, lock =3D 0x0, =
postponed_next =3D 0x0, locked =3D {repr =3D 0x0}}}, witness =3D {name =3D=
 0x0, rank =3D 0x0, comp =3D 0x0, opaque =3D 0x0, link =3D {qre_next =3D =
0x0,=20
>      qre_prev =3D 0x0}}, lock_order =3D 0x0}
> (gdb) print/x __je_narenas_auto
> $9 =3D 0x0
> (gdb) print/x malloc_conf     =20
> $10 =3D 0x0
> (gdb) print/x __je_ncpus=20
> $11 =3D 0x0
> (gdb) print/x __je_manual_arena_base
> $12 =3D 0x0
> (gdb) print/x __je_sz_pind2sz_tab  =20
> $13 =3D {0x0 <repeats 72 times>}
> (gdb) print/x __je_sz_size2index_tab
> $1 =3D {0x0 <repeats 384 times>, 0x1a, 0x1b <repeats 64 times>, 0x1c =
<repeats 64 times>}
>=20
>> Booting and immediately trying something like:
>>=20
>> service nfsd stop
>>=20
>> did not lead to a failure. But may be after
>> a while it would and be less drastic than a
>> reboot or power down.
>=20
> More detail:
>=20
> So, for rpcbind and nfds at some point a large part of
> __je_sz_size2index_tab is being stomped on, as is all of
> __je_sz_index2size_tab and more.
>=20
> For rpcbind, the following area is zero but in a
> live process is not all-zero (I show the partially
> non-zero live-process context instead of the all-zero
> .core file content):
>=20
> 0x5034cab0 <__je_arenas+16368>:	0x00000000	0x00000000	=
0x00000000	0x00000009
> 0x5034cac0 <__je_arenas_lock>:	0x00000000	0x00000000	=
0x00000000	0x00000000
> 0x5034cad0 <__je_arenas_lock+16>:	0x00000000	0x00000000	=
0x00000000	0x00000000
> 0x5034cae0 <__je_arenas_lock+32>:	0x00000000	0x00000000	=
0x00000000	0x00000000
> 0x5034caf0 <__je_arenas_lock+48>:	0x00000000	0x00000000	=
0x00000000	0x00000000
> 0x5034cb00 <__je_arenas_lock+64>:	0x00000000	0x5033f070	=
0x00000000	0x00000000
> 0x5034cb10 <__je_arenas_lock+80>:	0x5012bb04	0x00000003	=
0x00000000	0x00000000
> 0x5034cb20 <__je_arenas_lock+96>:	0x5034cb10	0x5034cb10	=
0x00000000	0x00000000
>=20
> Then the memory in the crash continues to be zero until:
>=20
> 0x5034d000 <__je_sz_size2index_tab+384>:	0x1a1b1b1b	=
0x1b1b1b1b	0x1b1b1b1b	0x1b1b1b1b
>=20
> Notice the interesting page boundary for where non-zero
> is first available again!
>=20
> Between __je_arenas_lock and __je_sz_size2index_tab are:
>=20
> 0x5034cb30 __je_narenas_auto
> 0x5034cb38 malloc_conf
> 0x5034cb3c __je_ncpus
> 0x5034cb40 __je_manual_arena_base
> 0x5034cb80 __je_sz_pind2sz_tab
> 0x5034ccc0 __je_sz_index2size_tab
> 0x5034ce80 __je_sz_size2index_tab
>=20
> For nfsd, it is similar (again showing the partially
> non-zero live process context instead of the all-zeros
> from the .core file):
>=20
> 0x5030cab0 <__je_arenas+16368>:	0x00000000	0x00000000	=
0x00000000	0x00000009
> 0x5030cac0 <__je_arenas_lock>:	0x00000000	0x00000000	=
0x00000000	0x00000000
> 0x5030cad0 <__je_arenas_lock+16>:	0x00000000	0x00000000	=
0x00000000	0x00000000
> 0x5030cae0 <__je_arenas_lock+32>:	0x00000000	0x00000000	=
0x00000000	0x00000000
> 0x5030caf0 <__je_arenas_lock+48>:	0x00000000	0x00000000	=
0x00000000	0x00000000
> 0x5030cb00 <__je_arenas_lock+64>:	0x00000000	0x502ff070	=
0x00000000	0x00000000
> 0x5030cb10 <__je_arenas_lock+80>:	0x500ebb04	0x00000003	=
0x00000000	0x00000000
> 0x5030cb20 <__je_arenas_lock+96>:	0x5030cb10	0x5030cb10	=
0x00000000	0x00000000
>=20
> Then the memory in the crash continues to be zero until:
>=20
> 0x5030d000 <__je_sz_size2index_tab+384>:	0x1a1b1b1b	=
0x1b1b1b1b	0x1b1b1b1b	0x1b1b1b1b
>=20
> Notice the interesting page boundary for where non-zero
> is first available again!
>=20
> Between __je_arenas_lock and __je_sz_size2index_tab are:
>=20
> 0x5030cb30 __je_narenas_auto
> 0x5030cb38 malloc_conf
> 0x5030cb3c __je_ncpus
> 0x5030cb40 __je_manual_arena_base
> 0x5030cb80 __je_sz_pind2sz_tab
> 0x5030ccc0 __je_sz_index2size_tab
> 0x5030ce80 __je_sz_size2index_tab
>=20
>=20
> Note: because __je_arenas is normally
> mostly zero for these contexts, I can
> not tell where the memory trashing
> started, only where it replaced non-zero
> values with zeros.
>=20

I got a mountd assert failure in sz_size2index_lookup
while attempting __calloc during makefd_xprt.

(gdb) bt
#0  thr_kill () at thr_kill.S:4
#1  0x50301170 in __raise (s=3D6) at /usr/src/lib/libc/gen/raise.c:52
#2  0x50260cc0 in abort () at /usr/src/lib/libc/stdlib/abort.c:67
#3  0x5025e260 in sz_size2index_lookup (size=3D<optimized out>) at =
/usr/src/contrib/jemalloc/include/jemalloc/internal/sz.h:159
#4  sz_size2index (size=3D<optimized out>) at =
/usr/src/contrib/jemalloc/include/jemalloc/internal/sz.h:166
#5  imalloc_body (sopts=3D0xffffd000, dopts=3D0xffffcfe0, =
tsd=3D0x50094018) at jemalloc_jemalloc.c:2066
#6  0x50253874 in imalloc (sopts=3D0xffffd000, dopts=3D0xffffcfe0) at =
/usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:331
#7  0x50253fe8 in __calloc (num=3D1, size=3D96) at =
jemalloc_jemalloc.c:2498
#8  0x50274690 in svc_xprt_alloc () at /usr/src/lib/libc/rpc/svc.c:541
#9  0x502725f4 in makefd_xprt (fd=3D10, sendsize=3D9000, recvsize=3D9000) =
at /usr/src/lib/libc/rpc/svc_vc.c:250
#10 0x502734b4 in rendezvous_request (xprt=3D0x5007b120, msg=3D<optimized =
out>) at /usr/src/lib/libc/rpc/svc_vc.c:315
#11 0x50274a98 in svc_getreq_common (fd=3D<optimized out>) at =
/usr/src/lib/libc/rpc/svc.c:640
#12 0x502748e0 in svc_getreqset (readfds=3D<optimized out>) at =
/usr/src/lib/libc/rpc/svc.c:611
#13 0x1001434c in main (argc=3D<optimized out>, argv=3D0xffffde3c) at =
/usr/src/usr.sbin/mountd/mountd.c:683

(gdb) up 3
#3  0x5025e260 in sz_size2index_lookup (size=3D<optimized out>) at =
/usr/src/contrib/jemalloc/include/jemalloc/internal/sz.h:159
159		assert(ret =3D=3D sz_size2index_compute(size));

Again there is that area of memory that has been zeroed, with
the same stopping point in __je_sz_size2index_tab:

(gdb) print/x __je_narenas_auto
$2 =3D 0x0
(gdb) print/x malloc_conf=20
$3 =3D 0x0
(gdb) print/x __je_ncpus
$4 =3D 0x0
(gdb) print/x __je_manual_arena_base
$5 =3D 0x0
(gdb) print/x __je_sz_pind2sz_tab
$6 =3D {0x0 <repeats 72 times>}
(gdb) print/x __je_sz_size2index_tab
$7 =3D {0x0 <repeats 384 times>, 0x1a, 0x1b <repeats 64 times>, 0x1c =
<repeats 64 times>}

Showing where the zeroing stopped:

. . .
(gdb) x/156x __je_sz_size2index_tab
0x5035be80 <__je_sz_size2index_tab>:	0x00000000	0x00000000	=
0x00000000	0x00000000
0x5035be90 <__je_sz_size2index_tab+16>:	0x00000000	0x00000000	=
0x00000000	0x00000000
0x5035bea0 <__je_sz_size2index_tab+32>:	0x00000000	0x00000000	=
0x00000000	0x00000000
0x5035beb0 <__je_sz_size2index_tab+48>:	0x00000000	0x00000000	=
0x00000000	0x00000000
0x5035bec0 <__je_sz_size2index_tab+64>:	0x00000000	0x00000000	=
0x00000000	0x00000000
0x5035bed0 <__je_sz_size2index_tab+80>:	0x00000000	0x00000000	=
0x00000000	0x00000000
0x5035bee0 <__je_sz_size2index_tab+96>:	0x00000000	0x00000000	=
0x00000000	0x00000000
0x5035bef0 <__je_sz_size2index_tab+112>:	0x00000000	=
0x00000000	0x00000000	0x00000000
0x5035bf00 <__je_sz_size2index_tab+128>:	0x00000000	=
0x00000000	0x00000000	0x00000000
0x5035bf10 <__je_sz_size2index_tab+144>:	0x00000000	=
0x00000000	0x00000000	0x00000000
0x5035bf20 <__je_sz_size2index_tab+160>:	0x00000000	=
0x00000000	0x00000000	0x00000000
0x5035bf30 <__je_sz_size2index_tab+176>:	0x00000000	=
0x00000000	0x00000000	0x00000000
0x5035bf40 <__je_sz_size2index_tab+192>:	0x00000000	=
0x00000000	0x00000000	0x00000000
0x5035bf50 <__je_sz_size2index_tab+208>:	0x00000000	=
0x00000000	0x00000000	0x00000000
0x5035bf60 <__je_sz_size2index_tab+224>:	0x00000000	=
0x00000000	0x00000000	0x00000000
0x5035bf70 <__je_sz_size2index_tab+240>:	0x00000000	=
0x00000000	0x00000000	0x00000000
0x5035bf80 <__je_sz_size2index_tab+256>:	0x00000000	=
0x00000000	0x00000000	0x00000000
0x5035bf90 <__je_sz_size2index_tab+272>:	0x00000000	=
0x00000000	0x00000000	0x00000000
0x5035bfa0 <__je_sz_size2index_tab+288>:	0x00000000	=
0x00000000	0x00000000	0x00000000
0x5035bfb0 <__je_sz_size2index_tab+304>:	0x00000000	=
0x00000000	0x00000000	0x00000000
0x5035bfc0 <__je_sz_size2index_tab+320>:	0x00000000	=
0x00000000	0x00000000	0x00000000
0x5035bfd0 <__je_sz_size2index_tab+336>:	0x00000000	=
0x00000000	0x00000000	0x00000000
0x5035bfe0 <__je_sz_size2index_tab+352>:	0x00000000	=
0x00000000	0x00000000	0x00000000
0x5035bff0 <__je_sz_size2index_tab+368>:	0x00000000	=
0x00000000	0x00000000	0x00000000
0x5035c000 <__je_sz_size2index_tab+384>:	0x1a1b1b1b	=
0x1b1b1b1b	0x1b1b1b1b	0x1b1b1b1b
0x5035c010 <__je_sz_size2index_tab+400>:	0x1b1b1b1b	=
0x1b1b1b1b	0x1b1b1b1b	0x1b1b1b1b
0x5035c020 <__je_sz_size2index_tab+416>:	0x1b1b1b1b	=
0x1b1b1b1b	0x1b1b1b1b	0x1b1b1b1b
0x5035c030 <__je_sz_size2index_tab+432>:	0x1b1b1b1b	=
0x1b1b1b1b	0x1b1b1b1b	0x1b1b1b1b
0x5035c040 <__je_sz_size2index_tab+448>:	0x1b1c1c1c	=
0x1c1c1c1c	0x1c1c1c1c	0x1c1c1c1c
0x5035c050 <__je_sz_size2index_tab+464>:	0x1c1c1c1c	=
0x1c1c1c1c	0x1c1c1c1c	0x1c1c1c1c
0x5035c060 <__je_sz_size2index_tab+480>:	0x1c1c1c1c	=
0x1c1c1c1c	0x1c1c1c1c	0x1c1c1c1c
0x5035c070 <__je_sz_size2index_tab+496>:	0x1c1c1c1c	=
0x1c1c1c1c	0x1c1c1c1c	0x1c1c1c1c
0x5035c080 <__je_sz_size2index_tab+512>:	0x1c000000	=
0x00000000	0x50303474	0x00000000
0x5035c090:	0x00000000	0x00000000	0x00000000	=
0x00000000
0x5035c0a0:	0x00000000	0x00000000	0x00000000	=
0x00000000
0x5035c0b0:	0x00000000	0x00000000	0x00000000	=
0x00000000
0x5035c0c0:	0x00000000	0x00000000	0x00000000	=
0x00000000
0x5035c0d0:	0x00000000	0x00000000	0x00000000	=
0x00000000
0x5035c0e0:	0x00000000	0x00000000	0x00000000	=
0x00000000

Again: a nice page boundary: 0x5035c000 for where the
zeros stop.

Note that, despite the address ranges shifting around
between programs, the same global variables are being
stomped on, stopping at the same index into
__je_sz_size2index_tab in each of the 3 programs. It
always is page aligned there in my context.

(The sshd example is different. I've yet to explore it
much.)



=3D=3D=3D
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?F5953A6B-56CE-4D1C-8C18-58D44B639881>