Date: Fri, 8 May 2020 20:58:02 -0700
From: Mark Millard <marklmi@yahoo.com>
To: "vangyzen@freebsd.org" <vangyzen@FreeBSD.org>, svn-src-head@freebsd.org, FreeBSD Current <freebsd-current@freebsd.org>, FreeBSD Hackers <freebsd-hackers@freebsd.org>, FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Cc: Brandon Bergren <bdragon@FreeBSD.org>, Justin Hibbits <chmeeedalf@gmail.com>
Subject: Re: svn commit: r360233 - in head: contrib/jemalloc . . . : This partially breaks a 2-socket 32-bit powerpc (old PowerMac G4) based on head -r360311
Message-ID: <D0C483E5-3F6A-4816-A6BA-3D2C82C24F8E@yahoo.com>
In-Reply-To: <F5953A6B-56CE-4D1C-8C18-58D44B639881@yahoo.com>
References: <C24EE1A1-FAED-42C2-8204-CA7B1D20A369@yahoo.com>
 <8479DD58-44F6-446A-9CA5-D01F0F7C1B38@yahoo.com>
 <17ACDA02-D7EF-4F26-874A-BB3E935CD072@yahoo.com>
 <695E6836-F860-4557-B7DE-CC1EDB347F18@yahoo.com>
 <DCABCD83-27B0-4F2D-9410-69102294A98E@yahoo.com>
 <121B9B09-141B-4DC3-918B-1E7CFB99E779@yahoo.com>
 <8AAB0462-3FA8-490C-8D8D-7C15B1C9E2DE@yahoo.com>
 <18E62746-80DB-4195-977D-4FF32D0129EE@yahoo.com>
 <F5953A6B-56CE-4D1C-8C18-58D44B639881@yahoo.com>

[I caused things in nfsd to be shifted somewhat in memory to
see whether the point where the zeros stop tracks the content
or the page boundary. Non-nfsd examples omitted.]
> . . .
>> nfsd hit an assert, failing ret == sz_size2index_compute(size)
>
> [Correction: That should have referenced sz_index2size_lookup(index).]
>
>> (also, but a different caller of sz_size2index):
>
> [Correction: The "also" comment should be ignored:
> sz_index2size_lookup(index) is referenced below.]
>
>>
>> (gdb) bt
>> #0 thr_kill () at thr_kill.S:4
>> #1 0x502b2170 in __raise (s=6) at /usr/src/lib/libc/gen/raise.c:52
>> #2 0x50211cc0 in abort () at /usr/src/lib/libc/stdlib/abort.c:67
>> #3 0x50206104 in sz_index2size_lookup (index=<optimized out>) at /usr/src/contrib/jemalloc/include/jemalloc/internal/sz.h:200
>> #4 sz_index2size (index=<optimized out>) at /usr/src/contrib/jemalloc/include/jemalloc/internal/sz.h:207
>> #5 ifree (tsd=0x50094018, ptr=0x50041028, tcache=0x50094138, slow_path=<optimized out>) at jemalloc_jemalloc.c:2583
>> #6 0x50205cac in __je_free_default (ptr=0x50041028) at jemalloc_jemalloc.c:2784
>> #7 0x50206294 in __free (ptr=0x50041028) at jemalloc_jemalloc.c:2852
>> #8 0x50287ec8 in ns_src_free (src=0x50329004, srclistsize=<optimized out>) at /usr/src/lib/libc/net/nsdispatch.c:452
>> #9 ns_dbt_free (dbt=0x50329000) at /usr/src/lib/libc/net/nsdispatch.c:436
>> #10 vector_free (vec=0x50329000, count=<optimized out>, esize=12, free_elem=<optimized out>) at /usr/src/lib/libc/net/nsdispatch.c:253
>> #11 nss_atexit () at /usr/src/lib/libc/net/nsdispatch.c:578
>> #12 0x5028d958 in __cxa_finalize (dso=0x0) at /usr/src/lib/libc/stdlib/atexit.c:240
>> #13 0x502117f8 in exit (status=0) at /usr/src/lib/libc/stdlib/exit.c:74
>> #14 0x10013f9c in child_cleanup (signo=<optimized out>) at /usr/src/usr.sbin/nfsd/nfsd.c:969
>> #15 <signal handler called>
>> #16 0x00000000 in ?? ()
>>
>> (gdb) up 3
>> #3 0x50206104 in sz_index2size_lookup (index=<optimized out>) at /usr/src/contrib/jemalloc/include/jemalloc/internal/sz.h:200
>> 200 assert(ret == sz_index2size_compute(index));
>>
>> (ret is optimized out.)
>>
>> 197 JEMALLOC_ALWAYS_INLINE size_t
>> 198 sz_index2size_lookup(szind_t index) {
>> 199 size_t ret = (size_t)sz_index2size_tab[index];
>> 200 assert(ret == sz_index2size_compute(index));
>> 201 return ret;
>> 202 }
>
> (gdb) print/x __je_sz_index2size_tab
> $3 = {0x0 <repeats 104 times>}
>
> Also:
>
> (gdb) x/4x __je_arenas+16368/4
> 0x5030cab0 <__je_arenas+16368>: 0x00000000 0x00000000 0x00000000 0x00000000
> (gdb) print/x __je_arenas_lock
> $8 = {{{prof_data = {tot_wait_time = {ns = 0x0}, max_wait_time = {ns = 0x0}, n_wait_times = 0x0, n_spin_acquired = 0x0, max_n_thds = 0x0, n_waiting_thds = {repr = 0x0}, n_owner_switches = 0x0,
> prev_owner = 0x0, n_lock_ops = 0x0}, lock = 0x0, postponed_next = 0x0, locked = {repr = 0x0}}}, witness = {name = 0x0, rank = 0x0, comp = 0x0, opaque = 0x0, link = {qre_next = 0x0,
> qre_prev = 0x0}}, lock_order = 0x0}
> (gdb) print/x __je_narenas_auto
> $9 = 0x0
> (gdb) print/x malloc_conf
> $10 = 0x0
> (gdb) print/x __je_ncpus
> $11 = 0x0
> (gdb) print/x __je_manual_arena_base
> $12 = 0x0
> (gdb) print/x __je_sz_pind2sz_tab
> $13 = {0x0 <repeats 72 times>}
> (gdb) print/x __je_sz_size2index_tab
> $1 = {0x0 <repeats 384 times>, 0x1a, 0x1b <repeats 64 times>, 0x1c <repeats 64 times>}
>
>> Booting and immediately trying something like:
>>
>> service nfsd stop
>>
>> did not lead to a failure. But maybe after
>> a while it would, and that would be less
>> drastic than a reboot or power down.
>
> More detail:
>
> So, for rpcbind and nfsd, at some point a large part of
> __je_sz_size2index_tab is being stomped on, as is all of
> __je_sz_index2size_tab and more.
>
> . . .
>
> For nfsd, it is similar (again showing the partially
> non-zero live process context instead of the all-zeros
> from the .core file):
>
> 0x5030cab0 <__je_arenas+16368>: 0x00000000 0x00000000 0x00000000 0x00000009
> 0x5030cac0 <__je_arenas_lock>: 0x00000000 0x00000000 0x00000000 0x00000000
> 0x5030cad0 <__je_arenas_lock+16>: 0x00000000 0x00000000 0x00000000 0x00000000
> 0x5030cae0 <__je_arenas_lock+32>: 0x00000000 0x00000000 0x00000000 0x00000000
> 0x5030caf0 <__je_arenas_lock+48>: 0x00000000 0x00000000 0x00000000 0x00000000
> 0x5030cb00 <__je_arenas_lock+64>: 0x00000000 0x502ff070 0x00000000 0x00000000
> 0x5030cb10 <__je_arenas_lock+80>: 0x500ebb04 0x00000003 0x00000000 0x00000000
> 0x5030cb20 <__je_arenas_lock+96>: 0x5030cb10 0x5030cb10 0x00000000 0x00000000
>
> Then the memory in the crash continues to be zero until:
>
> 0x5030d000 <__je_sz_size2index_tab+384>: 0x1a1b1b1b 0x1b1b1b1b 0x1b1b1b1b 0x1b1b1b1b
>
> Notice the interesting page boundary for where non-zero
> is first available again!
>
> Between __je_arenas_lock and __je_sz_size2index_tab are:
>
> 0x5030cb30 __je_narenas_auto
> 0x5030cb38 malloc_conf
> 0x5030cb3c __je_ncpus
> 0x5030cb40 __je_manual_arena_base
> 0x5030cb80 __je_sz_pind2sz_tab
> 0x5030ccc0 __je_sz_index2size_tab
> 0x5030ce80 __je_sz_size2index_tab
>
>
> Note: because __je_arenas is normally
> mostly zero for these contexts, I can
> not tell where the memory trashing
> started, only where it replaced non-zero
> values with zeros.
> . . .
I caused the memory contents in nfsd to be shifted somewhat.
The resulting point where the zeros stop in the failure looks like:
(gdb) x/128x __je_sz_size2index_tab
0x5030cf00 <__je_sz_size2index_tab>: 0x00000000 0x00000000 0x00000000 0x00000000
0x5030cf10 <__je_sz_size2index_tab+16>: 0x00000000 0x00000000 0x00000000 0x00000000
0x5030cf20 <__je_sz_size2index_tab+32>: 0x00000000 0x00000000 0x00000000 0x00000000
0x5030cf30 <__je_sz_size2index_tab+48>: 0x00000000 0x00000000 0x00000000 0x00000000
0x5030cf40 <__je_sz_size2index_tab+64>: 0x00000000 0x00000000 0x00000000 0x00000000
0x5030cf50 <__je_sz_size2index_tab+80>: 0x00000000 0x00000000 0x00000000 0x00000000
0x5030cf60 <__je_sz_size2index_tab+96>: 0x00000000 0x00000000 0x00000000 0x00000000
0x5030cf70 <__je_sz_size2index_tab+112>: 0x00000000 0x00000000 0x00000000 0x00000000
0x5030cf80 <__je_sz_size2index_tab+128>: 0x00000000 0x00000000 0x00000000 0x00000000
0x5030cf90 <__je_sz_size2index_tab+144>: 0x00000000 0x00000000 0x00000000 0x00000000
0x5030cfa0 <__je_sz_size2index_tab+160>: 0x00000000 0x00000000 0x00000000 0x00000000
0x5030cfb0 <__je_sz_size2index_tab+176>: 0x00000000 0x00000000 0x00000000 0x00000000
0x5030cfc0 <__je_sz_size2index_tab+192>: 0x00000000 0x00000000 0x00000000 0x00000000
0x5030cfd0 <__je_sz_size2index_tab+208>: 0x00000000 0x00000000 0x00000000 0x00000000
0x5030cfe0 <__je_sz_size2index_tab+224>: 0x00000000 0x00000000 0x00000000 0x00000000
0x5030cff0 <__je_sz_size2index_tab+240>: 0x00000000 0x00000000 0x00000000 0x00000000
0x5030d000 <__je_sz_size2index_tab+256>: 0x18191919 0x19191919 0x19191919 0x19191919
0x5030d010 <__je_sz_size2index_tab+272>: 0x19191919 0x19191919 0x19191919 0x19191919
0x5030d020 <__je_sz_size2index_tab+288>: 0x19191919 0x19191919 0x19191919 0x19191919
0x5030d030 <__je_sz_size2index_tab+304>: 0x19191919 0x19191919 0x19191919 0x19191919
0x5030d040 <__je_sz_size2index_tab+320>: 0x191a1a1a 0x1a1a1a1a 0x1a1a1a1a 0x1a1a1a1a
0x5030d050 <__je_sz_size2index_tab+336>: 0x1a1a1a1a 0x1a1a1a1a 0x1a1a1a1a 0x1a1a1a1a
0x5030d060 <__je_sz_size2index_tab+352>: 0x1a1a1a1a 0x1a1a1a1a 0x1a1a1a1a 0x1a1a1a1a
0x5030d070 <__je_sz_size2index_tab+368>: 0x1a1a1a1a 0x1a1a1a1a 0x1a1a1a1a 0x1a1a1a1a
0x5030d080 <__je_sz_size2index_tab+384>: 0x1a1b1b1b 0x1b1b1b1b 0x1b1b1b1b 0x1b1b1b1b
0x5030d090 <__je_sz_size2index_tab+400>: 0x1b1b1b1b 0x1b1b1b1b 0x1b1b1b1b 0x1b1b1b1b
0x5030d0a0 <__je_sz_size2index_tab+416>: 0x1b1b1b1b 0x1b1b1b1b 0x1b1b1b1b 0x1b1b1b1b
0x5030d0b0 <__je_sz_size2index_tab+432>: 0x1b1b1b1b 0x1b1b1b1b 0x1b1b1b1b 0x1b1b1b1b
0x5030d0c0 <__je_sz_size2index_tab+448>: 0x1b1c1c1c 0x1c1c1c1c 0x1c1c1c1c 0x1c1c1c1c
0x5030d0d0 <__je_sz_size2index_tab+464>: 0x1c1c1c1c 0x1c1c1c1c 0x1c1c1c1c 0x1c1c1c1c
0x5030d0e0 <__je_sz_size2index_tab+480>: 0x1c1c1c1c 0x1c1c1c1c 0x1c1c1c1c 0x1c1c1c1c
0x5030d0f0 <__je_sz_size2index_tab+496>: 0x1c1c1c1c 0x1c1c1c1c 0x1c1c1c1c 0x1c1c1c1c
So, it is the page boundary that it tracks, not the detailed
placement of the memory contents.
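
For anyone wanting to check the arithmetic behind that conclusion,
here is a minimal sketch. It assumes 4 KiB pages (the page size is
not stated anywhere in this message), and the helper names
zeros_stop_addr and is_page_start are made up for illustration; they
are not part of jemalloc or FreeBSD. The table start addresses and
first-non-zero offsets are the ones from the two gdb dumps above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages; the message itself does not state the page size. */
#define ASSUMED_PAGE_SIZE 0x1000u

/* Address where the zeros stop: table start plus the offset of the
 * first non-zero byte seen in the dump. (Hypothetical helper.) */
uint32_t zeros_stop_addr(uint32_t table_start, uint32_t first_nonzero_offset) {
    return table_start + first_nonzero_offset;
}

/* Non-zero when addr is the first byte of a page of the given size.
 * page_size must be a power of two. (Hypothetical helper.) */
int is_page_start(uint32_t addr, uint32_t page_size) {
    assert(page_size != 0 && (page_size & (page_size - 1u)) == 0);
    return (addr & (page_size - 1u)) == 0;
}
```

With the earlier layout (__je_sz_size2index_tab at 0x5030ce80, first
non-zero byte at +384) and the shifted layout (table at 0x5030cf00,
first non-zero byte at +256), zeros_stop_addr gives 0x5030d000 both
times, and is_page_start reports that address as the start of a page
under the 4 KiB assumption.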
===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)
