Subject: Re: svn commit: r360233 - in head: contrib/jemalloc . . . : This partially breaks a 2-socket 32-bit powerpc (old PowerMac G4) based on head -r360311
From: Mark Millard
Date: Tue, 12 May 2020 21:52:06 -0700
To: "vangyzen@freebsd.org", svn-src-head@freebsd.org, FreeBSD Current, FreeBSD Hackers, FreeBSD PowerPC ML
Cc: Brandon Bergren, Justin Hibbits
Message-Id: <9562EEE4-62EF-4164-91C0-948CC0432984@yahoo.com>
References: <8479DD58-44F6-446A-9CA5-D01F0F7C1B38@yahoo.com> <17ACDA02-D7EF-4F26-874A-BB3E935CD072@yahoo.com> <695E6836-F860-4557-B7DE-CC1EDB347F18@yahoo.com> <121B9B09-141B-4DC3-918B-1E7CFB99E779@yahoo.com> <8AAB0462-3FA8-490C-8D8D-7C15B1C9E2DE@yahoo.com> <18E62746-80DB-4195-977D-4FF32D0129EE@yahoo.com>

[Yet another new kind of experiment. But this looks like
I can cause problems in fairly short order on demand now.
Finally! And with that I've much better evidence for
kernel vs. user-space process for making the zeroed
memory appear in, for example, nfsd.]

I've managed to get:

: /usr/src/contrib/jemalloc/include/jemalloc/internal/arena_inlines_b.h:258: Failed assertion: "slab == extent_slab_get(extent)"
: /usr/src/contrib/jemalloc/include/jemalloc/internal/arena_inlines_b.h:258: Failed assertion: "slab == extent_slab_get(extent)"

and eventually:

[1]   Segmentation fault (core dumped) stress -m 2 --vm-bytes 1700M

from a user program (stress) while another machine attempted
an nfs mount during the stress activity:

# mount -onoatime,soft ...:/ /mnt && umount /mnt && rpcinfo -s ...
[tcp] ...:/: RPCPROG_MNT: RPC: Timed out

(To get a failure I may have to run the commands
multiple times. Timing details against stress's
activity seem to matter.)
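
Since the client-side sequence may need several tries, a small driver can repeat it. The following is only a sketch in Python, not something used in the tests reported here. The server name is a placeholder for the host elided as "..." above, and the names client_commands, run_sequence and the runner parameter are made up for illustration. The loop stops at the first failing command, which mirrors the && chaining of the shell line.

```python
import subprocess


def client_commands(server, mount_point="/mnt"):
    """Return the three client-side commands from the message as argv lists.

    `server` is a placeholder: the real host is elided ("...") in the message.
    """
    return [
        ["mount", "-onoatime,soft", f"{server}:/", mount_point],
        ["umount", mount_point],
        ["rpcinfo", "-s", server],
    ]


def run_sequence(server, attempts=10, runner=subprocess.run):
    """Repeat mount/umount/rpcinfo up to `attempts` times.

    Returns (attempt_number, failing_command) for the first command that
    exits nonzero, or None if every attempt succeeded.
    """
    for attempt in range(1, attempts + 1):
        for cmd in client_commands(server):
            result = runner(cmd)
            if result.returncode != 0:
                return attempt, cmd
    return None
```

Calling run_sequence("nfs-server-placeholder", attempts=20) as root on the client, while stress runs on the server, would report the attempt at which the first command failed (for example the mount timing out).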
That failure led to:

# ls -ldT /*.core*
-rw-------  1 root  wheel  3899392 May 12 19:52:26 2020 /mountd.core

# ls -ldT *.core*
-rw-------  1 root  wheel  2682880 May 12 20:00:26 2020 stress.core

(Note: which of nfsd, mountd, or rpcbind fails need not be
fully repeatable. stress.core seems to be written twice,
probably because of the -m 2 in use.)

The context that let me do this was to first (on the 2 socket
G4 with a full 2048 MiByte RAM configuration) run:

stress -m 2 --vm-bytes 1700M &

Note that the stress command keeps the memory busy
and causes paging to the swap/page space. I've not
tried to make it just fit without paging or just
barely paging or such. The original context did not
involve paging or low RAM, so I do not expect paging
to be required, though it can be involved.

The stress program backtrace is different:

4827	        return (tls_get_addr_slow(dtvp, index, offset));
4828	}
(gdb) bt -full
#0  0x41831b04 in tls_get_addr_common (dtvp=0x4186c010, index=2, offset=4294937444) at /usr/src/libexec/rtld-elf/rtld.c:4824
        dtv = 0x0
#1  0x4182bfcc in __tls_get_addr (ti=) at /usr/src/libexec/rtld-elf/powerpc/reloc.c:848
        tp =
        p =
#2  0x41a83464 in __get_locale () at /usr/src/lib/libc/locale/xlocale_private.h:199
No locals.
#3  fprintf (fp=0x41b355f8, fmt=0x1804cbc "%s: FAIL: [%lli] (%d) ") at /usr/src/lib/libc/stdio/fprintf.c:57
        ap = {{gpr = 2 '\002', fpr = 0 '\000', reserved = 20731, overflow_arg_area = 0xffffdb78, reg_save_area = 0xffffdae8}}
        ret =
#4  0x01802348 in main (argc=, argv=) at stress.c:415
        status =
        ret = 6
        do_dryrun = 0
        retval = 0
        children = 1
        do_backoff =
        do_hdd_bytes =
        do_hdd =
        do_vm_keep = 0
        do_vm_hang = -1
        do_vm_stride = 4096
        do_vm_bytes = 1782579200
        do_vm = 108174317627375616
        do_io =
        do_cpu =
        do_timeout = 108176117243859333
        starttime = 1589338322
        i =
        forks =
        pid = 6140
        stoptime =
        runtime =

Apparently the asserts did not stop the code and it ran
until a failure occurred (via dtv=0x0).
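
As a side note on the stress -m 2 --vm-bytes 1700M invocation above, the arithmetic behind "causes paging" can be written out. This is a rough sketch in Python. It treats the M suffix as 2**20 bytes, which agrees with do_vm_bytes = 1782579200 in the backtrace. The helper names are made up for illustration, and the result is only a bound on stress's own peak demand: it ignores what the kernel and other processes use, and assumes both workers hold their full allocation at once.

```python
MIB = 1024 * 1024


def stress_demand_mib(workers, vm_mib):
    """Peak memory the stress vm workers may touch, in MiB."""
    return workers * vm_mib


def overcommit_mib(workers, vm_mib, ram_mib):
    """How far the peak stress demand exceeds installed RAM, in MiB (0 if it fits)."""
    return max(0, stress_demand_mib(workers, vm_mib) - ram_mib)


# 1700M per worker, expressed in bytes: matches do_vm_bytes in the backtrace.
print(1700 * MIB)                      # 1782579200
# Two workers against the 2048 MiByte G4 configuration.
print(stress_demand_mib(2, 1700))      # 3400
print(overcommit_mib(2, 1700, 2048))   # 1352
```

So at peak the two workers can want about 1352 MiByte more than the machine has, which is consistent with the paging observed.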
Stress uses a mutex stored on a page that gets the
"turns into zeros" problem, preventing the
mprotect(ADDR,1,1) type of test because stress
will write on the page. (I've not tried to find
a minimal form of stress run.)

The same sort of globals are again
zeroed, such as:

(gdb) print/x __je_sz_size2index_tab
$1 = {0x0 }

Another attempt lost rpcbind instead of
mountd:

# ls -ldT /*.core
-rw-------  1 root  wheel  3899392 May 12 19:52:26 2020 /mountd.core
-rw-------  1 root  wheel  3170304 May 12 20:03:00 2020 /rpcbind.core

I again find that when I use gdb 3 times
to:

attach ???
x/x __je_sz_size2index_tab
print (int)mprotect(ADDRESS,1,1)
quit

for each of rpcbind, mountd, and nfsd master,
those processes no longer fail during the
mount/umount/rpcinfo (or are far less likely to).

But it turns out that later, when I "service nfsd
stop", nfsd does get the zeroed-memory based assert
and core dumps. (I'd done a bunch of the mount/umount/
rpcinfo sequences before the stop.)

That the failure is during SIGUSR1 based shutdown
leads me to wonder if killing off some child
process(es) is involved in the problem.

There was *no* evidence of a signal for an attempt
to write the page from the user process. It appears
that the kernel is doing something that changes what
the process sees -- instead of the user-space program
stomping on its own memory content.

I've no clue how to track down the kernel activity
that changes what the process sees on some page(s)
of memory.

(Prior testing with a debug kernel did not report
problems, despite getting an example failure. So
that seems insufficient.)

At least a procedure is now known that does not
involve waiting hours or days.

The procedure (adjusted for how much RAM is present
and number of cpus/cores?) could be appropriate to
run in other contexts than the 32-bit powerpc G4.
Part of the context likely should be not using
MALLOC_PRODUCTION -- so problems would be detected
sooner via the asserts in jemalloc.
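
The reasoning behind the mprotect(ADDRESS,1,1) test is that the third argument, 1, is PROT_READ, so afterwards any user-space write to that page should raise a signal instead of silently succeeding. The following Python sketch illustrates only that premise on an ordinary anonymous page. It is not the gdb procedure above and does not touch jemalloc's tables. The function name is made up, and it assumes a Unix-like system where ctypes.CDLL(None) exposes libc's mprotect.

```python
import ctypes
import mmap
import os
import signal

PROT_READ = 1  # same value as the third argument in mprotect(ADDRESS,1,1)

# Assumption: a Unix-like libc reachable through the running process.
libc = ctypes.CDLL(None, use_errno=True)
libc.mprotect.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int]
libc.mprotect.restype = ctypes.c_int


def write_to_readonly_page():
    """Make one page read-only, let a child write to it, return the fatal signal.

    Returns the number of the signal that killed the child, or None if the
    child was not killed by a signal.
    """
    page = mmap.mmap(-1, mmap.PAGESIZE)      # anonymous, page-aligned mapping
    page.write(b"\x01" * 16)                 # give the page nonzero content
    view = ctypes.c_char.from_buffer(page)   # keeps the mapping alive
    addr = ctypes.addressof(view)
    if libc.mprotect(addr, 1, PROT_READ) != 0:
        raise OSError(ctypes.get_errno(), "mprotect failed")
    pid = os.fork()
    if pid == 0:
        ctypes.memset(addr, 0, 1)            # user-space write to the page
        os._exit(0)                          # reached only if no fault occurred
    _, status = os.waitpid(pid, 0)
    return os.WTERMSIG(status) if os.WIFSIGNALED(status) else None
```

A child that writes to the protected page is expected to die from SIGSEGV (or SIGBUS on some systems). Seeing zeros appear on such a page with no signal delivered is what points away from the user process as the writer.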

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)