From owner-freebsd-current@freebsd.org Wed May 13 07:29:15 2020
From: Mark Millard <marklmi@yahoo.com>
Subject: Re: svn commit: r360233 - in head: contrib/jemalloc . . . : This partially breaks a 2-socket 32-bit powerpc (old PowerMac G4) based on head -r360311
Date: Wed, 13 May 2020 00:29:07 -0700
To: "vangyzen@freebsd.org", svn-src-head@freebsd.org, FreeBSD Current, FreeBSD Hackers, FreeBSD PowerPC ML
Cc: Brandon Bergren, Justin Hibbits
In-Reply-To: <9562EEE4-62EF-4164-91C0-948CC0432984@yahoo.com>
Message-Id: <9B68839B-AEC8-43EE-B3B6-B696A4A57DAE@yahoo.com>
References: <8479DD58-44F6-446A-9CA5-D01F0F7C1B38@yahoo.com> <17ACDA02-D7EF-4F26-874A-BB3E935CD072@yahoo.com> <695E6836-F860-4557-B7DE-CC1EDB347F18@yahoo.com> <121B9B09-141B-4DC3-918B-1E7CFB99E779@yahoo.com> <8AAB0462-3FA8-490C-8D8D-7C15B1C9E2DE@yahoo.com> <18E62746-80DB-4195-977D-4FF32D0129EE@yahoo.com> <9562EEE4-62EF-4164-91C0-948CC0432984@yahoo.com>
[stress alone is sufficient to have jemalloc asserts fail in stress; no need for a multi-socket G4 either, and no need to involve nfsd, mountd, rpcbind, or the like. This is not a claim that I know all the problems to be the same, just that a jemalloc-reported failure happens in this simpler context and zeroed pages are involved.]

Reminder: head -r360311 based context.

First I show a single-CPU/core PowerMac G4 context failing in stress. (I actually did this later, but it is the simpler context.) I simply moved the media from the 2-socket G4 to this slower, single-CPU/core one.
cpu0: Motorola PowerPC 7400 revision 2.9, 466.42 MHz
cpu0: Features 9c000000
cpu0: HID0 8094c0a4
real memory  = 1577857024 (1504 MB)
avail memory = 1527508992 (1456 MB)

# stress -m 1 --vm-bytes 1792M
stress: info: [1024] dispatching hogs: 0 cpu, 0 io, 1 vm, 0 hdd
: /usr/src/contrib/jemalloc/include/jemalloc/internal/arena_inlines_b.h:258: Failed assertion: "slab == extent_slab_get(extent)"
stress: FAIL: [1024] (415) <-- worker 1025 got signal 6
stress: WARN: [1024] (417) now reaping child worker processes
stress: FAIL: [1024] (451) failed run completed in 243s

(Note: 1792 is the biggest it allowed with M.)

The following still pages in and out and fails:

# stress -m 1 --vm-bytes 1290M
stress: info: [1163] dispatching hogs: 0 cpu, 0 io, 1 vm, 0 hdd
: /usr/src/contrib/jemalloc/include/jemalloc/internal/arena_inlines_b.h:258: Failed assertion: "slab == extent_slab_get(extent)"
. . .

By contrast, the following had no problem for as long as I let it run -- and did not page in or out:

# stress -m 1 --vm-bytes 1280M
stress: info: [1181] dispatching hogs: 0 cpu, 0 io, 1 vm, 0 hdd

The 2-socket PowerMac G4 context with 2048 MiByte of RAM . . .

stress -m 1 --vm-bytes 1792M

did not (quickly?) fail or page. 1792 is as large as it would allow with M.

The following also did not (quickly?) fail (and were not paging):

stress -m 2 --vm-bytes 896M
stress -m 4 --vm-bytes 448M
stress -m 8 --vm-bytes 224M

(Only one example was run at a time.)

But the following all did quickly fail (and were paging):

stress -m 8 --vm-bytes 225M
stress -m 4 --vm-bytes 449M
stress -m 2 --vm-bytes 897M

(Only one example was run at a time.)

I'll note that when I exited an su process I ended up with a:

: /usr/src/contrib/jemalloc/include/jemalloc/internal/sz.h:200: Failed assertion: "ret == sz_index2size_compute(index)"
Abort trap (core dumped)

and a matching su.core file.
It appears that stress's activity leads to other processes also seeing examples of the zeroed-page(s) problem (probably su had paged some or had been fully swapped out):

(gdb) bt
#0  thr_kill () at thr_kill.S:4
#1  0x503821d0 in __raise (s=6) at /usr/src/lib/libc/gen/raise.c:52
#2  0x502e1d20 in abort () at /usr/src/lib/libc/stdlib/abort.c:67
#3  0x502d6144 in sz_index2size_lookup (index=) at /usr/src/contrib/jemalloc/include/jemalloc/internal/sz.h:200
#4  sz_index2size (index=) at /usr/src/contrib/jemalloc/include/jemalloc/internal/sz.h:207
#5  ifree (tsd=0x5008b018, ptr=0x50041460, tcache=0x5008b138, slow_path=) at jemalloc_jemalloc.c:2583
#6  0x502d5cec in __je_free_default (ptr=0x50041460) at jemalloc_jemalloc.c:2784
#7  0x502d62d4 in __free (ptr=0x50041460) at jemalloc_jemalloc.c:2852
#8  0x501050cc in openpam_destroy_chain (chain=0x50041480) at /usr/src/contrib/openpam/lib/libpam/openpam_load.c:113
#9  0x50105094 in openpam_destroy_chain (chain=0x500413c0) at /usr/src/contrib/openpam/lib/libpam/openpam_load.c:111
#10 0x50105094 in openpam_destroy_chain (chain=0x50041320) at /usr/src/contrib/openpam/lib/libpam/openpam_load.c:111
#11 0x50105094 in openpam_destroy_chain (chain=0x50041220) at /usr/src/contrib/openpam/lib/libpam/openpam_load.c:111
#12 0x50105094 in openpam_destroy_chain (chain=0x50041120) at /usr/src/contrib/openpam/lib/libpam/openpam_load.c:111
#13 0x50105094 in openpam_destroy_chain (chain=0x50041100) at /usr/src/contrib/openpam/lib/libpam/openpam_load.c:111
#14 0x50105014 in openpam_clear_chains (policy=0x50600004) at /usr/src/contrib/openpam/lib/libpam/openpam_load.c:130
#15 0x50101230 in pam_end (pamh=0x50600000, status=) at /usr/src/contrib/openpam/lib/libpam/pam_end.c:83
#16 0x1001225c in main (argc=, argv=0x0) at /usr/src/usr.bin/su/su.c:477
(gdb) print/x __je_sz_size2index_tab
$1 = {0x0 }

Notes: Given that the original problem did not involve paging to the swap partition, maybe
just making it to the Laundry list or some such is sufficient, something that is also involved when the swap space is partially in use (according to top). Or sitting in the inactive list for a long time, if that has some special status.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)