From owner-freebsd-current@freebsd.org Wed May 13 08:43:29 2020
From: Mark Millard <marklmi@yahoo.com>
Subject: Re: svn commit: r360233 - in head: contrib/jemalloc . . . : This partially breaks a 2-socket 32-bit powerpc (old PowerMac G4) based on head -r360311
Date: Wed, 13 May 2020 01:43:23 -0700
To: "vangyzen@freebsd.org", svn-src-head@freebsd.org, FreeBSD Current, FreeBSD Hackers, FreeBSD PowerPC ML
Cc: Brandon Bergren, Justin Hibbits
Message-Id: <359C9C7D-4106-42B5-AAB5-08EF995B8100@yahoo.com>
In-Reply-To: <9B68839B-AEC8-43EE-B3B6-B696A4A57DAE@yahoo.com>
References: <8479DD58-44F6-446A-9CA5-D01F0F7C1B38@yahoo.com> <17ACDA02-D7EF-4F26-874A-BB3E935CD072@yahoo.com> <695E6836-F860-4557-B7DE-CC1EDB347F18@yahoo.com> <121B9B09-141B-4DC3-918B-1E7CFB99E779@yahoo.com> <8AAB0462-3FA8-490C-8D8D-7C15B1C9E2DE@yahoo.com> <18E62746-80DB-4195-977D-4FF32D0129EE@yahoo.com> <9562EEE4-62EF-4164-91C0-948CC0432984@yahoo.com> <9B68839B-AEC8-43EE-B3B6-B696A4A57DAE@yahoo.com>

[I'm adding a reference to an old arm64/aarch64 bug that had pages
turning to zero, in case this 32-bit powerpc issue is somewhat
analogous.]

On 2020-May-13, at 00:29, Mark Millard wrote:

> [stress alone is sufficient to have jemalloc asserts fail
> in stress, no need for a multi-socket G4 either. No need
> to involve nfsd, mountd, rpcbind or the like. This is not
> a claim that I know all the problems to be the same, just
> that a jemalloc reported failure in this simpler context
> happens and zeroed pages are involved.]
>
> Reminder: head -r360311 based context.
>
>
> First I show a single CPU/core PowerMac G4 context failing
> in stress. (I actually did this later, but it is the
> simpler context.) I simply moved the media from the
> 2-socket G4 to this slower, single-cpu/core one.
>
> cpu0: Motorola PowerPC 7400 revision 2.9, 466.42 MHz
> cpu0: Features 9c000000
> cpu0: HID0 8094c0a4
> real memory  = 1577857024 (1504 MB)
> avail memory = 1527508992 (1456 MB)
>
> # stress -m 1 --vm-bytes 1792M
> stress: info: [1024] dispatching hogs: 0 cpu, 0 io, 1 vm, 0 hdd
> <jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/arena_inlines_b.h:258: Failed assertion: "slab == extent_slab_get(extent)"
> stress: FAIL: [1024] (415) <-- worker 1025 got signal 6
> stress: WARN: [1024] (417) now reaping child worker processes
> stress: FAIL: [1024] (451) failed run completed in 243s
>
> (Note: 1792 is the biggest it allowed with M.)
>
> The following still pages in and out and fails:
>
> # stress -m 1 --vm-bytes 1290M
> stress: info: [1163] dispatching hogs: 0 cpu, 0 io, 1 vm, 0 hdd
> <jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/arena_inlines_b.h:258: Failed assertion: "slab == extent_slab_get(extent)"
> . . .
>
> By contrast, the following had no problem for as
> long as I let it run --and did not page in or out:
>
> # stress -m 1 --vm-bytes 1280M
> stress: info: [1181] dispatching hogs: 0 cpu, 0 io, 1 vm, 0 hdd
>
>
>
>
> The 2-socket PowerMac G4 context with 2048 MiByte of RAM . . .
>
> stress -m 1 --vm-bytes 1792M
>
> did not (quickly?) fail or page. 1792
> is as large as it would allow with M.
>
> The following also did not (quickly?) fail
> (and were not paging):
>
> stress -m 2 --vm-bytes 896M
> stress -m 4 --vm-bytes 448M
> stress -m 8 --vm-bytes 224M
>
> (Only 1 example was run at a time.)
>
> But the following all did quickly fail (and were
> paging):
>
> stress -m 8 --vm-bytes 225M
> stress -m 4 --vm-bytes 449M
> stress -m 2 --vm-bytes 897M
>
> (Only 1 example was run at a time.)
>
> I'll note that when I exited an su process
> I ended up with a:
>
> <jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/sz.h:200: Failed assertion: "ret == sz_index2size_compute(index)"
> Abort trap (core dumped)
>
> and a matching su.core file. It appears
> that stress's activity leads to other
> processes also seeing examples of the
> zeroed-page(s) problem (probably su had
> paged some or had been fully swapped
> out):
>
> (gdb) bt
> #0  thr_kill () at thr_kill.S:4
> #1  0x503821d0 in __raise (s=6) at /usr/src/lib/libc/gen/raise.c:52
> #2  0x502e1d20 in abort () at /usr/src/lib/libc/stdlib/abort.c:67
> #3  0x502d6144 in sz_index2size_lookup (index=) at /usr/src/contrib/jemalloc/include/jemalloc/internal/sz.h:200
> #4  sz_index2size (index=) at /usr/src/contrib/jemalloc/include/jemalloc/internal/sz.h:207
> #5  ifree (tsd=0x5008b018, ptr=0x50041460, tcache=0x5008b138, slow_path=) at jemalloc_jemalloc.c:2583
> #6  0x502d5cec in __je_free_default (ptr=0x50041460) at jemalloc_jemalloc.c:2784
> #7  0x502d62d4 in __free (ptr=0x50041460) at jemalloc_jemalloc.c:2852
> #8  0x501050cc in openpam_destroy_chain (chain=0x50041480) at /usr/src/contrib/openpam/lib/libpam/openpam_load.c:113
> #9  0x50105094 in openpam_destroy_chain (chain=0x500413c0) at /usr/src/contrib/openpam/lib/libpam/openpam_load.c:111
> #10 0x50105094 in openpam_destroy_chain (chain=0x50041320) at /usr/src/contrib/openpam/lib/libpam/openpam_load.c:111
> #11 0x50105094 in openpam_destroy_chain (chain=0x50041220) at /usr/src/contrib/openpam/lib/libpam/openpam_load.c:111
> #12 0x50105094 in openpam_destroy_chain (chain=0x50041120) at /usr/src/contrib/openpam/lib/libpam/openpam_load.c:111
> #13 0x50105094 in openpam_destroy_chain (chain=0x50041100) at /usr/src/contrib/openpam/lib/libpam/openpam_load.c:111
> #14 0x50105014 in openpam_clear_chains (policy=0x50600004) at /usr/src/contrib/openpam/lib/libpam/openpam_load.c:130
> #15 0x50101230 in pam_end (pamh=0x50600000, status=) at /usr/src/contrib/openpam/lib/libpam/pam_end.c:83
> #16 0x1001225c in main (argc=, argv=0x0) at /usr/src/usr.bin/su/su.c:477
>
> (gdb) print/x __je_sz_size2index_tab
> $1 = {0x0 }
>
>
> Notes:
>
> Given that the original problem did not involve
> paging to the swap partition, maybe just making
> it to the Laundry list or some such is sufficient,
> something that is also involved when the swap
> space is partially in use (according to top). Or
> sitting in the inactive list for a long time, if
> that has some special status.

The following was a fix for a "pages magically
turn into zeros" problem on arm64/aarch64. The
original 32-bit powerpc context did not seem a
match to me --but the stress test behavior that
I've just observed seems closer from an
external-test point of view: swapping is involved.
Maybe this will suggest something to someone who
knows what they are doing. (Note: dsl-only.net
closed down, so the E-mail address reference is
no longer valid.)

Author: kib
Date: Mon Apr 10 15:32:26 2017
New Revision: 316679

URL: https://svnweb.freebsd.org/changeset/base/316679

Log:
  Do not lose dirty bits for removing PROT_WRITE on arm64.

  Arm64 pmap interprets accessed writable ptes as modified, since
  ARMv8.0 does not track Dirty Bit Modifier in hardware. If writable
  bit is removed, page must be marked as dirty for MI VM.

  This change is most important for COW, where fork caused losing
  content of the dirty pages which were not yet scanned by pagedaemon.
  Reviewed by:	alc, andrew
  Reported and tested by:	Mark Millard
  PR:	217138, 217239
  Sponsored by:	The FreeBSD Foundation
  MFC after:	2 weeks

Modified:
  head/sys/arm64/arm64/pmap.c

Modified: head/sys/arm64/arm64/pmap.c
==============================================================================
--- head/sys/arm64/arm64/pmap.c	Mon Apr 10 12:35:58 2017	(r316678)
+++ head/sys/arm64/arm64/pmap.c	Mon Apr 10 15:32:26 2017	(r316679)
@@ -2481,6 +2481,11 @@ pmap_protect(pmap_t pmap, vm_offset_t sv
 	    sva += L3_SIZE) {
 		l3 = pmap_load(l3p);
 		if (pmap_l3_valid(l3)) {
+			if ((l3 & ATTR_SW_MANAGED) &&
+			    pmap_page_dirty(l3)) {
+				vm_page_dirty(PHYS_TO_VM_PAGE(l3 &
+				    ~ATTR_MASK));
+			}
 			pmap_set(l3p, ATTR_AP(ATTR_AP_RO));
 			PTE_SYNC(l3p);
 			/* XXX: Use pmap_invalidate_range */

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)