Date: Sun, 10 May 2020 19:53:19 -0700 From: Mark Millard <marklmi@yahoo.com> To: "vangyzen@freebsd.org" <vangyzen@FreeBSD.org>, svn-src-head@freebsd.org, FreeBSD Current <freebsd-current@freebsd.org>, FreeBSD Hackers <freebsd-hackers@freebsd.org>, FreeBSD PowerPC ML <freebsd-ppc@freebsd.org> Cc: Brandon Bergren <bdragon@FreeBSD.org>, Justin Hibbits <chmeeedalf@gmail.com> Subject: Re: svn commit: r360233 - in head: contrib/jemalloc . . . : This partially breaks a 2-socket 32-bit powerpc (old PowerMac G4) based on head -r360311 Message-ID: <C440956F-139E-4EF7-A68E-FE35D9934BD3@yahoo.com> In-Reply-To: <D0C483E5-3F6A-4816-A6BA-3D2C82C24F8E@yahoo.com> References: <C24EE1A1-FAED-42C2-8204-CA7B1D20A369@yahoo.com> <8479DD58-44F6-446A-9CA5-D01F0F7C1B38@yahoo.com> <17ACDA02-D7EF-4F26-874A-BB3E935CD072@yahoo.com> <695E6836-F860-4557-B7DE-CC1EDB347F18@yahoo.com> <DCABCD83-27B0-4F2D-9410-69102294A98E@yahoo.com> <121B9B09-141B-4DC3-918B-1E7CFB99E779@yahoo.com> <8AAB0462-3FA8-490C-8D8D-7C15B1C9E2DE@yahoo.com> <18E62746-80DB-4195-977D-4FF32D0129EE@yahoo.com> <F5953A6B-56CE-4D1C-8C18-58D44B639881@yahoo.com> <D0C483E5-3F6A-4816-A6BA-3D2C82C24F8E@yahoo.com>
next in thread | previous in thread | raw e-mail | index | archive | help
[A new kind of experiment and partial results.] Given the zero'ed memory page(s) that for some of the example contexts include a page that should not be changing after initialization in my context (jemalloc global variables), I have attempted the following for such examples: A) Run gdb B) Attach to one of the live example processes. C) Check that the page is not zeroed yet. (print/x __je_sz_size2index_tab) D) protect the page containing the start of __je_sz_size2index_tab, using 0x1 as the PROT_READ mask. (print (int)mprotect(ADDRESS,1,0x1)) E) detach. The hope was to discover which of the following was involved: A) user-space code trying to write the page should get a SIGSEGV. In this case I'd likely be able to see what code was attempting the write. B) kernel-code doing something odd to the content or mapping of memory would not (or need not) lead to SIGSEGV. In this case I'd be unlikely to see what code lead to the zeros on the page. So far I've gotten only one failure example, nfsd during its handling of a SIGUSR1. Previous nfs mounts and dismounts worked fine, not asserting, indicating that at the time the page was not zeroed. I got no evidence of SIGSEGV from an attempted user space write to the page. But the nfsd.core shows the page as zeroed and the assert having caused abort(). That suggests the kernel side of things for what leads to the zeros. It turns out that just before the "unregsiteration()" activity is "killchildren()" activity: (gdb) list 971 972 static void 973 nfsd_exit(int status) 974 { 975 killchildren(); 976 unregistration(); 977 exit(status); 978 } (frame #12) used via: (gdb) list cleanup 954 /* 955 * Cleanup master after SIGUSR1. 956 */ 957 static void 958 cleanup(__unused int signo) 959 { 960 nfsd_exit(0); 961 } . . . and (for master): (void)signal(SIGUSR1, cleanup); This suggests the possibility that the zero'd pages could be associated with killing the child processes. (I've had a past aarch64 context where forking had problems with pages that were initially common to parent and child processes. In that context having the processes swap out [not just mostly paged out] and then swap back in was involved in showing the problem. The issue was fixed and was aarch64 specific. But it leaves me willing to consider fork-related memory management as possibly odd in some way for 32-bit powerpc.) Notes . . . Another possible kind of evidence: I've gone far longer with the machine doing just normal background processing with nothing failing on its own. This suggests that the (int)mprotect(ADDRESS,1,0x1) might be changing the context --or just doing the attach and detach in gdb does. I've nothing solid in this area so I'll ignore it, other than this note. === Mark Millard marklmi at yahoo.com ( dsl-only.net went away in early 2018-Mar)
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?C440956F-139E-4EF7-A68E-FE35D9934BD3>