Date: Sun, 9 Apr 2017 11:24:59 -0700
From: Mark Millard <markmi@dsl-only.net>
To: Konstantin Belousov <kostikbel@gmail.com>
Cc: andrew@freebsd.org, freebsd-hackers@freebsd.org, freebsd-arm <freebsd-arm@freebsd.org>
Subject: Re: The arm64 fork-then-swap-out-then-swap-in failures: a program source for exploring them
Message-ID: <9DCAF95B-39A5-4346-88FC-6AFDEE8CF9BB@dsl-only.net>
In-Reply-To: <9D152170-5F19-47A2-A06A-66F83CA88A09@dsl-only.net>
References: <4DEA2D76-9F27-426D-A8D2-F07B16575FB9@dsl-only.net>
 <163B37B0-55D6-498E-8F52-9A95C036CDFA@dsl-only.net>
 <08E7A5B0-8707-4479-9D7A-272C427FF643@dsl-only.net>
 <20170409122715.GF1788@kib.kiev.ua>
 <9D152170-5F19-47A2-A06A-66F83CA88A09@dsl-only.net>
On 2017-Apr-9, at 10:24 AM, Mark Millard <markmi@dsl-only.net> wrote:

> On 2017-Apr-9, at 5:27 AM, Konstantin Belousov <kostikbel@gmail.com> wrote:
>
>> On Sat, Apr 08, 2017 at 06:02:00PM -0700, Mark Millard wrote:
>>> [I've identified the code path involved is the arm64 small allocations
>>> turning into zeros for later fork-then-swapout-then-back-in,
>>> specifically the ongoing RES(ident memory) size decrease that
>>> "top -PCwaopid" shows before the fork/swap sequence. Hopefully
>>> I've also exposed enough related information for someone that
>>> knows what they are doing to get started with a specific
>>> investigation, looking for a fix. I'd like for a pine64+
>>> 2GB to have buildworld complete despite the forking and
>>> swapping involved (yep: for a time zero RES(ident memory) for
>>> some processes involved in the build).]
>>
>> I was not able to follow the walls of text, but do not think that
>> pmap_ts_reference() is the real culprit there.
>>
>> Is my impression right that the issue occurs on fork, and looks as
>> a memory corruption, where some page suddenly becomes zero-filled?
>> And swapping seems to be involved? It is somewhat interesting to see
>> if the problem is reproducible on non-arm64 machines, e.g. armv7 or amd64.
>
> Yes, yes, non-arm64 that I've tried works.
>
> But I think that the following extra detail may be of use: what top
> shows for RES over time is also odd on arm64 (only) and the amount
> of pages that are zeroed is proportional to the decrease in RES.
>
> In the test sequence:
>
> A) Allocate lots of 14 KiByte allocations and initialize the content of each
> to non-zero. The example ends up with RES of about 265M.

I did forget to list one important property: why I picked 14 KiBytes.

A) Any allocation size <= 14 KiBytes that I've tried gets the zeros
   problem in my arm64 contexts (bpim3 and rpi3).

B) Any allocation size >= 14 KiBytes + 1 Byte that I've tried works
   in those contexts.

For the arm64 contexts that I use, this happens to match the jemalloc
SMALL_MAXCLASS size boundary. When I looked, it appeared that 14 Ki
was the smallest SMALL_MAXCLASS value in jemalloc, so it would always
fit the category.

> B) sleep some amount of time, I've been using well over 30 seconds here.
>
> C) fork
>
> D) sleep again (parent and child), also forcing swapping during the sleep
> (I used stress, manually run.)
>
> E) Test the memory pattern in the parent and child process, passing over
> all the bytes, failed and good.
>
> Both the parent and the child in (E) see the first pages allocated as zero,
> with the number of pages being zero increasing as the sleep time in (B)
> increases (as long as the sleep is over 30 sec or so). The parent and child
> match for which pages are zero vs. not.
>
> It fails with (B) being a no-op as well. But the proportionality with
> the time for the sleep is interesting.
>
> During (B) "top -PCwaopid" shows RES decreasing, starting after 30 sec
> or so. The fork in (C) produces a child that does not have the same RES
> as the parent but instead a tiny RES (80K as I remember). During (E)
> the child's RES increases to full size.
>
> My powerpc64, armv7, and amd64 tests of such do not fail, nor does RES
> decrease during (B). The child process gets the same RES as the parent
> as well, unlike for arm64.
>
> In the failing context (arm64) RES in the parent decreases during (D)
> before the swap-out as well.
>
>> If answers to my two questions are yes, there is probably some bug with
>> arm64 pmap handling of the dirty bit emulation. ARMv8.0 does not provide
>> hardware dirty bit, and pmap interprets an accessed writeable page as
>> unconditionally dirty. More, accessed bit is also not maintained by
>> hardware, instead it should be set by pmap. And arm64 pmap sets the
>> AF bit unconditionally when creating valid pte.
>
> fork-then-swap-out/in is required to see the problem.
> Neither fork by itself nor swapping (zero RES as shown in top) by itself has
> shown the problem so far.
>
>> Hmm, could you try the following patch, I did not even compile it.
>
> I'll try it later today.
>
>> diff --git a/sys/arm64/arm64/pmap.c b/sys/arm64/arm64/pmap.c
>> index 3d5756ba891..55aa402eb1c 100644
>> --- a/sys/arm64/arm64/pmap.c
>> +++ b/sys/arm64/arm64/pmap.c
>> @@ -2481,6 +2481,11 @@ pmap_protect(pmap_t pmap, vm_offset_t sva, vm_offset_t eva, vm_prot_t prot)
>>  		    sva += L3_SIZE) {
>>  			l3 = pmap_load(l3p);
>>  			if (pmap_l3_valid(l3)) {
>> +				if ((l3 & ATTR_SW_MANAGED) &&
>> +				    pmap_page_dirty(l3)) {
>> +					vm_page_dirty(PHYS_TO_VM_PAGE(l3 &
>> +					    ~ATTR_MASK));
>> +				}
>>  				pmap_set(l3p, ATTR_AP(ATTR_AP_RO));
>>  				PTE_SYNC(l3p);
>>  				/* XXX: Use pmap_invalidate_range */

===
Mark Millard
markmi at dsl-only.net
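The quoted test sequence, steps (A) through (E), can be sketched as a few C helpers. This is a hypothetical reconstruction under stated assumptions, not the program source referred to in the Subject line: the helper names (pattern_byte, alloc_and_fill, count_bad_bytes, free_blocks, run_fork_swap_test) and the per-block fill rule are invented for illustration. Forcing swap-out during step (D) is left to an external tool, as in the report, where stress was run by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* 14 KiB: per the report, sizes <= 14 KiB fail on arm64 and 14 KiB + 1 works. */
#define ALLOC_SIZE ((size_t)14 * 1024)

/* Hypothetical fill rule: one byte per block in 1..255, so intact data is never zero. */
static unsigned char pattern_byte(size_t idx)
{
	return ((unsigned char)(1 + idx % 255));
}

/* Step A: allocate `count` blocks and fill each with its non-zero pattern. */
static unsigned char **alloc_and_fill(size_t count)
{
	assert(count > 0);
	unsigned char **blocks = calloc(count, sizeof *blocks);
	if (blocks == NULL)
		return (NULL);
	for (size_t i = 0; i < count; i++) {
		blocks[i] = malloc(ALLOC_SIZE);
		if (blocks[i] == NULL) {
			while (i-- > 0)		/* undo the partial allocation */
				free(blocks[i]);
			free(blocks);
			return (NULL);
		}
		memset(blocks[i], pattern_byte(i), ALLOC_SIZE);
	}
	return (blocks);
}

/* Step E: count bytes that no longer hold their block's pattern. */
static size_t count_bad_bytes(unsigned char **blocks, size_t count)
{
	size_t bad = 0;

	for (size_t i = 0; i < count; i++)
		for (size_t j = 0; j < ALLOC_SIZE; j++)
			if (blocks[i][j] != pattern_byte(i))
				bad++;
	return (bad);
}

static void free_blocks(unsigned char **blocks, size_t count)
{
	for (size_t i = 0; i < count; i++)
		free(blocks[i]);
	free(blocks);
}

/*
 * Steps B through E. Returns 0 if parent and child both see intact
 * memory, 1 if either sees changed bytes, -1 on a setup error.
 * Swapping during step D must be forced from outside this process.
 */
static int run_fork_swap_test(size_t count, unsigned pre_fork_secs,
    unsigned post_fork_secs)
{
	unsigned char **blocks = alloc_and_fill(count);
	if (blocks == NULL)
		return (-1);
	sleep(pre_fork_secs);			/* Step B */
	fflush(stdout);				/* keep buffered output out of the child */
	pid_t pid = fork();			/* Step C */
	if (pid < 0) {
		free_blocks(blocks, count);
		return (-1);
	}
	sleep(post_fork_secs);			/* Step D: force swapping now */
	size_t bad = count_bad_bytes(blocks, count);	/* Step E */
	if (pid == 0)
		_exit(bad == 0 ? 0 : 1);	/* child reports via exit status */
	int status;
	pid_t w = waitpid(pid, &status, 0);
	free_blocks(blocks, count);
	if (w < 0 || !WIFEXITED(status))
		return (-1);
	int child_bad = WEXITSTATUS(status) != 0;
	printf("parent: %zu bad bytes; child: %s\n", bad,
	    child_bad ? "bad bytes" : "intact");
	return ((bad != 0 || child_bad) ? 1 : 0);
}
```

A driver could call, for example, run_fork_swap_test(19000, 60, 60). Here 19000 blocks of 14 KiB is about 260 MiB of payload, near the RES figure above, and both sleeps exceed the roughly 30 seconds after which RES was reported to start dropping. Zero-filled pages would show up as bad bytes because the assumed fill rule never writes a zero.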