From: Mark Millard
Date: Sun, 9 Apr 2017 10:24:29 -0700
Subject: Re: The arm64 fork-then-swap-out-then-swap-in failures: a program source for exploring them
Cc: andrew@freebsd.org, freebsd-hackers@freebsd.org, freebsd-arm
In-Reply-To: <20170409122715.GF1788@kib.kiev.ua>
Message-Id: <9D152170-5F19-47A2-A06A-66F83CA88A09@dsl-only.net>
References: <4DEA2D76-9F27-426D-A8D2-F07B16575FB9@dsl-only.net> <163B37B0-55D6-498E-8F52-9A95C036CDFA@dsl-only.net>
 <08E7A5B0-8707-4479-9D7A-272C427FF643@dsl-only.net> <20170409122715.GF1788@kib.kiev.ua>
To: Konstantin Belousov

On 2017-Apr-9, at 5:27 AM, Konstantin Belousov wrote:

> On Sat, Apr 08, 2017 at 06:02:00PM -0700, Mark Millard wrote:
>> [I've identified the code path involved is the arm64 small allocations
>> turning into zeros for later fork-then-swapout-then-back-in,
>> specifically the ongoing RES(ident memory) size decrease that
>> "top -PCwaopid" shows before the fork/swap sequence. Hopefully
>> I've also exposed enough related information for someone that
>> knows what they are doing to get started with a specific
>> investigation, looking for a fix. I'd like for a pine64+
>> 2GB to have buildworld complete despite the forking and
>> swapping involved (yep: for a time zero RES(ident memory) for
>> some processes involved in the build).]
>
> I was not able to follow the walls of text, but do not think that
> pmap_ts_reference() is the real culprit there.
>
> Is my impression right that the issue occurs on fork, and looks as
> a memory corruption, where some page suddenly becomes zero-filled ?
> And swapping seems to be involved ? It is somewhat interesting to see
> if the problem is reproducible on non-arm64 machines, e.g. armv7 or amd64.

Yes to both questions, and the non-arm64 machines that I've tried work.

But I think that the following extra detail may be of use:
what top shows for RES over time is also odd on arm64 (only),
and the number of pages that are zeroed is proportional to the
decrease in RES.

In the test sequence:

A) Allocate lots of 14 KiByte allocations and initialize the content of
   each to non-zero. The example ends up with RES of about 265M.
B) Sleep for some amount of time. I've been using well over 30 seconds here.
C) fork
D) Sleep again (parent and child), also forcing swapping during the
   sleep. (I used stress, manually run.)
E) Test the memory pattern in the parent and child process, passing
   over all the bytes, failed and good.

Both the parent and the child in (E) see the first pages allocated as
zero, with the number of zeroed pages increasing as the sleep time in (B)
increases (as long as the sleep is over 30 sec or so). The parent and
child agree on which pages are zero and which are not.

It fails with (B) being a no-op as well. But the proportionality
with the time for the sleep is interesting.

During (B) "top -PCwaopid" shows RES decreasing, starting after
30 sec or so. The fork in (C) produces a child that does not
have the same RES as the parent but instead a tiny RES
(80K as I remember). During (E) the child's RES increases to
full size.

My powerpc64, armv7, and amd64 tests of this sequence do not fail,
nor does RES decrease during (B). The child process gets the same
RES as the parent as well, unlike on arm64.

In the failing context (arm64) RES in the parent also decreases
during (D), before the swap-out.

> If answers to my two questions are yes, there is probably some bug with
> arm64 pmap handling of the dirty bit emulation. ARMv8.0 does not provide
> hardware dirty bit, and pmap interprets an accessed writeable page as
> unconditionally dirty. More, accessed bit is also not maintained by
> hardware, instead it should be set by pmap. And arm64 pmap sets the
> AF bit unconditionally when creating valid pte.

fork-then-swap-out/in is required to see the problem. Neither
fork by itself nor swapping (zero RES as shown in top) by itself
has shown the problem so far.

> Hmm, could you try the following patch, I did not even compiled it.

I'll try it later today.

> diff --git a/sys/arm64/arm64/pmap.c b/sys/arm64/arm64/pmap.c
> index 3d5756ba891..55aa402eb1c 100644
> --- a/sys/arm64/arm64/pmap.c
> +++ b/sys/arm64/arm64/pmap.c
> @@ -2481,6 +2481,11 @@ pmap_protect(pmap_t pmap, vm_offset_t sva, vm_offset_t eva, vm_prot_t prot)
>              sva += L3_SIZE) {
>                  l3 = pmap_load(l3p);
>                  if (pmap_l3_valid(l3)) {
> +                    if ((l3 & ATTR_SW_MANAGED) &&
> +                        pmap_page_dirty(l3)) {
> +                        vm_page_dirty(PHYS_TO_VM_PAGE(l3 &
> +                            ~ATTR_MASK));
> +                    }
>                      pmap_set(l3p, ATTR_AP(ATTR_AP_RO));
>                      PTE_SYNC(l3p);
>                      /* XXX: Use pmap_invalidate_range */

===
Mark Millard
markmi at dsl-only.net
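
For anyone wanting to repeat the test sequence (A) through (E) described
in this message, here is a minimal sketch of its shape in C. It is not
the program posted earlier in the thread. The function names, the 0xA5
fill byte, and the way the child reports its result through its exit
status are assumptions made for illustration. Forcing the swap-out
during (D) is still done externally (for example by running stress by
hand), and on a machine without the bug the sketch should simply report
zero bad bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define REGION_SIZE ((size_t)14 * 1024) /* 14 KiByte per allocation, step (A) */
#define PATTERN     0xA5                /* arbitrary non-zero byte (assumption) */

/* Release the first n regions and the table that holds them. */
static void
free_regions(unsigned char **r, size_t n)
{
    for (size_t i = 0; i < n; i++)
        free(r[i]);
    free(r);
}

/* (A) Allocate n regions and fill each with the non-zero pattern. */
static unsigned char **
alloc_regions(size_t n)
{
    unsigned char **r = calloc(n, sizeof(*r));

    if (r == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++) {
        r[i] = malloc(REGION_SIZE);
        if (r[i] == NULL) {
            free_regions(r, i);
            return NULL;
        }
        memset(r[i], PATTERN, REGION_SIZE);
    }
    return r;
}

/* (E) Pass over every byte, good or bad, and count the mismatches. */
static size_t
count_bad_bytes(unsigned char **r, size_t n)
{
    size_t bad = 0;

    for (size_t i = 0; i < n; i++) {
        /* volatile: make sure each byte is really re-read from memory */
        const volatile unsigned char *p = r[i];

        for (size_t j = 0; j < REGION_SIZE; j++)
            if (p[j] != PATTERN)
                bad++;
    }
    return bad;
}

/*
 * Run steps (A)-(E). Returns the number of bad bytes seen by the parent,
 * or -1 on a setup error. *child_failed is set to 0 if the child saw only
 * good bytes and to 1 otherwise.
 */
long
run_sequence(size_t n, unsigned pre_fork_sleep, unsigned post_fork_sleep,
    int *child_failed)
{
    unsigned char **r;
    pid_t pid;
    int status = 0;
    long bad;

    assert(n > 0 && child_failed != NULL);
    r = alloc_regions(n);                   /* (A) */
    if (r == NULL)
        return -1;
    sleep(pre_fork_sleep);                  /* (B) */
    pid = fork();                           /* (C) */
    if (pid < 0) {
        free_regions(r, n);
        return -1;
    }
    sleep(post_fork_sleep);                 /* (D) force swapping externally */
    if (pid == 0)                           /* (E) in the child */
        _exit(count_bad_bytes(r, n) != 0 ? 1 : 0);
    bad = (long)count_bad_bytes(r, n);      /* (E) in the parent */
    free_regions(r, n);
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    *child_failed = !(WIFEXITED(status) && WEXITSTATUS(status) == 0);
    return bad;
}
```

To approximate the numbers above, a main() could call run_sequence()
with on the order of 19000 regions (19000 * 14 KiByte is roughly 260
MiByte), a first sleep of well over 30 seconds, and a second sleep long
enough to push both processes out to swap.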