Date:      Fri, 21 Nov 2025 10:36:36 +0200
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Michal Meloun <mmel@freebsd.org>
Cc:        FreeBSD Current <freebsd-current@freebsd.org>
Subject:   Re: mmap( MAP_ANON) is broken on current. (was Still seeing Failed assertion: "p[i] == 0" on armv7 buildworld)
Message-ID:  <aSAklF9D8haCAaNU@kib.kiev.ua>
In-Reply-To: <8657a2f4-cb32-49a5-bbf6-cd5a4394c7be@FreeBSD.org>
References:  <8657a2f4-cb32-49a5-bbf6-cd5a4394c7be@FreeBSD.org>


On Fri, Nov 21, 2025 at 08:12:55AM +0100, Michal Meloun wrote:
> I have confirmed that the jemalloc assertions are caused by an mmap() failure.
> It can return non-zeroed page(s) for mmap(MAP_ANON), which is clearly a bug.
> 
> I have confirmed this on native ARMv7, and according to Mark, it is also
> reproducible on ARM32 and i386 jails. I think I saw it also on a
> memory-constrained (4 GB) aarch64, but I cannot reproduce it yet.
> 
> Does anybody have an idea how to identify vm faults associated with anon mmap,
> to trigger detection of this failure in the kernel? Or any other hint?

I think it would be much more visible if freshly allocated anonymous pages
were corrupted.  A similar mechanism for getting zeroed pages is used to get
fresh page table pages, and corruption there would cause a lot of kernel
page faults with 'invalid PTE bit' hardware reports.
But of course anything is possible.

The VM has an optimization where we track known-to-be-zeroed free pages
separately, by marking them with the PG_ZERO flag. If an allocation needs a
zeroed page and the flag is set, we skip calling pmap_zero_page() on it.
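
To make the failure mode concrete, here is a minimal userland sketch of that
optimization. It is a toy model, not kernel code: struct toy_page,
TOY_PG_ZERO, toy_zerofill() and toy_page_is_zero() are hypothetical names
invented for illustration, and memset() stands in for pmap_zero_page().

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE	4096
#define TOY_PG_ZERO	0x1	/* hypothetical value, not the kernel's */

/* Toy stand-in for a physical page plus its flags. */
struct toy_page {
	unsigned int	flags;
	unsigned char	data[TOY_PAGE_SIZE];
};

/*
 * Zero the page unless the flag claims it is already zeroed.
 * Returns true when the zeroing pass was skipped.
 */
static bool
toy_zerofill(struct toy_page *m)
{
	if ((m->flags & TOY_PG_ZERO) == 0) {
		memset(m->data, 0, sizeof(m->data));
		return (false);
	}
	/* Flag is trusted here; the contents are not inspected. */
	return (true);
}

/* Slow check: scan every byte of the page. */
static bool
toy_page_is_zero(const struct toy_page *m)
{
	size_t i;

	for (i = 0; i < sizeof(m->data); i++)
		if (m->data[i] != 0)
			return (false);
	return (true);
}
```

If a page carries the flag while holding stale data, toy_zerofill() skips
the memset() and hands the garbage out as-is, which is the kind of
inconsistency the slow check is meant to catch.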

Also, in vm_page_free_prep(), when we are told that the page is zeroed,
we do check for that with DIAGNOSTIC enabled, on amd64 and arm64.

So let's add a slow check to the vm_fault code that a supposedly zeroed page
is indeed zeroed.  Can you try to catch the issue with the patch applied
and DIAGNOSTIC enabled?  The patch is arch-agnostic and I believe it should
work on armv7, although it will obviously cause a slowdown.

commit 1a9e20dc8f7faadeb839ea6a04c83a4bf2652925
Author: Konstantin Belousov <kib@FreeBSD.org>
Date:   Fri Nov 21 10:34:51 2025 +0200

    vm_fault: under DIAGNOSTIC, verify that PG_ZERO page is indeed zeroed

diff --git a/sys/vm/vm_fault.c b/sys/vm/vm_fault.c
index 2e150b368d71..32bec33502fb 100644
--- a/sys/vm/vm_fault.c
+++ b/sys/vm/vm_fault.c
@@ -85,6 +85,8 @@
 #include <sys/refcount.h>
 #include <sys/resourcevar.h>
 #include <sys/rwlock.h>
+#include <sys/sched.h>
+#include <sys/sf_buf.h>
 #include <sys/signalvar.h>
 #include <sys/sysctl.h>
 #include <sys/sysent.h>
@@ -1220,6 +1222,20 @@ vm_fault_zerofill(struct faultstate *fs)
 	if ((fs->m->flags & PG_ZERO) == 0) {
 		pmap_zero_page(fs->m);
 	} else {
+#ifdef DIAGNOSTIC
+		struct sf_buf *sf;
+		unsigned long *p;
+		int i;
+
+		sched_pin();
+		sf = sf_buf_alloc(fs->m, SFB_CPUPRIVATE);
+		p = (unsigned long *)sf_buf_kva(sf);
+		for (i = 0; i < PAGE_SIZE / sizeof(*p); i++, p++)
+			KASSERT(*p == 0, ("zerocheck failed page %p PG_ZERO %d %jx",
+			    fs->m, i, (uintmax_t)*p));
+		sf_buf_free(sf);
+		sched_unpin();
+#endif
 		VM_CNT_INC(v_ozfod);
 	}
 	VM_CNT_INC(v_zfod);
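
The same word-by-word scan can be done from userland on a fresh
mmap(MAP_ANON) mapping. The following is only a sketch: the helper names
first_nonzero_word() and check_anon_mapping() are made up for illustration,
and a single pass like this is not expected to reproduce the problem by
itself, since the reports so far involve buildworld-like load.

```c
/* Needed on glibc under strict -std modes; harmless on FreeBSD. */
#define _DEFAULT_SOURCE
#include <sys/mman.h>
#include <stddef.h>

/*
 * Return the index of the first nonzero word in the first len bytes
 * at p, or -1 if every word is zero.
 */
static long
first_nonzero_word(const unsigned long *p, size_t len)
{
	size_t i, n;

	n = len / sizeof(*p);
	for (i = 0; i < n; i++)
		if (p[i] != 0)
			return ((long)i);
	return (-1);
}

/*
 * Map len bytes of anonymous memory, scan it, and unmap it.
 * Returns -2 if mmap() fails, otherwise the result of the scan.
 * Reading the mapping faults the pages in before they are checked.
 */
static long
check_anon_mapping(size_t len)
{
	void *p;
	long r;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
	    MAP_ANON | MAP_PRIVATE, -1, 0);
	if (p == MAP_FAILED)
		return (-2);
	r = first_nonzero_word(p, len);
	munmap(p, len);
	return (r);
}
```

Calling check_anon_mapping() in a loop with varying sizes while the machine
is under memory pressure would be one way to look for a nonzero result
without waiting for a jemalloc assertion.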


