Date:      Mon, 24 Mar 2014 21:11:38 +0200
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Henrik Gulbrandsen <henrik@gulbra.net>
Cc:        Craig Rodrigues <rodrigc@freebsd.org>, Alan Cox <alc@freebsd.org>, pho@freebsd.org, bug-followup@freebsd.org, freebsd-java@freebsd.org
Subject:   Re: kern/187238: vm.pmap.pcid_enabled="1" causes Java to coredump in FBSD 10
Message-ID:  <20140324191138.GZ21331@kib.kiev.ua>
In-Reply-To: <831cb7b9f4719265e66a26edcf6c0859@www.gulbra.net>
References:  <831cb7b9f4719265e66a26edcf6c0859@www.gulbra.net>

On Sun, Mar 23, 2014 at 01:03:00PM +0100, Henrik Gulbrandsen wrote:
> This is the most time-consuming bug I've encountered in my life, and not
> only because I started looking for it in the JVM, but now it seems to
> have been hiding in plain sight. I'm pretty sure that pmap->pm_save is
> handled incorrectly in the current kernel. Judging from the code, it's
> supposed to include all CPUs where the pmap has been active since the
> latest call to pmap_invalidate_all(...). However, that means that it
> should always be a superset of pmap->pm_active, since any CPU where the
> pmap is active may cache pmap information at any time. Currently, this
> is not the case, and since only CPUs in pmap->pm_save are targeted in
> the TLB shootdown, we are left with inconsistencies that crash the
> process soon afterwards.
>
> The attached patch solves this by only clearing a CPU from pmap->pm_save
> if it is not currently included in pmap->pm_active. As far as I can
> tell, that eliminates the bug. The patch is against STABLE, since that's
> what I'm currently running, but CURRENT should be pretty close, except
> for the default setting of pmap_pcid_enabled.
Yes, I think that the analysis and the patch (for stable/10) are correct.
Thank you for tracking this down.
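In other words, the invariant being restored is that a CPU is cleared
from pm_save only when it is no longer in pm_active; in the IPI
handlers this becomes (as in the patch below):

	cpuid = PCPU_GET(cpuid);
	if (!CPU_ISSET(cpuid, &smp_tlb_pmap->pm_active))
		CPU_CLR_ATOMIC(cpuid, &smp_tlb_pmap->pm_save);

which keeps pm_save a superset of pm_active between calls to
pmap_invalidate_all().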

>
> By the way, the logic in the invalidation functions is a bit messy now
> and can probably be simplified. Also, is there a good reason for
> ignoring the pmap argument in smp_masked_invltlb(...)?
I think this was an omission.

I adapted the patch to HEAD, and apparently the rewrite of the page
invalidation IPI handlers in C introduced a regression, so it took
some more time to find the issue: invlrng_handler() did the checks
for the special cases in the wrong order, doing the
invpcid(INVPCID_ADDR) before checking for the special PCIDs.
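In outline, the handler now tests the special cases first (see the
full patch below):

	if (smp_tlb_invpcid.pcid == 0) {
		/* kernel pmap: invlpg the global mappings */
	} else if (smp_tlb_invpcid.pcid == (uint64_t)-1) {
		/* all contexts: global invalidation, then update pm_save */
	} else if (invpcid_works) {
		/* per-page invpcid(INVPCID_ADDR) loop over the range */
	} else {
		/* fall back to reloading %cr3 with the pmap's PCID */
	}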

Show me the first lines of the verbose dmesg for your machine.

Below is the cumulative patch for HEAD.  If somebody can test this
with a jdk build on HEAD, it would be useful.
For testing, the tunable vm.pmap.pcid_enabled must be set to 1
from the loader prompt.
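(That is, run "set vm.pmap.pcid_enabled=1" at the loader prompt before
booting, or equivalently put vm.pmap.pcid_enabled="1" into
/boot/loader.conf.)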

diff --git a/sys/amd64/amd64/mp_machdep.c b/sys/amd64/amd64/mp_machdep.c
index 80b4e12..afc1bef 100644
--- a/sys/amd64/amd64/mp_machdep.c
+++ b/sys/amd64/amd64/mp_machdep.c
@@ -1257,7 +1257,7 @@ smp_masked_invltlb(cpuset_t mask, pmap_t pmap)
 {
 
 	if (smp_started) {
-		smp_targeted_tlb_shootdown(mask, IPI_INVLTLB, NULL, 0, 0);
+		smp_targeted_tlb_shootdown(mask, IPI_INVLTLB, pmap, 0, 0);
 #ifdef COUNT_XINVLTLB_HITS
 		ipi_masked_global++;
 #endif
@@ -1517,6 +1517,7 @@ void
 invltlb_pcid_handler(void)
 {
 	uint64_t cr3;
+	u_int cpuid;
 #ifdef COUNT_XINVLTLB_HITS
 	xhits_gbl[PCPU_GET(cpuid)]++;
 #endif /* COUNT_XINVLTLB_HITS */
@@ -1524,14 +1525,13 @@ invltlb_pcid_handler(void)
 	(*ipi_invltlb_counts[PCPU_GET(cpuid)])++;
 #endif /* COUNT_IPIS */
 
-	cr3 = rcr3();
 	if (smp_tlb_invpcid.pcid != (uint64_t)-1 &&
 	    smp_tlb_invpcid.pcid != 0) {
-
 		if (invpcid_works) {
 			invpcid(&smp_tlb_invpcid, INVPCID_CTX);
 		} else {
 			/* Otherwise reload %cr3 twice. */
+			cr3 = rcr3();
 			if (cr3 != pcid_cr3) {
 				load_cr3(pcid_cr3);
 				cr3 |= CR3_PCID_SAVE;
@@ -1541,8 +1541,11 @@ invltlb_pcid_handler(void)
 	} else {
 		invltlb_globpcid();
 	}
-	if (smp_tlb_pmap != NULL)
-		CPU_CLR_ATOMIC(PCPU_GET(cpuid), &smp_tlb_pmap->pm_save);
+	if (smp_tlb_pmap != NULL) {
+		cpuid = PCPU_GET(cpuid);
+		if (!CPU_ISSET(cpuid, &smp_tlb_pmap->pm_active))
+			CPU_CLR_ATOMIC(cpuid, &smp_tlb_pmap->pm_save);
+	}
 
 	atomic_add_int(&smp_tlb_wait, 1);
 }
@@ -1608,7 +1611,10 @@ invlpg_range(vm_offset_t start, vm_offset_t end)
 void
 invlrng_handler(void)
 {
+	struct invpcid_descr d;
 	vm_offset_t addr;
+	uint64_t cr3;
+	u_int cpuid;
 #ifdef COUNT_XINVLTLB_HITS
 	xhits_rng[PCPU_GET(cpuid)]++;
 #endif /* COUNT_XINVLTLB_HITS */
@@ -1618,15 +1624,7 @@ invlrng_handler(void)
 
 	addr = smp_tlb_invpcid.addr;
 	if (pmap_pcid_enabled) {
-		if (invpcid_works) {
-			struct invpcid_descr d;
-
-			d = smp_tlb_invpcid;
-			do {
-				invpcid(&d, INVPCID_ADDR);
-				d.addr += PAGE_SIZE;
-			} while (d.addr < smp_tlb_addr2);
-		} else if (smp_tlb_invpcid.pcid == 0) {
+		if (smp_tlb_invpcid.pcid == 0) {
 			/*
 			 * kernel pmap - use invlpg to invalidate
 			 * global mapping.
@@ -1635,12 +1633,18 @@ invlrng_handler(void)
 		} else if (smp_tlb_invpcid.pcid == (uint64_t)-1) {
 			invltlb_globpcid();
 			if (smp_tlb_pmap != NULL) {
-				CPU_CLR_ATOMIC(PCPU_GET(cpuid),
-				    &smp_tlb_pmap->pm_save);
+				cpuid = PCPU_GET(cpuid);
+				if (!CPU_ISSET(cpuid, &smp_tlb_pmap->pm_active))
+					CPU_CLR_ATOMIC(cpuid,
+					    &smp_tlb_pmap->pm_save);
 			}
+		} else if (invpcid_works) {
+			d = smp_tlb_invpcid;
+			do {
+				invpcid(&d, INVPCID_ADDR);
+				d.addr += PAGE_SIZE;
+			} while (d.addr <= smp_tlb_addr2);
 		} else {
-			uint64_t cr3;
-
 			cr3 = rcr3();
 			if (cr3 != pcid_cr3)
 				load_cr3(pcid_cr3 | CR3_PCID_SAVE);
diff --git a/sys/amd64/amd64/pmap.c b/sys/amd64/amd64/pmap.c
index 51ac9be..183a456 100644
--- a/sys/amd64/amd64/pmap.c
+++ b/sys/amd64/amd64/pmap.c
@@ -838,7 +838,7 @@ pmap_bootstrap(vm_paddr_t *firstaddr)
 	kernel_pmap->pm_pml4 = (pdp_entry_t *)PHYS_TO_DMAP(KPML4phys);
 	kernel_pmap->pm_cr3 = KPML4phys;
 	CPU_FILL(&kernel_pmap->pm_active);	/* don't allow deactivation */
-	CPU_ZERO(&kernel_pmap->pm_save);
+	CPU_FILL(&kernel_pmap->pm_save);	/* always superset of pm_active */
 	TAILQ_INIT(&kernel_pmap->pm_pvchunk);
 	kernel_pmap->pm_flags = pmap_flags;
 
@@ -1494,7 +1494,8 @@ pmap_invalidate_all(pmap_t pmap)
 		} else {
 			invltlb_globpcid();
 		}
-		CPU_CLR_ATOMIC(cpuid, &pmap->pm_save);
+		if (!CPU_ISSET(cpuid, &pmap->pm_active))
+			CPU_CLR_ATOMIC(cpuid, &pmap->pm_save);
 		smp_invltlb(pmap);
 	} else {
 		other_cpus = all_cpus;
@@ -1528,7 +1529,8 @@ pmap_invalidate_all(pmap_t pmap)
 			}
 		} else if (CPU_ISSET(cpuid, &pmap->pm_active))
 			invltlb();
-		CPU_CLR_ATOMIC(cpuid, &pmap->pm_save);
+		if (!CPU_ISSET(cpuid, &pmap->pm_active))
+			CPU_CLR_ATOMIC(cpuid, &pmap->pm_save);
 		if (pmap_pcid_enabled)
 			CPU_AND(&other_cpus, &pmap->pm_save);
 		else
