Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 8 Feb 2021 10:48:46 -0500
From:      Mark Johnston <markj@freebsd.org>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        Alan Somers <asomers@freebsd.org>, Matthew Macy <mmacy@freebsd.org>, FreeBSD Stable ML <stable@freebsd.org>
Subject:   Re: Page fault in _mca_init during startup
Message-ID:  <YCFdXgBD5CRxaB5G@raichu>
In-Reply-To: <YCFZwi%2BawXj7qHvD@kib.kiev.ua>
References:  <CAOtMX2hY2WFvtuG2U_4PCqL8fPTqmVPKgHkmh-A88GBz85obNw@mail.gmail.com> <YBywA/5PHEqDJ4J4@kib.kiev.ua> <CAOtMX2iXXgBuXWVBmS3oorZd7UxTgvYPPh9eTSfTNvTn8q_TSw@mail.gmail.com> <YB1ZDMGCOL%2BJ0SWE@kib.kiev.ua> <CAOtMX2g1Nz8BzRUhbeygTAniVObCTT2F0_U3se2kKOhnKJbjAQ@mail.gmail.com> <YB1%2BiUxs1ZITHaR/@kib.kiev.ua> <CAOtMX2iF7QCNvNfU2CSseH-mgNGudZ_TCVoXuoF%2BPE9sk_TB6Q@mail.gmail.com> <YCBnJB104tqztbYE@kib.kiev.ua> <YCFS39xe4bUhMEQr@raichu> <YCFZwi%2BawXj7qHvD@kib.kiev.ua>

next in thread | previous in thread | raw e-mail | index | archive | help
On Mon, Feb 08, 2021 at 05:33:22PM +0200, Konstantin Belousov wrote:
> On Mon, Feb 08, 2021 at 10:03:59AM -0500, Mark Johnston wrote:
> > On Mon, Feb 08, 2021 at 12:18:12AM +0200, Konstantin Belousov wrote:
> > > On Sun, Feb 07, 2021 at 02:33:11PM -0700, Alan Somers wrote:
> > > > Upgrading the BIOS fixed the problem, by clearing the MCG_CMCI_P bit on all
> > > > processors.  I don't have strong opinions about whether we should commit
> > > > kib's patch too.  Kib, what do you think?
> > > 
> > > The patch causes some memory over-use.
> > > 
> > > If this issue is not too widely experienced, I prefer to not commit the patch.
> > 
> > Couldn't we short-circuit cmci_monitor() if the BSP did not allocate
> > anything?
> > 
> > diff --git a/sys/x86/x86/mca.c b/sys/x86/x86/mca.c
> > index 03100e77d45..0619a41b128 100644
> > --- a/sys/x86/x86/mca.c
> > +++ b/sys/x86/x86/mca.c

> I think something should be printed in this case, at least once.
> I believe printf() already works, because spin locks do.

Indeed, the printf() below should only fire on an AP during SI_SUB_SMP.
Access to the static flag is synchronized by mca_lock.

diff --git a/sys/x86/x86/mca.c b/sys/x86/x86/mca.c
index 03100e77d45..8098bcfb4bd 100644
--- a/sys/x86/x86/mca.c
+++ b/sys/x86/x86/mca.c
@@ -1065,11 +1065,26 @@ mca_setup(uint64_t mcg_cap)
 static void
 cmci_monitor(int i)
 {
+	static bool first = true;
 	struct cmc_state *cc;
 	uint64_t ctl;
 
 	KASSERT(i < mca_banks, ("CPU %d has more MC banks", PCPU_GET(cpuid)));
 
+	/*
+	 * It is possible for some APs to report CMCI support even if the BSP
+	 * does not, apparently due to a BIOS bug.
+	 */
+	if (cmc_state == NULL) {
+		if (first) {
+			printf(
+			    "AP %d reports CMCI support but the BSP does not\n",
+			    PCPU_GET(cpuid));
+			first = false;
+		}
+		return;
+	}
+
 	ctl = rdmsr(MSR_MC_CTL2(i));
 	if (ctl & MC_CTL2_CMCI_EN)
 		/* Already monitored by another CPU. */
@@ -1114,6 +1129,10 @@ cmci_resume(int i)
 
 	KASSERT(i < mca_banks, ("CPU %d has more MC banks", PCPU_GET(cpuid)));
 
+	/* See cmci_monitor(). */
+	if (cmc_state == NULL)
+		return;
+
 	/* Ignore banks not monitored by this CPU. */
 	if (!(PCPU_GET(cmci_mask) & 1 << i))
 		return;



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?YCFdXgBD5CRxaB5G>