Date: Mon, 8 Feb 2021 20:04:04 +0200 From: Konstantin Belousov <kostikbel@gmail.com> To: Mark Johnston <markj@freebsd.org> Cc: Alan Somers <asomers@freebsd.org>, Matthew Macy <mmacy@freebsd.org>, FreeBSD Stable ML <stable@freebsd.org> Subject: Re: Page fault in _mca_init during startup Message-ID: <YCF9FI2OmxIprYnu@kib.kiev.ua> In-Reply-To: <YCFdXgBD5CRxaB5G@raichu> References: <YBywA/5PHEqDJ4J4@kib.kiev.ua> <CAOtMX2iXXgBuXWVBmS3oorZd7UxTgvYPPh9eTSfTNvTn8q_TSw@mail.gmail.com> <YB1ZDMGCOL%2BJ0SWE@kib.kiev.ua> <CAOtMX2g1Nz8BzRUhbeygTAniVObCTT2F0_U3se2kKOhnKJbjAQ@mail.gmail.com> <YB1%2BiUxs1ZITHaR/@kib.kiev.ua> <CAOtMX2iF7QCNvNfU2CSseH-mgNGudZ_TCVoXuoF%2BPE9sk_TB6Q@mail.gmail.com> <YCBnJB104tqztbYE@kib.kiev.ua> <YCFS39xe4bUhMEQr@raichu> <YCFZwi%2BawXj7qHvD@kib.kiev.ua> <YCFdXgBD5CRxaB5G@raichu>
next in thread | previous in thread | raw e-mail | index | archive | help
On Mon, Feb 08, 2021 at 10:48:46AM -0500, Mark Johnston wrote: > On Mon, Feb 08, 2021 at 05:33:22PM +0200, Konstantin Belousov wrote: > > On Mon, Feb 08, 2021 at 10:03:59AM -0500, Mark Johnston wrote: > > > On Mon, Feb 08, 2021 at 12:18:12AM +0200, Konstantin Belousov wrote: > > > > On Sun, Feb 07, 2021 at 02:33:11PM -0700, Alan Somers wrote: > > > > > Upgrading the BIOS fixed the problem, by clearing the MCG_CMCI_P bit on all > > > > > processors. I don't have strong opinions about whether we should commit > > > > > kib's patch too. Kib, what do you think? > > > > > > > > The patch causes some memory over-use. > > > > > > > > If this issue is not too widely experienced, I prefer to not commit the patch. > > > > > > Couldn't we short-circuit cmci_monitor() if the BSP did not allocate > > > anything? > > > > > > diff --git a/sys/x86/x86/mca.c b/sys/x86/x86/mca.c > > > index 03100e77d45..0619a41b128 100644 > > > --- a/sys/x86/x86/mca.c > > > +++ b/sys/x86/x86/mca.c > > > I think something should be printed in this case, at least once. > > I believe printf() already works, because spin locks do. > > Indeed, the printf() below should only fire on an AP during SI_SUB_SMP. > Access to the static flag is synchronized by mca_lock. > > diff --git a/sys/x86/x86/mca.c b/sys/x86/x86/mca.c > index 03100e77d45..8098bcfb4bd 100644 > --- a/sys/x86/x86/mca.c > +++ b/sys/x86/x86/mca.c > @@ -1065,11 +1065,26 @@ mca_setup(uint64_t mcg_cap) > static void > cmci_monitor(int i) > { > + static bool first = true; > struct cmc_state *cc; > uint64_t ctl; > > KASSERT(i < mca_banks, ("CPU %d has more MC banks", PCPU_GET(cpuid))); > > + /* > + * It is possible for some APs to report CMCI support even if the BSP > + * does not, apparently due to a BIOS bug. > + */ > + if (cmc_state == NULL) { > + if (first) { > + printf( I would wrote if (bootverbose) printf(). Also it might be useful to report ACPI id/APIC id as well, since the data is most likely the source for BIOS bug report. Otherwise fine with me. > + "AP %d reports CMCI support but the BSP does not\n", > + PCPU_GET(cpuid)); > + first = false; > + } > + return; > + } > + > ctl = rdmsr(MSR_MC_CTL2(i)); > if (ctl & MC_CTL2_CMCI_EN) > /* Already monitored by another CPU. */ > @@ -1114,6 +1129,10 @@ cmci_resume(int i) > > KASSERT(i < mca_banks, ("CPU %d has more MC banks", PCPU_GET(cpuid))); > > + /* See cmci_monitor(). */ > + if (cmc_state == NULL) > + return; > + > /* Ignore banks not monitored by this CPU. */ > if (!(PCPU_GET(cmci_mask) & 1 << i)) > return;
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?YCF9FI2OmxIprYnu>