Date:      Fri, 5 Feb 2021 00:58:34 +0200
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Matthew Macy <mmacy@freebsd.org>
Cc:        Alan Somers <asomers@freebsd.org>, FreeBSD Stable ML <stable@freebsd.org>
Subject:   Re: Page fault in _mca_init during startup
Message-ID:  <YBx8GmXvmLnwFYql@kib.kiev.ua>
In-Reply-To: <CAPrugNofKuCZmdkb41j%2Bu%2BX0BPV-cK8WjgrBu7akuD=XezseMw@mail.gmail.com>
References:  <CAOtMX2imwP3x-8LBKGFvMJ%2BjuD%2BsH_02yzs9XvMcCHY=jJs86A@mail.gmail.com> <CAPrugNofKuCZmdkb41j%2Bu%2BX0BPV-cK8WjgrBu7akuD=XezseMw@mail.gmail.com>

On Thu, Feb 04, 2021 at 01:34:13PM -0800, Matthew Macy wrote:
> On Thu, Feb 4, 2021 at 1:31 PM Alan Somers <asomers@freebsd.org> wrote:
> >
> > After upgrading a machine to FreeBSD 12.2, it hit the following panic on
> > its first reboot.  I suspect that a few other servers have hit this too,
> > but since it happens before swap is mounted there are no core dumps, and
> > they usually reboot immediately.  The code in question hasn't changed since
> > 2018.  The panic happened in cmci_monitor at line 930.  Does anybody have
> > any suggestions for how I could debug further?  I can't readily reproduce
> > it, and I can't dump core, but I'd like to investigate it any way I can.
> > The server in question has dual Xeon Gold 6142 CPUs.
> >
> 
> I can't actually help :( but I can add a +1  with similar hardware or
> equivalent specs. It's not frequent, but it's often enough to be
> annoying.
> -M
> 
> > 	if (!(ctl & MC_CTL2_CMCI_EN))
> > 		/* This bank does not support CMCI. */
> > 		return;
> >
> > 	cc = &cmc_state[PCPU_GET(cpuid)][i];    // <- panic here
> >
> > /* Determine maximum threshold. */
> >
> >
> > Fatal trap 12: page fault while in kernel mode
> > cpuid = 26; apic id = 34
> > fault virtual address = 0xd0
> > fault code = supervisor read data, page not present
> > instruction pointer = 0x20:0xffffffff8125a009
> > stack pointer        = 0x28:0xfffffe0000b65f20
> > frame pointer        = 0x28:0xfffffe0000b65f50
> > code segment = base 0x0, limit 0xfffff, type 0x1b
> > = DPL 0, pres 1, long 1, def32 0, gran 1
> > processor eflags = resume, IOPL = 0
> > current process = 11 (idle: cpu26)
> > trap number = 12
> > panic: page fault
> > cpuid = 26
> > time = 1
> > KDB: stack backtrace:
> > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0000b65be0
> > vpanic() at vpanic+0x17b/frame 0xfffffe0000b65c30
> > panic() at panic+0x43/frame 0xfffffe0000b65c90
> > trap_fatal() at trap_fatal+0x391/frame 0xfffffe0000b65cf0
> > trap_pfault() at trap_pfault+0x4f/frame 0xfffffe0000b65d40
> > trap() at trap+0x286/frame 0xfffffe0000b65e50
> > calltrap() at calltrap+0x8/frame 0xfffffe0000b65e50
> > --- trap 0xc, rip = 0xffffffff8125a009, rsp = 0xfffffe0000b65f20, rbp = 0xfffffe0000b65f50 ---
> > _mca_init() at _mca_init+0x5d9/frame 0xfffffe0000b65f50
> > init_secondary_tail() at init_secondary_tail+0xfd/frame 0xfffffe0000b65f80
> > init_secondary() at init_secondary+0x2d1/frame 0xfffffe0000b65ff0
> > KDB: enter: panic
> > [ thread pid 11 tid 100029 ]
> > Stopped at      kdb_enter+0x37: movq    $0,0x12bc1f6(%rip)
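
A note on the fault address, as a rough sanity check rather than a confirmed diagnosis: the trap reports cpuid = 26 and fault virtual address = 0xd0. If cmc_state is a still-NULL array of per-CPU pointers (an assumption about its declaration, not something shown in the report), then reading cmc_state[26] on amd64 touches address 0 + 26 * 8 = 0xd0, which matches. The helper below is purely illustrative and its name is made up:

```c
#include <assert.h>
#include <stdint.h>

/* The arithmetic assumes 8-byte slots, as for pointers on amd64. */
static_assert(sizeof(uint64_t) == 8, "expected 8-byte slots");

/*
 * Hypothetical helper: address of slot `index` in an array of
 * 8-byte pointers that starts at `base`.
 */
static uint64_t
slot_address(uint64_t base, unsigned index)
{
	return (base + (uint64_t)index * sizeof(uint64_t));
}
```

With base = 0 (a NULL array) and index = 26 (the faulting cpuid), the result is 0xd0, the same value as the reported fault virtual address.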

Try this.

I think that there are no other dependencies in the startup order, but
I cannot know it for sure.

commit 19584e3d3e9606d591fa30999b370ed758960e8c
Author: Konstantin Belousov <kib@FreeBSD.org>
Date:   Fri Feb 5 00:56:09 2021 +0200

    x86: init mca before APs are started

diff --git a/sys/x86/x86/mca.c b/sys/x86/x86/mca.c
index 03100e77d455..e2bf2673cf69 100644
--- a/sys/x86/x86/mca.c
+++ b/sys/x86/x86/mca.c
@@ -1371,7 +1371,7 @@ mca_init_bsp(void *arg __unused)
 
 	mca_init();
 }
-SYSINIT(mca_init_bsp, SI_SUB_CPU, SI_ORDER_ANY, mca_init_bsp, NULL);
+SYSINIT(mca_init_bsp, SI_SUB_CPU, SI_ORDER_SECOND, mca_init_bsp, NULL);
 
 /* Called when a machine check exception fires. */
 void
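
For readers unfamiliar with SYSINIT ordering, here is a minimal userland sketch of why the one-word change above matters. The kernel runs SYSINIT hooks sorted by subsystem and then by order within the subsystem, so a hook registered with SI_ORDER_ANY runs last in its subsystem. The sketch assumes that the hook which starts the APs is registered in the same subsystem at a later slot such as SI_ORDER_THIRD; that placement, the "start_aps" name, and the numeric values are illustrative assumptions, so check sys/kernel.h and the SMP code in your tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative stand-ins; the real constants live in sys/kernel.h. */
enum {
	SUB_CPU      = 0x2100000,	/* stands in for SI_SUB_CPU */
	ORDER_SECOND = 0x0000001,	/* stands in for SI_ORDER_SECOND */
	ORDER_THIRD  = 0x0000002,	/* stands in for SI_ORDER_THIRD */
	ORDER_ANY    = 0xfffffff	/* stands in for SI_ORDER_ANY */
};

struct hook {
	const char *name;
	unsigned subsystem;
	unsigned order;
};

/* Sort key: subsystem first, then order within the subsystem. */
static int
hook_cmp(const void *a, const void *b)
{
	const struct hook *x = a, *y = b;

	if (x->subsystem != y->subsystem)
		return (x->subsystem < y->subsystem ? -1 : 1);
	if (x->order != y->order)
		return (x->order < y->order ? -1 : 1);
	return (0);
}

/* Sort the hooks in place and return the run position of `name`, or -1. */
static int
run_position(struct hook *hooks, size_t n, const char *name)
{
	assert(hooks != NULL && name != NULL);
	qsort(hooks, n, sizeof(hooks[0]), hook_cmp);
	for (size_t i = 0; i < n; i++)
		if (strcmp(hooks[i].name, name) == 0)
			return ((int)i);
	return (-1);
}

/*
 * Nonzero if mca_init_bsp, registered with `mca_order`, would run
 * before the hypothetical AP start-up hook.
 */
static int
mca_runs_before_ap_start(unsigned mca_order)
{
	struct hook hooks[] = {
		{ "start_aps", SUB_CPU, ORDER_THIRD },	/* assumed placement */
		{ "mca_init_bsp", SUB_CPU, mca_order },
	};
	size_t n = sizeof(hooks) / sizeof(hooks[0]);

	return (run_position(hooks, n, "mca_init_bsp") <
	    run_position(hooks, n, "start_aps"));
}
```

Under these assumptions, mca_runs_before_ap_start(ORDER_ANY) is false (the old registration runs after the APs have been started, so an AP can reach _mca_init first), while mca_runs_before_ap_start(ORDER_SECOND) is true, which is the ordering the commit title describes.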


