From: Konstantin Belousov
Date: Fri, 5 Feb 2021 00:58:34 +0200
To: Matthew Macy
Cc: Alan Somers, FreeBSD Stable ML
Subject: Re: Page fault in _mca_init during startup

On Thu, Feb 04, 2021 at 01:34:13PM -0800, Matthew Macy wrote:
> On Thu, Feb 4, 2021 at 1:31 PM Alan Somers wrote:
> >
> > After upgrading a machine to FreeBSD, 12.2, it hit the following panic on
> > its first reboot. I suspect that a few other servers have hit this too,
> > but since it happens before swap is mounted there are no core dumps, and
> > they usually reboot immediately. The code in question hasn't changed since
> > 2018. The panic happened in cmci_monitor at line 930. Does anybody have
> > any suggestions for how I could debug further? I can't readily reproduce
> > it, and I can't dump core, but I'd like to investigate it any way I can.
> > The server in question has dual Xeon Gold 6142 CPUs.
> >
>
> I can't actually help :( but I can add a +1 with similar hardware or
> equivalent specs. It's not frequent, but it's often enough to be
> annoying.
> -M
>
> > if (!(ctl & MC_CTL2_CMCI_EN))
> >         /* This bank does not support CMCI. */
> >         return;
> >
> > cc = &cmc_state[PCPU_GET(cpuid)][i]; // <- panic here
> >
> > /* Determine maximum threshold. */
> >
> >
> > Fatal trap 12: page fault while in kernel mode
> > cpuid = 26; apic id = 34
> > fault virtual address = 0xd0
> > fault code = supervisor read data, page not present
> > instruction pointer = 0x20:0xffffffff8125a009
> > stack pointer = 0x28:0xfffffe0000b65f20
> > frame pointer = 0x28:0xfffffe0000b65f50
> > code segment = base 0x0, limit 0xfffff, type 0x1b
> > = DPL 0, pres 1, long 1, def32 0, gran 1
> > processor eflags = resume, IOPL = 0
> > current process = 11 (idle: cpu26)
> > trap number = 12
> > panic: page fault
> > cpuid = 26
> > time = 1
> > KDB: stack backtrace:
> > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
> > 0xfffffe0000b65be0
> > vpanic() at vpanic+0x17b/frame 0xfffffe0000b65c30
> > panic() at panic+0x43/frame 0xfffffe0000b65c90
> > trap_fatal() at trap_fatal+0x391/frame 0xfffffe0000b65cf0
> > trap_pfault() at trap_pfault+0x4f/frame 0xfffffe0000b65d40
> > trap() at trap+0x286/frame 0xfffffe0000b65e50
> > calltrap() at calltrap+0x8/frame 0xfffffe0000b65e50
> > --- trap 0xc, rip = 0xffffffff8125a009, rsp = 0xfffffe0000b65f20, rbp =
> > 0xfffffe0000b65f50 ---
> > _mca_init() at _mca_init+0x5d9/frame 0xfffffe0000b65f50
> > init_secondary_tail() at init_secondary_tail+0xfd/frame 0xfffffe0000b65f80
> > init_secondary() at init_secondary+0x2d1/frame 0xfffffe0000b65ff0
> > KDB: enter: panic
> > [ thread pid 11 tid 100029 ]
> > Stopped at kdb_enter+0x37: movq $0,0x12bc1f6(%rip)

Try this. I think that there are no other dependencies in the startup
order, but I cannot know it for sure.

commit 19584e3d3e9606d591fa30999b370ed758960e8c
Author: Konstantin Belousov
Date:   Fri Feb 5 00:56:09 2021 +0200

    x86: init mca before APs are started

diff --git a/sys/x86/x86/mca.c b/sys/x86/x86/mca.c
index 03100e77d455..e2bf2673cf69 100644
--- a/sys/x86/x86/mca.c
+++ b/sys/x86/x86/mca.c
@@ -1371,7 +1371,7 @@ mca_init_bsp(void *arg __unused)
 
 	mca_init();
 }
-SYSINIT(mca_init_bsp, SI_SUB_CPU, SI_ORDER_ANY, mca_init_bsp, NULL);
+SYSINIT(mca_init_bsp, SI_SUB_CPU, SI_ORDER_SECOND, mca_init_bsp, NULL);
 
 /* Called when a machine check exception fires. */
 void