From: Alan Somers
Date: Thu, 4 Feb 2021 17:19:43 -0700
Subject: Re: Page fault in _mca_init during startup
To: Mark Johnston
Cc: Konstantin Belousov, Matthew Macy, FreeBSD Stable ML

On Thu, Feb 4, 2021 at 4:27 PM Mark Johnston wrote:

> On Fri, Feb 05, 2021 at 12:58:34AM +0200, Konstantin Belousov wrote:
> > On Thu, Feb 04, 2021 at 01:34:13PM -0800, Matthew Macy wrote:
> > > On Thu, Feb 4, 2021 at 1:31 PM Alan Somers wrote:
> > > >
> > > > After upgrading a machine to FreeBSD, 12.2, it hit the following panic on
> > > > its first reboot.  I suspect that a few other servers have hit this too,
> > > > but since it happens before swap is mounted there are no core dumps, and
> > > > they usually reboot immediately.  The code in question hasn't changed since
> > > > 2018.  The panic happened in cmci_monitor at line 930.  Does anybody have
> > > > any suggestions for how I could debug further?
> > > > I can't readily reproduce
> > > > it, and I can't dump core, but I'd like to investigate it any way I can.
> > > > The server in question has dual Xeon Gold 6142 CPUs.
> > > >
> > Try this.
> >
> > I think that there is no other dependencies in the startup order, but
> > cannot know it for sure.
> >
> > commit 19584e3d3e9606d591fa30999b370ed758960e8c
> > Author: Konstantin Belousov
> > Date:   Fri Feb 5 00:56:09 2021 +0200
> >
> >     x86: init mca before APs are started
>
> APs only call mca_init() after they have been released by the BSP
> though, and that happens later in SI_SUB_SMP.
>
> > diff --git a/sys/x86/x86/mca.c b/sys/x86/x86/mca.c
> > index 03100e77d455..e2bf2673cf69 100644
> > --- a/sys/x86/x86/mca.c
> > +++ b/sys/x86/x86/mca.c
> > @@ -1371,7 +1371,7 @@ mca_init_bsp(void *arg __unused)
> >
> >  	mca_init();
> >  }
> > -SYSINIT(mca_init_bsp, SI_SUB_CPU, SI_ORDER_ANY, mca_init_bsp, NULL);
> > +SYSINIT(mca_init_bsp, SI_SUB_CPU, SI_ORDER_SECOND, mca_init_bsp, NULL);
> >
> >  /* Called when a machine check exception fires. */
> >  void
>

kib's patch causes a different problem, and this one is reproducible:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x18
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff8125762c
stack pointer           = 0x28:0xffffffff828dad90
frame pointer           = 0x28:0xffffffff828dad90
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = resume, IOPL = 0
current process         = 0 ()
trap number             = 12
panic: page fault
cpuid = 0
time = 1
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xffffffff828daa50
vpanic() at vpanic+0x17b/frame 0xffffffff828daaa0
panic() at panic+0x43/frame 0xffffffff828dab00
trap_fatal() at trap_fatal+0x391/frame 0xffffffff828dab60
trap_pfault() at trap_pfault+0x4f/frame 0xffffffff828dabb0
trap() at trap+0x286/frame 0xffffffff828dacc0
calltrap() at calltrap+0x8/frame 0xffffffff828dacc0
--- trap 0xc, rip = 0xffffffff8125762c, rsp = 0xffffffff828dad90, rbp = 0xffffffff828dad90 ---
native_lapic_enable_cmc() at native_lapic_enable_cmc+0x1c/frame 0xffffffff828dad90
_mca_init() at _mca_init+0x94c/frame 0xffffffff828dadd0
mi_startup() at mi_startup+0xdf/frame 0xffffffff828dadf0
btext() at btext+0x2c
KDB: enter: panic
[ thread pid 0 tid 0 ]
Stopped at      kdb_enter+0x37: movq    $0,0x12bc396(%rip)

If you're wondering, the panic happens at this point in
native_lapic_enable_cmc:

	apic_id = PCPU_GET(apic_id);
	KASSERT(lapics[apic_id].la_present,
	    ("%s: missing APIC %u", __func__, apic_id));
	lapics[apic_id].la_lvts[APIC_LVT_CMCI].lvt_masked = 0;  <- panic here
	lapics[apic_id].la_lvts[APIC_LVT_CMCI].lvt_active = 1;
	if (bootverbose)
		printf("lapic%u: CMCI unmasked\n", apic_id);
}
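
One possible reading of the fault address, offered as a sketch rather than a confirmed diagnosis: 0x18 is 24, which is the address the marked store would touch if lapics were still NULL, apic_id were 0, and the CMCI entry sat at index 6 of an array of 4-byte LVT entries at the start of the structure. The names below (sketch_lvt, sketch_lapic, SKETCH_LVT_CMCI, sketch_store_addr) and the layout itself are assumptions made for illustration only; they are not copied from the kernel headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins, NOT the real definitions from sys/x86.
 * Assumption: each LVT entry occupies one 32-bit word and the CMCI
 * entry is the seventh one (index 6).
 */
#define SKETCH_LVT_CMCI	6
#define SKETCH_LVT_MAX	SKETCH_LVT_CMCI

struct sketch_lvt {
	uint32_t bits;		/* flags, mode and vector packed together */
};

struct sketch_lapic {
	struct sketch_lvt la_lvts[SKETCH_LVT_MAX + 1];
	uint32_t la_present;
};

/*
 * Address that a store to base[apic_id].la_lvts[lvt_index] would touch,
 * computed with plain integer arithmetic so nothing is dereferenced.
 */
static uintptr_t
sketch_store_addr(uintptr_t base, unsigned int apic_id, unsigned int lvt_index)
{
	assert(lvt_index <= SKETCH_LVT_MAX);
	return (base +
	    (uintptr_t)apic_id * sizeof(struct sketch_lapic) +
	    offsetof(struct sketch_lapic, la_lvts) +
	    (uintptr_t)lvt_index * sizeof(struct sketch_lvt));
}
```

Under those assumptions, sketch_store_addr(0, 0, SKETCH_LVT_CMCI) evaluates to 0x18, the same value as the reported fault virtual address. That would fit lapics not having been allocated yet when mca_init_bsp runs at SI_ORDER_SECOND, but it would need to be checked against the real structure layout and the actual sysinit order.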