From: Alan Somers
Date: Thu, 4 Feb 2021 19:53:09 -0700
Subject: Re: Page fault in _mca_init during startup
To: Konstantin Belousov
Cc: Mark Johnston, Matthew Macy, FreeBSD Stable ML
List-Id: Production branch of FreeBSD source code

On Thu, Feb 4, 2021 at 7:40 PM Konstantin Belousov wrote:

> On Thu, Feb 04, 2021 at 07:01:30PM -0700, Alan Somers wrote:
> > On Thu, Feb 4, 2021 at 5:59 PM Konstantin Belousov wrote:
> > > Do you have INVARIANTS enabled?  If not, I am curious if enabling them
> > > would convert that rare page fault into a rare "CPU %d has more MC banks"
> > > assert.
> > >
> > > Also, the output of the
> > > # for x in $(jot $(sysctl -n hw.ncpu) 0) ; do cpucontrol -m 0x179 /dev/cpuctl$x; done
> > > command might show the issue (0x179 is the MCG_CAP MSR).
> > > You need to load cpuctl(4) if it is not loaded yet.
> > >
> >
> > I don't have INVARIANTS enabled, and I can't enable it on the production
> > servers.
> > However, I can turn those three KASSERTs into VERIFYs and see
> > what happens.  Here is what your command shows on the server that
> > panicked:
> > $ for x in $(jot $(sysctl -n hw.ncpu) 0) ; do sudo cpucontrol -m 0x179 /dev/cpuctl$x; done | uniq -c
> >      16 MSR 0x179: 0x00000000 0x0f000c14
> >      16 MSR 0x179: 0x00000000 0x0f000814
>
> It probably explains it, but it would be more telling if you left the
> output as is, so that we can see which CPUs have the MCG_CMCI_P (10) bit set.
>

I didn't sort them, so the first 16 have bit 10 set and the second 16 don't.

>
> I suspect that your machine has two sockets, and the processor in one socket
> has CPUs reporting MCG_CMCI_P, while the other processor does not.  Your SMP
> is not quite symmetric; perhaps the processors were from different bins?
>

Could be.  Is there some MSR that reports a more specific version number?

>
> If the BSP is selected on the reporting socket, everything boots well.  If
> the other socket wins the BSP selection race, cmci is not initialized, but
> when per-cpu mca_init() sees the CMCI_P bit, it calls cmci_setup() without
> allocated cmc state, because the BSP did not need it.
>
> If I am right, then unconditionally allocating the memory is probably the
> only choice there.
>
> commit 2e2c925ac3b626edc6492a57a80f6b87895801c2
> Author: Konstantin Belousov
> Date:   Fri Feb 5 04:32:05 2021 +0200
>
>     x86 mca: unconditionally allocate memory for cmc state
>
> diff --git a/sys/x86/x86/mca.c b/sys/x86/x86/mca.c
> index 03100e77d455..dff3f7631f5c 100644
> --- a/sys/x86/x86/mca.c
> +++ b/sys/x86/x86/mca.c
> @@ -1047,7 +1047,7 @@ mca_setup(uint64_t mcg_cap)
>  	    "force_scan", CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_MPSAFE, NULL, 0,
>  	    sysctl_mca_scan, "I", "Force an immediate scan for machine checks");
>  #ifdef DEV_APIC
> -	if (cmci_supported(mcg_cap))
> +	if (cpu_vendor_id == CPU_VENDOR_INTEL)
>  		cmci_setup();
>  	else if (amd_thresholding_supported())
>  		amd_thresholding_setup();
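
For anyone following along, here is a rough sketch of how the two MSR values above decode. It is not code from mca.c. The macro and function names are made up for this illustration, and it assumes that cpucontrol prints the high 32-bit word first and that the bank count sits in bits 7:0 of MCG_CAP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * MCG_CAP (MSR 0x179) fields used in this thread.  These names are
 * local to this sketch; they are not the kernel's own definitions.
 */
#define CAP_BANK_COUNT_MASK	0xffULL		/* bits 7:0, number of MC banks */
#define CAP_CMCI_P		(1ULL << 10)	/* bit 10, CMCI supported */

/* Assumption: cpucontrol -m prints two 32-bit words, high word first. */
static uint64_t
cap_from_words(uint32_t hi, uint32_t lo)
{
	return (((uint64_t)hi << 32) | lo);
}

/* Number of machine-check banks the CPU reports. */
static unsigned
cap_bank_count(uint64_t cap)
{
	return ((unsigned)(cap & CAP_BANK_COUNT_MASK));
}

/* True if the CPU advertises CMCI (bit 10 of MCG_CAP). */
static bool
cap_has_cmci(uint64_t cap)
{
	return ((cap & CAP_CMCI_P) != 0);
}

/* True if any CPU disagrees with CPU 0 about CMCI support. */
static bool
cap_cmci_mismatch(const uint64_t *caps, size_t n)
{
	assert(n > 0);
	for (size_t i = 1; i < n; i++)
		if (cap_has_cmci(caps[i]) != cap_has_cmci(caps[0]))
			return (true);
	return (false);
}
```

Feeding it the two values from the panicking server, cap_has_cmci() is true for 0x0f000c14 and false for 0x0f000814, and cap_cmci_mismatch() over the pair returns true. That is the asymmetry described above: whether the BSP would have allocated the cmc state depends on which socket it lands on.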