From: Konstantin Belousov
Date: Mon, 28 Dec 2020 00:45:02 +0200
To: Andrew Gallatin
Cc: src-committers@freebsd.org, dev-commits-src-all@freebsd.org, dev-commits-src-main@freebsd.org
Subject: Re: git: d39f7430a6e1 - main - amd64: preserve %cr2 in NMI/MCE/DBG handlers.

On Sun, Dec 27, 2020 at 03:13:09PM -0500, Andrew Gallatin wrote:
> On 12/27/20 6:14 AM, Konstantin Belousov wrote:
> > The branch main has been updated by kib:
> >
> > URL: https://urldefense.com/v3/__https://cgit.FreeBSD.org/src/commit/?id=d39f7430a6e1da419d6e4fb871bca5ba7863f738__;!!OToaGQ!7EPo6uRRpq8kWDLzM05a4h158xFeRyJ9PhhE1j04Y5uZaHKskCoGhso0T717aEhpYQ$
> >
> > commit d39f7430a6e1da419d6e4fb871bca5ba7863f738
> > Author: Konstantin Belousov
> > AuthorDate: 2020-12-25 21:58:43 +0000
> > Commit: Konstantin Belousov
> > CommitDate: 2020-12-27 10:59:33 +0000
> >
> > amd64: preserve %cr2 in NMI/MCE/DBG handlers.
> > These handlers could interrupt code which has interrupts disabled,
> > and if a spurious page fault occurs during exception handler run,
> > we get clobbered %cr2 in higher level stack.
> > This is mostly a speculation, but it is based on hints from good sources.
>
> I assume this is based around the mystery panic I was talking about on irc
> last week.

Yes, but it is not supposed to fix it; the hope is that it might reduce the
amount of smoke around it.
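
To make the clobbering scenario from the commit message concrete, here is a
minimal userspace sketch in C. It is only a model under stated assumptions,
not the code from the commit: `model_cr2`, `model_page_fault()` and both
handler functions are hypothetical names, and a plain variable stands in for
the real %cr2 register.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the %cr2 register (assumption: plain variable). */
static uint64_t model_cr2;

/* Models a page fault: the CPU records the faulting address in %cr2. */
static void
model_page_fault(uint64_t fault_addr)
{
	model_cr2 = fault_addr;
}

/*
 * Handler that does not preserve %cr2. A fault taken while it runs
 * overwrites the value the interrupted code has not yet read.
 */
static void
nmi_handler_unprotected(uint64_t nested_fault_addr)
{
	model_page_fault(nested_fault_addr);
}

/*
 * Handler that saves %cr2 on entry and restores it on exit, so the
 * interrupted code still sees its own fault address afterwards.
 */
static void
nmi_handler_preserving(uint64_t nested_fault_addr)
{
	uint64_t saved = model_cr2;

	model_page_fault(nested_fault_addr);
	model_cr2 = saved;
	assert(model_cr2 == saved);
}
```

In this model, a fault at one address followed by the unprotected handler
leaves the nested address in `model_cr2`, while the preserving handler leaves
the original address intact.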
>
> Can you please explain what a spurious page fault is? A fault where
> there is a valid mapping, but we somehow take a fault for no reason?
> How often does this happen?

Hopefully spurious faults occur rarely; they happen due to bugs in CPUs.
They were relatively common on older Intel CPU models some time ago, so
amd64 trap.c has special handling for page faults that should not occur
according to the kernel bookkeeping. Look for the TDP_RESETSPUR flag and its
use in trap_pfault() if interested. In short, we retry the faulted
instruction and fall through to normal fault handling if it faults again on
retry.

In fact, I do not think that this code can trigger during NMI. The intent of
the patch was to cover a case that I was immediately asked about when I
described the paradoxical %cr2 != %rip fault to some people. If the panic can
be repeated, at least we will know for sure that it is not the NMI handler
corrupting %cr2, and we can show the evidence to the relevant channel.
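
The retry-once policy described above can be sketched as a small userspace
model in C. This is an illustration under assumptions, not the trap.c code:
`struct thread_model`, `enum fault_action` and `unexpected_fault()` are
hypothetical names, and the real bookkeeping around TDP_RESETSPUR may differ
in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* What the model decides to do with a fault the bookkeeping cannot explain. */
enum fault_action {
	FAULT_RETRY,	/* return and re-execute the faulting instruction */
	FAULT_NORMAL	/* hand over to the normal fault handling path */
};

/* Hypothetical per-thread state: the last unexplained fault address. */
struct thread_model {
	bool		have_last;
	uintptr_t	last_addr;
};

/*
 * First unexplained fault at an address: remember it and retry.
 * Second consecutive fault at the same address: treat it as real.
 */
static enum fault_action
unexpected_fault(struct thread_model *t, uintptr_t addr)
{
	assert(t != NULL);

	if (!t->have_last || t->last_addr != addr) {
		t->have_last = true;
		t->last_addr = addr;
		return (FAULT_RETRY);
	}
	t->have_last = false;
	return (FAULT_NORMAL);
}
```

With a zero-initialized `struct thread_model`, the first call for a given
address asks for a retry, and an immediate second call for the same address
falls through to normal handling.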