Date: Mon, 28 Dec 2020 00:45:02 +0200
From: Konstantin Belousov <kostikbel@gmail.com>
To: Andrew Gallatin <gallatin@cs.duke.edu>
Cc: src-committers@freebsd.org, dev-commits-src-all@freebsd.org, dev-commits-src-main@freebsd.org
Subject: Re: git: d39f7430a6e1 - main - amd64: preserve %cr2 in NMI/MCE/DBG handlers.
Message-ID: <X%2BkObq3DiTYxuzBU@kib.kiev.ua>
In-Reply-To: <f6a1732c-5365-ebec-b579-ac06f15820c6@cs.duke.edu>
References: <202012271114.0BRBEwOO035891@gitrepo.freebsd.org> <f6a1732c-5365-ebec-b579-ac06f15820c6@cs.duke.edu>
On Sun, Dec 27, 2020 at 03:13:09PM -0500, Andrew Gallatin wrote:
> On 12/27/20 6:14 AM, Konstantin Belousov wrote:
> > The branch main has been updated by kib:
> >
> > URL: https://urldefense.com/v3/__https://cgit.FreeBSD.org/src/commit/?id=d39f7430a6e1da419d6e4fb871bca5ba7863f738__;!!OToaGQ!7EPo6uRRpq8kWDLzM05a4h158xFeRyJ9PhhE1j04Y5uZaHKskCoGhso0T717aEhpYQ$
> >
> > commit d39f7430a6e1da419d6e4fb871bca5ba7863f738
> > Author: Konstantin Belousov <kib@FreeBSD.org>
> > AuthorDate: 2020-12-25 21:58:43 +0000
> > Commit: Konstantin Belousov <kib@FreeBSD.org>
> > CommitDate: 2020-12-27 10:59:33 +0000
> >
> > amd64: preserve %cr2 in NMI/MCE/DBG handlers.
> >
> > These handlers could interrupt code which has interrupts disabled,
> > and if a spurious page fault occurs during exception handler run,
> > we get clobbered %cr2 in higher level stack.
> >
> > This is mostly a speculation, but it is based on hints from good sources.
>
> I assume this is based around the mystery panic I was talking about on irc
> last week.

Yes, but it is not supposed to fix it; the hope is that it might reduce
the amount of smoke around it.

> Can you please explain what a spurious page fault is? A fault where
> there is a valid mapping, but we somehow take a fault for no reason?
> How often does this happen?

Hopefully spurious faults occur rarely; they happen due to bugs in CPUs.
They were relatively common on older models of Intel CPUs some time ago,
so amd64 trap.c has special handling for page faults that should not occur
according to the kernel bookkeeping. Look for the TDP_RESETSPUR flag and
its use in trap_pfault() if interested. In short, we retry the faulted
instruction and fall back to normal fault handling if it faults again on
retry.

In fact, I do not think that this code can trigger during NMI. The intent
of the patch was to cover a case that I was immediately asked about when I
described the paradoxical %cr2 != %rip fault to some people.

If the panic can be repeated, at least we will know for sure that it is
not the NMI handler corrupting %cr2, and we can show the evidence to the
relevant channel.
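
The retry-then-fall-through behaviour described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the code in amd64 trap.c: the names toy_thread, toy_pfault, TOY_RETRY and TOY_NORMAL_FAULT are hypothetical, and the reset_spur field merely models the idea behind TDP_RESETSPUR. The first fault at an address is tolerated and the instruction is retried; a second fault at the same address goes to normal handling.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-thread state for this model; not the real struct thread. */
struct toy_thread {
	uintptr_t last_spur_addr;	/* address of the last tolerated fault */
	bool reset_spur;		/* models the idea behind TDP_RESETSPUR */
};

/* What the model decides to do with a page fault. */
enum toy_action {
	TOY_RETRY,		/* assume spurious: re-run the instruction */
	TOY_NORMAL_FAULT	/* faulted again: use normal fault handling */
};

/*
 * Decide how to treat a page fault that, according to the bookkeeping,
 * should not have happened. A fault at a new address (or the first one
 * after a reset) is assumed to be spurious and is retried once. A repeat
 * fault at the same address is passed on to normal handling.
 */
static enum toy_action
toy_pfault(struct toy_thread *td, uintptr_t fault_addr)
{
	if (td->reset_spur || td->last_spur_addr != fault_addr) {
		td->last_spur_addr = fault_addr;
		td->reset_spur = false;
		return (TOY_RETRY);
	}
	return (TOY_NORMAL_FAULT);
}
```

With a thread initialised with reset_spur set, a first call for a given address returns TOY_RETRY, and an immediate second call for the same address returns TOY_NORMAL_FAULT.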