Date: Thu, 17 May 2018 12:19:57 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: Andriy Gapon <avg@FreeBSD.org>
Cc: Johannes Lundberg <johalun0@gmail.com>, freebsd-current <freebsd-current@freebsd.org>
Subject: Re: Lag after resume culprit found
Message-ID: <20180517091957.GF6887@kib.kiev.ua>
In-Reply-To: <4d69b9f6-9406-74ba-1780-ac783adcf107@FreeBSD.org>
References: <CAECmPwtULDe9GGK0PhnUa7_n=zxripJj9nh5m0RTF9XqKhXKYQ@mail.gmail.com>
 <acaa419d-891e-96b1-7c1f-3203857c07ec@FreeBSD.org>
 <CAECmPwsgQhMM6zu=EfV=DQ4VHzEMuQUjD%2B45O-TP=A2U9mM8Qg@mail.gmail.com>
 <CAECmPwuKoQaD0M-wJagns_YCDMLy_qMnuy%2BceLF5UZtfE_1ehg@mail.gmail.com>
 <4d69b9f6-9406-74ba-1780-ac783adcf107@FreeBSD.org>
On Thu, May 17, 2018 at 11:06:42AM +0300, Andriy Gapon wrote:
> On 17/05/2018 10:56, Johannes Lundberg wrote:
> >
> > On Thu, May 17, 2018 at 8:46 AM, Johannes Lundberg <johalun0@gmail.com> wrote:
> >
> > On Thu, May 17, 2018 at 7:43 AM, Andriy Gapon <avg@freebsd.org> wrote:
> >
> > On 17/05/2018 02:07, Johannes Lundberg wrote:
> > > https://github.com/freebsd/freebsd/commit/66f063557f257baa9c8aeab9f933171eaa6e1cfa
> > > x86 cpususpend_handler: call wbinvd after setting suspend state bits
> >
> > That's very interesting and surprising.
> > That commit changes something that happens before suspend; it should not
> > have any effect on the system state after resume.
> >
> > Does anyone have a theory of what could be wrong?
> >
> > Nope, but moving
> > CPU_CLR_ATOMIC(cpu, &suspended_cpus);
> > back to the end of that scope fixes it.
> >
> > I did some further testing.
> > Calling
> > CPU_CLR_ATOMIC(cpu, &suspended_cpus);
> > before
> > pmap_init_pat();
> > is what "breaks" resume.
> >
> > Is this Intel only or does it happen on AMD as well (which this patch was
> > intended for)?
>
> Not sure about the PAT part, but fpuresume/npxresume would affect all platforms.
> It's a bit puzzling that doing PAT manipulations on one AP while another AP is
> being brought up is problematic. Probably there is something that I am missing.

Manipulating the PAT might affect cache consistency, since conflicting caching
attributes are applied to the cache line holding the suspended_cpus variable,
which is already cached. It might not be the variable itself that causes the
final misbehavior, but some other data sharing the line.
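The hypothesis above works at cache-line granularity: whatever happens to the line holding suspended_cpus also happens to every other object that shares that line. Below is a minimal user-space sketch of that idea, assuming 64-byte cache lines (typical for current x86 CPUs, but an assumption here). The helper shares_cache_line() and the two struct layouts are hypothetical illustrations, not FreeBSD code, and padding a variable onto its own line with C11 _Alignas only shows how such sharing can be avoided; it is not offered as a fix for the resume lag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines, as on most current x86 CPUs. */
#define CACHE_LINE_SIZE 64

/* Index of the cache line containing the byte at addr. */
static uintptr_t cache_line_of(const void *addr)
{
    return (uintptr_t)addr / CACHE_LINE_SIZE;
}

/*
 * True if the byte ranges [a, a + alen) and [b, b + blen) touch at least
 * one common cache line. Both lengths must be non-zero.
 */
static bool shares_cache_line(const void *a, size_t alen,
    const void *b, size_t blen)
{
    uintptr_t a_first = cache_line_of(a);
    uintptr_t a_last = cache_line_of((const char *)a + alen - 1);
    uintptr_t b_first = cache_line_of(b);
    uintptr_t b_last = cache_line_of((const char *)b + blen - 1);

    return a_first <= b_last && b_first <= a_last;
}

/* Hypothetical layout: a CPU bitmask sitting right next to unrelated data. */
struct unpadded_layout {
    uint64_t suspended_mask;    /* stand-in for a cpuset like suspended_cpus */
    uint64_t unrelated_counter; /* any other variable that lands beside it */
};

/* The same data, with each member forced onto a cache line of its own. */
struct padded_layout {
    _Alignas(CACHE_LINE_SIZE) uint64_t suspended_mask;
    _Alignas(CACHE_LINE_SIZE) uint64_t unrelated_counter;
};

/* Each member starts a new line, so the struct spans exactly two lines. */
static_assert(sizeof(struct padded_layout) == 2 * CACHE_LINE_SIZE,
    "each member should occupy its own cache line");
```

For example, with an instance declared as `static _Alignas(CACHE_LINE_SIZE) struct unpadded_layout u;`, shares_cache_line() reports that its two members share a line, while the two members of any struct padded_layout never do. Without that alignment an unpadded pair can still straddle a line boundary, which is why the check looks at addresses rather than assuming a layout.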