Date: Fri, 28 Oct 2016 07:43:40 -0700 From: John Baldwin <jhb@freebsd.org> To: freebsd-current@freebsd.org Cc: Andriy Gapon <avg@freebsd.org> Subject: Re: pmcstat -T -P instructions -t $pid ==> NMI Message-ID: <1578218.qqa780aDhB@ralph.baldwin.cx> In-Reply-To: <a51a8817-3274-5726-8dea-00d122fa11a9@FreeBSD.org> References: <0479e273-1e86-13d0-ccd4-2fb6f57cff91@FreeBSD.org> <ab4cb175-6ab8-c1fa-e6ea-c9bfcd128d83@FreeBSD.org> <a51a8817-3274-5726-8dea-00d122fa11a9@FreeBSD.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Friday, October 28, 2016 04:50:40 PM Andriy Gapon wrote: > On 28/10/2016 16:01, Andriy Gapon wrote: > > I suspect that under the right conditions it's possible for wrmsr to cause a > > counter overflow, such that an interrupt (if enabled) is generated after wrmsr > > is executed, even if wrmsr disables the counter. > > > > In amd_intr() we have this code: > > wrmsr(evsel, config & ~AMD_PMC_ENABLE); > > wrmsr(perfctr, AMD_RELOAD_COUNT_TO_PERFCTR_VALUE(v)); > > > > /* Restart the counter if logging succeeded. */ > > error = pmc_process_interrupt(cpu, PMC_HR, pm, tf, > > TRAPF_USERMODE(tf)); > > if (error == 0) > > wrmsr(evsel, config | AMD_PMC_ENABLE); > > > > I suspect that in the scenario above, if it is indeed possible, the last wrmsr > > would re-enable the counter that's supposed to be stopped. > > > > I think that writing back the original value should be more correct, that is: > > wrmsr(evsel, config); > > > > I'll test if this change would help. > > So, I have tried this change: > > --- a/sys/dev/hwpmc/hwpmc_amd.c > +++ b/sys/dev/hwpmc/hwpmc_amd.c > @@ -577,6 +577,7 @@ amd_start_pmc(int cpu, int ri) > > PMCDBG1(MDP,STA,2,"amd-start config=0x%x", config); > > + KASSERT(cpu == PCPU_GET(cpuid), ("requested cpu is not current cpu")); > wrmsr(pd->pm_evsel, config); > return 0; > } > @@ -613,6 +614,7 @@ amd_stop_pmc(int cpu, int ri) > > /* turn off the PMC ENABLE bit */ > config = pm->pm_md.pm_amd.pm_amd_evsel & ~AMD_PMC_ENABLE; > + KASSERT(cpu == PCPU_GET(cpuid), ("requested cpu is not current cpu")); > wrmsr(pd->pm_evsel, config); > return 0; > } > @@ -676,6 +678,7 @@ amd_intr(int cpu, struct trapframe *tf) > perfctr = AMD_PMC_PERFCTR_0 + i; > v = pm->pm_sc.pm_reloadcount; > config = rdmsr(evsel); > + PMCDBG1(MDP,INT,2, "enabled=%d", config & AMD_PMC_ENABLE); > > KASSERT((config & ~AMD_PMC_ENABLE) == > (pm->pm_md.pm_amd.pm_amd_evsel & ~AMD_PMC_ENABLE), > @@ -689,12 +692,13 @@ amd_intr(int cpu, struct trapframe *tf) > error = pmc_process_interrupt(cpu, PMC_HR, pm, tf, > TRAPF_USERMODE(tf)); > if (error == 0) > - wrmsr(evsel, config | AMD_PMC_ENABLE); > + wrmsr(evsel, config); > } > > atomic_add_int(retval ? &pmc_stats.pm_intr_processed : > &pmc_stats.pm_intr_ignored, 1); > > + PMCDBG1(MDP,INT,3, "reval=%d", retval); > return (retval); > } > > > > And I couldn't reproduce the problem with it. > Also, in the debug log I see the following, for instance: > 315466 1 1068994822044 MDP:INT:3: reval=1 > 315465 1 1068994821176 MDP:INT:2: enabled=4194304 > 315464 1 1068994820930 MDP:INT:1: cpu=1 tf=0xfffffe03c45ecf30 um=1 > 315463 1 1068994796610 MDP:INT:3: reval=1 > 315462 1 1068994795833 MDP:INT:2: enabled=4194304 > 315461 1 1068994795589 MDP:INT:1: cpu=1 tf=0xfffffe03c45ecf30 um=1 > 315460 1 1068994771107 MDP:INT:3: reval=1 > 315459 1 1068994770176 MDP:INT:2: enabled=4194304 > 315458 1 1068994769933 MDP:INT:1: cpu=1 tf=0xfffffe03c45ecf30 um=1 > 315457 0 1068994766498 MDP:SWO:1: pc=0xfffff8001c6e0e00 pp=0x0 enable-msr=0 > 315456 0 1068994765449 CSW:SWO:1: cpu=0 proc=0xfffff80073767a50 (1655, > pmcstat) pp=0x0 > 315455 1 1068994742201 MDP:INT:3: reval=1 > 315454 1 1068994739535 MDP:INT:2: enabled=4194304 > 315453 1 1068994739169 MDP:INT:1: cpu=1 tf=0xfffffe03c45ecf30 um=1 > 315452 1 1068994700076 MDP:SWI:1: pc=0xfffff8007318ea00 > pp=0xfffff80021f56400 enable-msr=0 > 315451 1 1068994698957 MDP:STA:2: amd-start config=0x5300c0 > 315450 1 1068994698699 MDP:STA:1: amd-start cpu=1 ri=0 > 315449 1 1068994697513 MDP:WRI:1: amd-write cpu=1 ri=0 v=ffffffffffff0000 > 315448 1 1068994696676 CSW:SWI:1: cpu=1 ri=17 new=65536 > 315447 1 1068994694784 MDP:CFG:1: cpu=1 ri=0 pm=0xfffff8001c6e0900 > 315446 1 1068994691210 CSW:SWI:1: cpu=1 proc=0xfffff8017b0d3a50 (1654, > burnK7) pp=0xfffff80021f56400 > 315445 0 1068994674368 MDP:SWO:1: pc=0xfffff8001c6e0e00 > pp=0xfffff80021f56400 enable-msr=0 > 315444 0 1068994674033 MDP:CFG:1: cpu=0 ri=0 pm=0x0 > 315443 0 1068994673597 CSW:SWO:1: cpu=0 ri=17 tmp=-3205 (samp) > 315442 0 1068994673412 MDP:REA:2: amd-read (post-munge) id=0 -> 65536 > 315441 0 1068994673247 MDP:REA:2: amd-read (pre-munge) id=0 -> 281474976645120 > 315440 0 1068994673006 MDP:REA:1: amd-read id=0 class=2 > 315439 0 1068994672443 MDP:INT:3: reval=1 > 315438 0 1068994670981 MDP:INT:2: enabled=0 > 315437 0 1068994670591 MDP:INT:1: cpu=0 tf=0xffffffff81d769d0 um=0 > 315436 0 1068994669599 MDP:STO:1: amd-stop ri=0 > 315435 0 1068994668389 CSW:SWO:1: cpu=0 proc=0xfffff8017b0d3a50 (1654, > burnK7) pp=0xfffff80021f56400 > 315434 0 1068994642304 MDP:INT:3: reval=1 > 315433 0 1068994641339 MDP:INT:2: enabled=4194304 > 315432 0 1068994640412 MDP:INT:1: cpu=0 tf=0xffffffff81d769d0 um=1 > 315431 0 1068994598561 MDP:INT:3: reval=1 > 315430 0 1068994596854 MDP:INT:2: enabled=4194304 > 315429 0 1068994596590 MDP:INT:1: cpu=0 tf=0xffffffff81d769d0 um=0 > 315428 0 1068994533426 MDP:INT:3: reval=1 > 315427 0 1068994532643 MDP:INT:2: enabled=4194304 > 315426 0 1068994532390 MDP:INT:1: cpu=0 tf=0xffffffff81d769d0 um=1 > 315425 0 1068994508088 MDP:INT:3: reval=1 > 315424 0 1068994507167 MDP:INT:2: enabled=4194304 > 315423 0 1068994506915 MDP:INT:1: cpu=0 tf=0xffffffff81d769d0 um=1 > > I think that event #315438 confirms my hypothesis and that previously we would > get a stray NMI after that event. There is a dozen such events per more than a > million of interrupts. I think your patch is probably correct. hwpmc_core.c seems to have the same issue. I think it could be fixed similarly though it doesn't seem to have ever triggered. -- John Baldwin
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?1578218.qqa780aDhB>