Date: Fri, 08 May 2015 19:34:29 +0000 From: bugzilla-noreply@freebsd.org To: freebsd-bugs@FreeBSD.org Subject: [Bug 198149] [hwpmc] pmcstat -P -t (top mode, process sampling) stops after a while Message-ID: <bug-198149-8-XCqSIpS9Dm@https.bugs.freebsd.org/bugzilla/> In-Reply-To: <bug-198149-8@https.bugs.freebsd.org/bugzilla/> References: <bug-198149-8@https.bugs.freebsd.org/bugzilla/>
next in thread | previous in thread | raw e-mail | index | archive | help
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=198149 --- Comment #15 from John Baldwin <jhb@FreeBSD.org> --- So I don't have a good explanation of why the counters overflow without generating an interrupt, however, it does seem like the interrupt is lost, or perhaps there is a race where the counter can be disabled right at about the time that it should fire, but the counter might still count a few events after it has been disabled. In particular, I captured this sequence of events (filtered to only show CPU 1) where I got an overflow with a value of 2: 72890 1 563885152232313 MDP:REA:2: iaf-read counter overflowed: 2 2264 1 563885043095469 MDP:STO:1: iafctrl=0(0) globalctrl=0(0) 2263 1 563885043094005 MDP:STO:1: iaf-stop iafctrl=0 2262 1 563885043093777 MDP:STO:1: iaf-stop cpu=1 ri=2 2256 1 563885043091809 CSW:SWO:1: cpu=1 proc=0xfffff8005698c538 (5697, threadspin) pp=0xfffff80052fea600 2188 1 563885043029837 MDP:INT:1: cpu=1 fixedctrl=b00 globalctrl=400000000 status=0 ovf=0 2187 1 563885043028445 MDP:INT:1: cpu=1 intrenable=400000000 2186 1 563885043028165 MDP:INT:1: iaf-intr cpu=1 error=0 v=ffffffff0000(ffffffff0000) 2185 1 563885043026045 MDP:INT:1: cpu=1 intrstatus=400000000 2184 1 563885043025621 MDP:INT:1: cpu=1 tf=0x0xfffffe083a551f30 um=1 2113 1 563885042960229 MDP:INT:1: cpu=1 fixedctrl=b00 globalctrl=400000000 status=0 ovf=0 2112 1 563885042958865 MDP:INT:1: cpu=1 intrenable=400000000 2111 1 563885042958665 MDP:INT:1: iaf-intr cpu=1 error=0 v=ffffffff0000(ffffffff0000) 2110 1 563885042957205 MDP:INT:1: cpu=1 intrstatus=400000000 2109 1 563885042956765 MDP:INT:1: cpu=1 tf=0x0xfffffe083a551f30 um=0 2108 1 563885042955049 MDP:SWI:1: cr4=0x406e0 2107 1 563885042954861 MDP:SWI:1: pc=0xfffff8014d0e5300 pp=0xfffff80052fea600 enable-msr=0 2106 1 563885042954569 MDP:STA:1: iafctrl=b00(b00) globalctrl=400000000(400000000) 2103 1 563885042952493 MDP:STA:1: iaf-start cpu=1 ri=2 2102 1 563885042952109 MDP:WRI:1: iaf-write cpu=1 ri=2 msr=0x40000002 v=fffffffff7d2 iafctrl=0 pmc=fffffffff7d2 2094 1 563885042949821 CSW:SWI:1: cpu=1 ri=23 new=2094 2087 1 563885042947433 CSW:SWI:1: cpu=1 proc=0xfffff8005698c538 (5697, threadspin) pp=0xfffff80052fea600 Here you can see that the interrupts for are occurring "about" every 75 KTR events or so, so the next one would have fired at around 2250. That is when the switchout begins. Note that there is a long gap between the call to iaf_stop() from pmc_process_csw_out() and the later call to iaf_read(), not just in KTR events, but also in terms of clock cycles (109136844 cycles on a 3.3 GHz CPU, so ~33 milliseconds assuming my math is correct). All of the intervening PMC events are PMC interrupts on other CPUs during that time. I'm actually not sure what CPU 1 could have been spinning on for that duration aside from the KTR index itself. I don't see anything that should be blocking the PMC interrupt (an NMI) either, so I wonder if it's possible for the PMC to increment a few times after it has been disabled? I do find that if I change my command to only monitor user events, then this goes away, so presumably the counter was still "running" and counting clock cycles in the PMC code itself when it overflowed. Given that, I'm just inclined to apply the first hack patch as-is (though also fix the similar function for IAP). -- You are receiving this mail because: You are the assignee for the bug.
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?bug-198149-8-XCqSIpS9Dm>