Date: Tue, 3 Sep 2019 23:33:33 +0530 From: shreyank amartya <shreyankfbsd@gmail.com> To: freebsd-drivers@freebsd.org, freebsd-hackers@freebsd.org, avg@freebsd.org, mmacy@freebsd.org Subject: hwpmc: NMI cpuxx ... going to debugger Message-ID: <CAD9jf8C82rhPSr77u9nQ72_kPyL6%2Bz%2BEas5nROnZR1z%2BsdC3oQ@mail.gmail.com>
next in thread | raw e-mail | index | archive | help
Hi, I'm running a profiler which utilizes hwpmc and am frequently running into this situation: after a process context switch out I get an NMI. The NMI handler is unable to service it due to the fact that the pmc hardware is marked as free at the end of the process_csw_out() (hwpmc_mod.c) routine. Here are some logs I've collected when this happens: 1750 231 49360676470980 MDP:CFG:1: cpu=231 ri=0 pm=0xfffff801e7b2ea00 1749 231 49360676470360 CSW:SWI:1: cpu=231 proc=0xfffff813e80ff000 (5930, java) pp=0xfffff81077179000 1748 249 49360676421260 CSW:SWO:3: cpu=249 ri=33 1748 249 49360676421260 CSW:SWO:3: cpu=249 ri=33 1747 249 49360676420320 MDP:SWO:1: pc=0xfffff8010a06c400 pp=0xfffff81077179000 enable-msr=0 1746 249 49360676419000 CSW:SWO:1: cpu=249 proc=0xfffff813e80ff000 (5930, java) pp=0xfffff81077179000 1745 231 49360675789440 MDP:INT:2: retval=0 isnull=16 ovrflw=0 1744 231 49360675789240 MDP:INT:1: cpu=231 tf=0xfffffe0001577f30 um=0 1743 231 49360675788260 CSW:SWO:3: cpu=231 ri=33 1742 231 49360675788080 MDP:SWO:1: pc=0xfffff80106fe1400 pp=0xfffff81077179000 enable-msr=0 1741 231 49360675787880 MDP:CFG:1: cpu=231 ri=0 pm=0x0 1740 231 49360675787020 CSW:SWO:2: cpu=231 ri=17 val=-164 (samp) 1739 231 49360675786940 MDP:REA:2: amd-read (post-munge) id=0 -> -164 1738 231 49360675786880 MDP:REA:2: amd-read (pre-munge) id=0 -> 164 1737 231 49360675786720 MDP:REA:1: amd-read id=0 class=2 1736 231 49360675785180 MDP:STO:1: amd-stop ri=0 1735 231 49360675784160 CSW:SWO:1: cpu=231 proc=0xfffff813e80ff000 (5930, java) pp=0xfffff81077179000 1734 231 49360675617080 MDP:INT:2: retval=1 isnull=0 ovrflw=0 1733 231 49360675615240 MDP:INT:1: cpu=231 tf=0xfffffe0001577f30 um=0 1732 231 49360675505160 MDP:SWI:1: pc=0xfffff80106fe1400 pp=0xfffff81077179000 enable-msr=0 1731 231 49360675503600 MDP:STA:2: amd-start config=0x530076 1730 231 49360675503060 MDP:STA:1: amd-start cpu=231 ri=0 1729 231 49360675500720 MDP:WRI:1: amd-write cpu=231 ri=0 v=fffffffffffd97d4 1728 231 49360675500240 CSW:SWI:1: cpu=231 ri=17 new=157740 1727 231 49360675498680 MDP:CFG:1: cpu=231 ri=0 pm=0xfffff801e7b2ea00 1726 231 49360675498060 CSW:SWI:1: cpu=231 proc=0xfffff813e80ff000 (5930, java) pp=0xfffff81077179000 1725 154 49360675363180 CSW:SWO:3: cpu=154 ri=33 1724 154 49360675362020 MDP:SWO:1: pc=0xfffff80106d59600 pp=0xfffff81077179000 enable-msr=0 1723 154 49360675360220 CSW:SWO:1: cpu=154 proc=0xfffff813e80ff000 (5930, java) pp=0xfffff81077179000 1722 231 49360675009200 CSW:SWO:3: cpu=231 ri=33 1721 231 49360675009020 MDP:SWO:1: pc=0xfffff80106fe1400 pp=0xfffff81077179000 enable-msr=0 1720 231 49360675008320 MDP:CFG:1: cpu=231 ri=0 pm=0x0 1719 231 49360675006220 CSW:SWO:2: cpu=231 ri=17 val=157740 (samp) 1718 231 49360675005700 MDP:REA:2: amd-read (post-munge) id=0 -> 157740 1717 231 49360675005240 MDP:REA:2: amd-read (pre-munge) id=0 -> 281474976552916 1716 231 49360675004140 MDP:REA:1: amd-read id=0 class=2 1715 231 49360675002660 MDP:STO:1: amd-stop ri=0 1714 231 49360675001460 CSW:SWO:1: cpu=231 proc=0xfffff813e80ff000 (5930, java) pp=0xfffff81077179000 1713 231 49360674941580 MDP:INT:2: retval=1 isnull=0 ovrflw=0 1712 231 49360674939500 MDP:INT:1: cpu=231 tf=0xfffffe0001577f30 um=0 1711 231 49360674793500 MDP:SWI:1: pc=0xfffff80106fe1400 pp=0xfffff81077179000 enable-msr=0 1710 231 49360674792680 MDP:STA:2: amd-start config=0x530076 1709 220 49360674792340 CSW:SWO:3: cpu=220 ri=33 1708 220 49360674791060 MDP:SWO:1: pc=0xfffff80106f8a600 pp=0xfffff81077179000 enable-msr=0 1707 231 49360674790840 MDP:STA:1: amd-start cpu=231 ri=0 The stray NMI is entry 1744 & 1745 (isnull=16). In entry 1741 when the context switch out occurs, the hardware is marked free (pm=0x0). The NMI handler is unable to find the hardware and ends up returning 0. Entry 1740 reports the val=-164, I'm not sure about the significance but the value always seems to be negative when this problem occurs. Once i get a stray NMI, I do not get any further interrupts which is results in loss of samples in the generated pmclog. I'm using hwpmc from the most recent stable/12 branch which is same as current (with a stable/12 kernel). I'm running it on a java application with a single event in sampling mode. Any idea on how to resolve/avoid this problem? Any pointers or suggestion are appreciated. Thanks Shreyank
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAD9jf8C82rhPSr77u9nQ72_kPyL6%2Bz%2BEas5nROnZR1z%2BsdC3oQ>