From owner-freebsd-hackers@freebsd.org Tue Sep 3 18:03:47 2019 Return-Path: Delivered-To: freebsd-hackers@mailman.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.nyi.freebsd.org (Postfix) with ESMTP id 69C55E70A7; Tue, 3 Sep 2019 18:03:47 +0000 (UTC) (envelope-from shreyankfbsd@gmail.com) Received: from mail-yb1-xb36.google.com (mail-yb1-xb36.google.com [IPv6:2607:f8b0:4864:20::b36]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) server-signature RSA-PSS (4096 bits) client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "smtp.gmail.com", Issuer "GTS CA 1O1" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 46NFFG4TQqz40Fl; Tue, 3 Sep 2019 18:03:46 +0000 (UTC) (envelope-from shreyankfbsd@gmail.com) Received: by mail-yb1-xb36.google.com with SMTP id a17so6189593ybc.0; Tue, 03 Sep 2019 11:03:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:from:date:message-id:subject:to; bh=xG5fyneYlZoVAFs7kHrUX8N9v7dzITYHrvmdocIMmMo=; b=s4TfUmDqSXzlV1EM81tJ0JMqVWb4VV/C5ipeWetJaj21v6RT4S3UbC6ekdYB/LlFmj YEyj0snInLhTnk5f+gjugqJlCChCamlo1T8tsevDZrlsIXx8bqnU246ic6vosZg1bEY8 JB2b/zm2SCuL+ZiOyuYGY+xNukwRv3arysjzHfhesBQTcZWfxlCmolJwQrelEs/FYgX9 N/lzO8dQz1x5bNa8v31A8t8dEeKZZchyye6Ntk5GI1ulsp7oNSRWYpZqdvccscQS5/dW biJ2uRZq9iOeQKfEi+eJKD8ouh/CHWTShVW3U61cEZuuwX7pM6Jz61+feXwUZ0c8Tldq dCAA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:from:date:message-id:subject:to; bh=xG5fyneYlZoVAFs7kHrUX8N9v7dzITYHrvmdocIMmMo=; b=GuEY3U6h4X58ibnohiaiNkVpCEuVMyD8JN2r+++K3uBjHxqfysvrQscnhh6MFy4T4u 0GSSAunYZ5TUrEBwc3KgolYOY4Pz1UvtI7R6VMZI2NE788034sH0z6nHesPQmWYBv8J5 8uGvUkhGQpz3M/sdisi3y8FuE8JqjNiR+c+GXHwQ6LIPQU319dAqcJ5aFREi4Q5xbkFr EO5bt941Ar8uY/Knki3y2woxZm7+GrgEx9muLwlSECQ0SkGVNwZobyzy1L0vr/bH+V7f u9qWFn8zFliBkzqwCtTYrNZ+SoCl12JHLBGfA5cqCamOXr3fPum1L36R8szHejLkzt1T zmGA== X-Gm-Message-State: APjAAAV/d0R635qJD2HljYNnI7wTnwS0x8GBro2Ktm7awwYJjMQpVfLr /l+xcQZNeMBiwohAxavTjfNHirboXojytf39L0UkT14= X-Google-Smtp-Source: APXvYqzI/dprtxi9F6cpFaUUNq9zf9ee/YZUCP35dfTwvSFqucaqcfoL8+f2lSY+evkoRMxpR9DPowp9DSlCSvdsgeQ= X-Received: by 2002:a25:244:: with SMTP id 65mr25689678ybc.469.1567533825223; Tue, 03 Sep 2019 11:03:45 -0700 (PDT) MIME-Version: 1.0 From: shreyank amartya Date: Tue, 3 Sep 2019 23:33:33 +0530 Message-ID: Subject: hwpmc: NMI cpuxx ... going to debugger To: freebsd-drivers@freebsd.org, freebsd-hackers@freebsd.org, avg@freebsd.org, mmacy@freebsd.org X-Rspamd-Queue-Id: 46NFFG4TQqz40Fl X-Spamd-Bar: --- Authentication-Results: mx1.freebsd.org; dkim=pass header.d=gmail.com header.s=20161025 header.b=s4TfUmDq; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (mx1.freebsd.org: domain of shreyankfbsd@gmail.com designates 2607:f8b0:4864:20::b36 as permitted sender) smtp.mailfrom=shreyankfbsd@gmail.com X-Spamd-Result: default: False [-3.99 / 15.00]; ARC_NA(0.00)[]; NEURAL_HAM_MEDIUM(-1.00)[-1.000,0]; R_DKIM_ALLOW(-0.20)[gmail.com:s=20161025]; FROM_HAS_DN(0.00)[]; RCPT_COUNT_THREE(0.00)[4]; R_SPF_ALLOW(-0.20)[+ip6:2607:f8b0:4000::/36]; FREEMAIL_FROM(0.00)[gmail.com]; MIME_GOOD(-0.10)[multipart/alternative,text/plain]; TO_DN_NONE(0.00)[]; NEURAL_HAM_LONG(-1.00)[-1.000,0]; IP_SCORE_FREEMAIL(0.00)[]; TO_MATCH_ENVRCPT_ALL(0.00)[]; IP_SCORE(0.00)[ip: (-9.48), ipnet: 2607:f8b0::/32(-2.77), asn: 15169(-2.28), country: US(-0.05)]; DKIM_TRACE(0.00)[gmail.com:+]; DMARC_POLICY_ALLOW(-0.50)[gmail.com,none]; RCVD_IN_DNSWL_NONE(0.00)[6.3.b.0.0.0.0.0.0.0.0.0.0.0.0.0.0.2.0.0.4.6.8.4.0.b.8.f.7.0.6.2.list.dnswl.org : 127.0.5.0]; NEURAL_HAM_SHORT(-0.99)[-0.994,0]; FROM_EQ_ENVFROM(0.00)[]; MIME_TRACE(0.00)[0:+,1:+,2:~]; FREEMAIL_ENVFROM(0.00)[gmail.com]; ASN(0.00)[asn:15169, ipnet:2607:f8b0::/32, country:US]; RCVD_COUNT_TWO(0.00)[2]; RCVD_TLS_ALL(0.00)[]; DWL_DNSWL_NONE(0.00)[gmail.com.dwl.dnswl.org : 127.0.5.0] Content-Type: text/plain; charset="UTF-8" X-Content-Filtered-By: Mailman/MimeDel 2.1.29 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 03 Sep 2019 18:03:47 -0000 Hi, I'm running a profiler which utilizes hwpmc and am frequently running into this situation: after a process context switch out I get an NMI. The NMI handler is unable to service it due to the fact that the pmc hardware is marked as free at the end of the process_csw_out() (hwpmc_mod.c) routine. Here are some logs I've collected when this happens: 1750 231 49360676470980 MDP:CFG:1: cpu=231 ri=0 pm=0xfffff801e7b2ea00 1749 231 49360676470360 CSW:SWI:1: cpu=231 proc=0xfffff813e80ff000 (5930, java) pp=0xfffff81077179000 1748 249 49360676421260 CSW:SWO:3: cpu=249 ri=33 1748 249 49360676421260 CSW:SWO:3: cpu=249 ri=33 1747 249 49360676420320 MDP:SWO:1: pc=0xfffff8010a06c400 pp=0xfffff81077179000 enable-msr=0 1746 249 49360676419000 CSW:SWO:1: cpu=249 proc=0xfffff813e80ff000 (5930, java) pp=0xfffff81077179000 1745 231 49360675789440 MDP:INT:2: retval=0 isnull=16 ovrflw=0 1744 231 49360675789240 MDP:INT:1: cpu=231 tf=0xfffffe0001577f30 um=0 1743 231 49360675788260 CSW:SWO:3: cpu=231 ri=33 1742 231 49360675788080 MDP:SWO:1: pc=0xfffff80106fe1400 pp=0xfffff81077179000 enable-msr=0 1741 231 49360675787880 MDP:CFG:1: cpu=231 ri=0 pm=0x0 1740 231 49360675787020 CSW:SWO:2: cpu=231 ri=17 val=-164 (samp) 1739 231 49360675786940 MDP:REA:2: amd-read (post-munge) id=0 -> -164 1738 231 49360675786880 MDP:REA:2: amd-read (pre-munge) id=0 -> 164 1737 231 49360675786720 MDP:REA:1: amd-read id=0 class=2 1736 231 49360675785180 MDP:STO:1: amd-stop ri=0 1735 231 49360675784160 CSW:SWO:1: cpu=231 proc=0xfffff813e80ff000 (5930, java) pp=0xfffff81077179000 1734 231 49360675617080 MDP:INT:2: retval=1 isnull=0 ovrflw=0 1733 231 49360675615240 MDP:INT:1: cpu=231 tf=0xfffffe0001577f30 um=0 1732 231 49360675505160 MDP:SWI:1: pc=0xfffff80106fe1400 pp=0xfffff81077179000 enable-msr=0 1731 231 49360675503600 MDP:STA:2: amd-start config=0x530076 1730 231 49360675503060 MDP:STA:1: amd-start cpu=231 ri=0 1729 231 49360675500720 MDP:WRI:1: amd-write cpu=231 ri=0 v=fffffffffffd97d4 1728 231 49360675500240 CSW:SWI:1: cpu=231 ri=17 new=157740 1727 231 49360675498680 MDP:CFG:1: cpu=231 ri=0 pm=0xfffff801e7b2ea00 1726 231 49360675498060 CSW:SWI:1: cpu=231 proc=0xfffff813e80ff000 (5930, java) pp=0xfffff81077179000 1725 154 49360675363180 CSW:SWO:3: cpu=154 ri=33 1724 154 49360675362020 MDP:SWO:1: pc=0xfffff80106d59600 pp=0xfffff81077179000 enable-msr=0 1723 154 49360675360220 CSW:SWO:1: cpu=154 proc=0xfffff813e80ff000 (5930, java) pp=0xfffff81077179000 1722 231 49360675009200 CSW:SWO:3: cpu=231 ri=33 1721 231 49360675009020 MDP:SWO:1: pc=0xfffff80106fe1400 pp=0xfffff81077179000 enable-msr=0 1720 231 49360675008320 MDP:CFG:1: cpu=231 ri=0 pm=0x0 1719 231 49360675006220 CSW:SWO:2: cpu=231 ri=17 val=157740 (samp) 1718 231 49360675005700 MDP:REA:2: amd-read (post-munge) id=0 -> 157740 1717 231 49360675005240 MDP:REA:2: amd-read (pre-munge) id=0 -> 281474976552916 1716 231 49360675004140 MDP:REA:1: amd-read id=0 class=2 1715 231 49360675002660 MDP:STO:1: amd-stop ri=0 1714 231 49360675001460 CSW:SWO:1: cpu=231 proc=0xfffff813e80ff000 (5930, java) pp=0xfffff81077179000 1713 231 49360674941580 MDP:INT:2: retval=1 isnull=0 ovrflw=0 1712 231 49360674939500 MDP:INT:1: cpu=231 tf=0xfffffe0001577f30 um=0 1711 231 49360674793500 MDP:SWI:1: pc=0xfffff80106fe1400 pp=0xfffff81077179000 enable-msr=0 1710 231 49360674792680 MDP:STA:2: amd-start config=0x530076 1709 220 49360674792340 CSW:SWO:3: cpu=220 ri=33 1708 220 49360674791060 MDP:SWO:1: pc=0xfffff80106f8a600 pp=0xfffff81077179000 enable-msr=0 1707 231 49360674790840 MDP:STA:1: amd-start cpu=231 ri=0 The stray NMI is entry 1744 & 1745 (isnull=16). In entry 1741 when the context switch out occurs, the hardware is marked free (pm=0x0). The NMI handler is unable to find the hardware and ends up returning 0. Entry 1740 reports the val=-164, I'm not sure about the significance but the value always seems to be negative when this problem occurs. Once i get a stray NMI, I do not get any further interrupts which is results in loss of samples in the generated pmclog. I'm using hwpmc from the most recent stable/12 branch which is same as current (with a stable/12 kernel). I'm running it on a java application with a single event in sampling mode. Any idea on how to resolve/avoid this problem? Any pointers or suggestion are appreciated. Thanks Shreyank