From owner-freebsd-current@freebsd.org  Tue Nov  8 07:42:19 2016
Return-Path: <owner-freebsd-current@freebsd.org>
Delivered-To: freebsd-current@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 557DBC3675D
 for <freebsd-current@mailman.ysv.freebsd.org>;
 Tue,  8 Nov 2016 07:42:19 +0000 (UTC) (envelope-from avg@FreeBSD.org)
Received: from citapm.icyb.net.ua (citapm.icyb.net.ua [212.40.38.140])
 by mx1.freebsd.org (Postfix) with ESMTP id A787E1917
 for <freebsd-current@FreeBSD.org>; Tue,  8 Nov 2016 07:42:18 +0000 (UTC)
 (envelope-from avg@FreeBSD.org)
Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua
 [212.40.38.100])
 by citapm.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id JAA12871
 for <freebsd-current@FreeBSD.org>; Tue, 08 Nov 2016 09:42:07 +0200 (EET)
 (envelope-from avg@FreeBSD.org)
Received: from localhost ([127.0.0.1])
 by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD))
 id 1c412l-0003yt-BZ
 for freebsd-current@FreeBSD.org; Tue, 08 Nov 2016 09:42:07 +0200
To: FreeBSD Current <freebsd-current@FreeBSD.org>
From: Andriy Gapon <avg@FreeBSD.org>
Subject: hwpmc / amd panic when stopping pmcstat
Message-ID: <15e79772-1c1d-87ce-cc44-717c8f9448ab@FreeBSD.org>
Date: Tue, 8 Nov 2016 09:41:06 +0200
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:45.0) Gecko/20100101
 Thunderbird/45.4.0
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
 <freebsd-current.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-current>, 
 <mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current/>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-current>, 
 <mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 08 Nov 2016 07:42:19 -0000


panic: [pmc,1473] pp_pmcval outside of expected range cpu=2 ri=17
pp_pmcval=fffffffffa529f5b pm_reloadcount=10000

(kgdb) p pp->pp_pmcs[17].pp_pmc->pm_state
$2 = PMC_STATE_DELETED

Those are interesting bits.  The counter is logically stopped and the value read
from the hardware is small (become huge after "munging").  My theory is that, at
least for AMD processors, a counter keeps running after overflowing.
At the same time, amd_intr() takes an early way out if pm_state !=
PMC_STATE_RUNNING.  So, the counter is allowed to overflow if it's logically
stopped.  But that makes the assertion in pmc_process_csw_out() invalid.

It seems that the following patch fixes the problem.
But I wonder if there is a better, perhaps hardware specific, fix.

Also, maybe the condition should be pm_state == PMC_STATE_RUNNING instead of
pm_state != PMC_STATE_DELETED.

diff --git a/sys/dev/hwpmc/hwpmc_mod.c b/sys/dev/hwpmc/hwpmc_mod.c
index 55dc499b1c40e..36bcccb8c27ac 100644
--- a/sys/dev/hwpmc/hwpmc_mod.c
+++ b/sys/dev/hwpmc/hwpmc_mod.c
@@ -1431,8 +1431,8 @@ pmc_process_csw_out(struct thread *td)
 		 * save the reading.
 		 */

-		if (pp != NULL && pp->pp_pmcs[ri].pp_pmc != NULL) {
-
+		if (pm->pm_state != PMC_STATE_DELETED && pp != NULL &&
+		    pp->pp_pmcs[ri].pp_pmc != NULL) {
 			KASSERT(pm == pp->pp_pmcs[ri].pp_pmc,
 			    ("[pmc,%d] pm %p != pp_pmcs[%d] %p", __LINE__,
 				pm, ri, pp->pp_pmcs[ri].pp_pmc));

-- 
Andriy Gapon