From: bugzilla-noreply@freebsd.org
To: freebsd-bugs@FreeBSD.org
Subject: [Bug 198149] [hwpmc] pmcstat -P -t (top mode, process sampling) stops after a while
Date: Fri, 08 May 2015 10:58:46 +0000
X-Bugzilla-Product: Base System
X-Bugzilla-Component: kern
X-Bugzilla-Version: 11.0-CURRENT
X-Bugzilla-Who: jhb@FreeBSD.org

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=198149

--- Comment #14 from John Baldwin ---
This survived an overnight run with pmcstat still getting samples this morning.
I added debugging to print a message each time one of these fixes was applied
(and in the case of the first patch, I output the raw value of the PMC). Both
of these conditions fired fairly consistently during the test (once every few
seconds or so).

In addition, when I had run with just the first patch, I had seen raw PMC
counter values in my debug messages that could be a bit large, for example:

CPU 1: counter overflowed: 87516
CPU 1: counter overflowed: 22
CPU 12: counter overflowed: 2
CPU 1: counter overflowed: 2
CPU 1: counter overflowed: 13629
CPU 1: counter overflowed: 2
CPU 1: counter overflowed: 2

With both patches applied I do not see "large" values, only small ones:

CPU 5: fixing zero PMC
CPU 1: counter overflowed: 20
CPU 1: fixing zero PMC
CPU 1: counter overflowed: 2
CPU 1: fixing zero PMC
CPU 1: fixing zero PMC
CPU 15: fixing zero PMC
CPU 1: fixing zero PMC
CPU 1: counter overflowed: 4
CPU 1: counter overflowed: 2
CPU 1: counter overflowed: 2
CPU 1: fixing zero PMC
CPU 1: counter overflowed: 2
CPU 1: fixing zero PMC
CPU 1: counter overflowed: 2
CPU 8: fixing zero PMC
CPU 1: counter overflowed: 2
CPU 1: counter overflowed: 2
CPU 1: counter overflowed: 2
CPU 2: counter overflowed: 2
CPU 9: counter overflowed: 2
CPU 3: fixing zero PMC
CPU 8: fixing zero PMC
CPU 1: counter overflowed: 22
CPU 1: fixing zero PMC
CPU 1: fixing zero PMC
CPU 1: counter overflowed: 2
CPU 1: counter overflowed: 27
CPU 1: counter overflowed: 5
CPU 1: fixing zero PMC
.....

I had wondered if the second bug (writing a PMC value of zero) could have been
the source of the first bug (you can see how it would easily trigger it: if you
write a PMC of zero and the event happens 2 times before your next context
switch, you would have a raw value of "2" when you switched out).
However, while it seems to have fixed some of them (the "large" ones), it does
not seem to have fixed all of them. I do think the second fix is legit (and the
bug it fixes has been present since sampling was added to PMC). I think the
first change is also technically correct, but I'm not sure why we are seeing
those values.
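
The arithmetic behind the "write a PMC of zero" scenario in this comment can be illustrated with a toy model. This is only a sketch, not hwpmc code: it assumes a counter that counts up and raises its sampling interrupt when it wraps, and the 48-bit width, the `toy_pmc_*` names, and the reload count used in the note below are all hypothetical choices made for illustration.

```c
#include <stdint.h>

/* ASSUMPTION: 48-bit counter width, chosen only for illustration. */
#define TOY_PMC_WIDTH 48
#define TOY_PMC_MASK  ((UINT64_C(1) << TOY_PMC_WIDTH) - 1)

/*
 * Raw value of the toy counter after `events` increments, given that
 * `written` was the value last written to it.
 */
static uint64_t
toy_pmc_raw_after(uint64_t written, uint64_t events)
{
	return ((written + events) & TOY_PMC_MASK);
}

/*
 * Number of events that must occur before the toy counter wraps,
 * which is when this model would deliver the next sample.
 */
static uint64_t
toy_pmc_events_until_overflow(uint64_t written)
{
	return ((TOY_PMC_MASK + 1) - (written & TOY_PMC_MASK));
}

/*
 * Value to write so that the toy counter wraps after exactly `reload`
 * events (hypothetical helper; `reload` is expected to be nonzero).
 */
static uint64_t
toy_pmc_initial_for_reload(uint64_t reload)
{
	return ((TOY_PMC_MASK + 1 - reload) & TOY_PMC_MASK);
}
```

In this model, `toy_pmc_raw_after(0, 2)` is 2, the kind of small raw value seen in the debug output, and `toy_pmc_events_until_overflow(0)` is 2^48, whereas a counter started with `toy_pmc_initial_for_reload(65536)` wraps after 65536 events. Under these assumptions, a counter written with zero would take so long to wrap that sampling would appear to stop.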