From owner-freebsd-bugs@FreeBSD.ORG Sat Jun 22 01:50:00 2013 Return-Path: Delivered-To: freebsd-bugs@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 89B584C6 for ; Sat, 22 Jun 2013 01:50:00 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id 6FE141488 for ; Sat, 22 Jun 2013 01:50:00 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id r5M1o0nf076244 for ; Sat, 22 Jun 2013 01:50:00 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.7/8.14.7/Submit) id r5M1o0xW076243; Sat, 22 Jun 2013 01:50:00 GMT (envelope-from gnats) Resent-Date: Sat, 22 Jun 2013 01:50:00 GMT Resent-Message-Id: <201306220150.r5M1o0xW076243@freefall.freebsd.org> Resent-From: FreeBSD-gnats-submit@FreeBSD.org (GNATS Filer) Resent-To: freebsd-bugs@FreeBSD.org Resent-Reply-To: FreeBSD-gnats-submit@FreeBSD.org, adrian chadd Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 062F24BF for ; Sat, 22 Jun 2013 01:48:35 +0000 (UTC) (envelope-from nobody@FreeBSD.org) Received: from oldred.freebsd.org (oldred.freebsd.org [8.8.178.121]) by mx1.freebsd.org (Postfix) with ESMTP id DD6901483 for ; Sat, 22 Jun 2013 01:48:34 +0000 (UTC) Received: from oldred.freebsd.org ([127.0.1.6]) by oldred.freebsd.org (8.14.5/8.14.7) with ESMTP id r5M1mYAp076077 for ; Sat, 22 Jun 2013 01:48:34 GMT (envelope-from nobody@oldred.freebsd.org) Received: (from nobody@localhost) by oldred.freebsd.org (8.14.5/8.14.5/Submit) id r5M1mYaJ076069; Sat, 22 Jun 2013 01:48:34 GMT (envelope-from nobody) Message-Id: <201306220148.r5M1mYaJ076069@oldred.freebsd.org> Date: Sat, 22 Jun 2013 01:48:34 GMT From: adrian chadd To: freebsd-gnats-submit@FreeBSD.org X-Send-Pr-Version: www-3.1 Subject: kern/179827: [hwpmc] process-mode counters aren't correctly read on multi-core machines X-BeenThere: freebsd-bugs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Bug reports List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Jun 2013 01:50:00 -0000 >Number: 179827 >Category: kern >Synopsis: [hwpmc] process-mode counters aren't correctly read on multi-core machines >Confidential: no >Severity: non-critical >Priority: low >Responsible: freebsd-bugs >State: open >Quarter: >Keywords: >Date-Required: >Class: sw-bug >Submitter-Id: current-users >Arrival-Date: Sat Jun 22 01:50:00 UTC 2013 >Closed-Date: >Last-Modified: >Originator: adrian chadd >Release: 9-STABLE, r252049, amd64 (and i386) >Organization: Netflix, Inc >Environment: >Description: When using a process mode counter PMC on a multi-core machine, the pmc_read() -> PMC_OP_PMCRW event is not correctly reading the PMC. What's going on in hwpmc_mod.c is thus: * It grabs the current thread CPU; * It then checks if the PMC is active; * If it's currently active, it reads the PMC from the current CPU rather than the target CPU; * .. otherwise it returns the old counter value. When I enabled pmc debugging: DEBUG in the kernel, then: # sysctl kern.hwpmc.debugflags="csw=swo,swi pmc=ops" I'd see this: * lots of swi, swo (swapin, swapout) of the process, which updates the cached PMC value * a PMC,OPS event every second whilst it was reading the process mode counter * .. but the PMC,OPS value would always be the "old" one, rather than the updated one. The only time the counter value was updated was if the PMC,OPS (ie, the PMC_OP_PMCRW event) occured _during_ the running of the process - ie, the scheduler put the process being counted _and_ the pmcstat process on the same CPU, switched out the process under profiling (which would switch it out and update the value) and switched in pmcstat to do the read. This reads like it was really designed for reading counters from an active process - ie, where you'd put pmc_read() at specific points in a programs execution in order to count how long certain events were taking. It's totally bogus for the SMP case, where your current process being run is on another CPU and doesn't get switched out for long stretches of time, whilst pmcstat wakes up once a second on another CPU to do a counter read. >How-To-Repeat: * grab a multi-core device; * Grab some CPU chewing process that doesn't do any kind of blocking - ie, some minute long math problem; * run pmcstat in counting mode - pmcstat -p instructions -w 1 ./process * watch the counters look totally bogus. * See above for the sysctl line to enable pmc debugging and you'll see how bogus the sampling is. >Fix: I'm not sure. There may be multiple problems here: * One is ensuring the PMC stats are read from the CPU the pmc counter is actively active on, as it may be a different CPU to the thread calling pmc_read() -> PMC_OP_PMCRW; * Another is checking to see if the PMC is actually active at the present moment on _any_ CPU and if it is, fetch the statistics from said correct CPU. * I'm not sure what the deal is with running counting mode samples on a multi-core CPU with multiple threads running in the process under count. Ie, if you have 4 threads on 4 separate CPUs running and you're trying to count '-p instructions', then I think the right behaviour is to enable the instructions counter on -all- the CPUs with those threads active. So what should pmc_read() do? Does the kernel-side correctly read all the CPUs with threads active for that particular registered PMC? Or what is actually going on? >Release-Note: >Audit-Trail: >Unformatted: