From: Yuri
Date: Wed, 13 Mar 2013 16:03:02 -0700
To: FreeBSD Hackers
Subject: top(1) doesn't report the correct CPU time for a multithreaded process
Message-ID: <514105A6.40800@rawbw.com>

I have a process that is CPU-bound with a single thread for its first 5 seconds. It then creates 200 threads that all read from and write to the network, and it is network-bound for the remaining 6.5 minutes.

When I look at this process in top(1) right after the 200 threads are created, I see WCPU and CPU values of around 3400%. They then drop below 1% for the rest of the run:

    50619 yuri 206 20 0 621M 555M uwait 7 0:31 0.68% myapp

At the end, after all threads have exited, the process measures its resource usage with getrusage(RUSAGE_SELF, &u); which reports the following CPU time consumed:

    user=104609ms sys=8758ms wall=395938ms

So the "real" CPU percentage wasn't ~0.68%; it was closer to 29% ((104609 + 8758) / 395938). Or about 7%, if 400% is taken as the maximum (there are 4 cores). I am inclined to trust getrusage(2).

There was this PR, which is now marked as closed with a patch checked in:
http://www.freebsd.org/cgi/query-pr.cgi?pr=127331
But the code from that patch does not appear to be in usr.bin/top/machine.c now (9.1-STABLE).

My original PR, considered a duplicate, is also closed:
http://www.freebsd.org/cgi/query-pr.cgi?pr=135823

Why doesn't top(1) show the correct CPU time, aggregated over all threads? Is this a regression of the patch in PR#127331 above?

Also, why do I ever see 3400% CPU time? That doesn't seem right in any case.

Yuri
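
P.S. For anyone who wants to check the percentages quoted above, here is a minimal sketch of how they can be derived from getrusage(2). The helper names (tv_to_ms, cpu_percent, self_cpu_percent) are made up for illustration and are not taken from myapp or from top(1); the wall-clock time is assumed to be measured separately by the caller.

```c
#include <stddef.h>
#include <sys/time.h>
#include <sys/resource.h>

/* Hypothetical helper: convert a struct timeval to milliseconds. */
static double tv_to_ms(struct timeval tv)
{
    return tv.tv_sec * 1000.0 + tv.tv_usec / 1000.0;
}

/*
 * Hypothetical helper: CPU usage as a percentage of ONE core.
 * 100% means one core was fully busy for the whole wall-clock
 * interval; on a 4-core machine the maximum is 400%.
 */
static double cpu_percent(double user_ms, double sys_ms, double wall_ms)
{
    return 100.0 * (user_ms + sys_ms) / wall_ms;
}

/*
 * Hypothetical helper: percentage for the calling process, summed
 * over all of its threads, given an elapsed wall time in ms that the
 * caller measured itself. Returns -1.0 if getrusage(2) fails.
 */
static double self_cpu_percent(double wall_ms)
{
    struct rusage u;

    if (getrusage(RUSAGE_SELF, &u) != 0)
        return -1.0;
    return cpu_percent(tv_to_ms(u.ru_utime), tv_to_ms(u.ru_stime), wall_ms);
}
```

With the figures from the run, cpu_percent(104609, 8758, 395938) comes out at about 28.6% of one core; dividing by the 4 cores gives about 7.2% of the whole machine.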