From: John Baldwin <jhb@freebsd.org>
To: freebsd-hackers@freebsd.org
Cc: Yuri, Alexander Best
Subject: Re: top(1) loses process user time count when threads end
Date: Mon, 22 Aug 2011 12:21:34 -0400
Message-Id: <201108221221.34447.jhb@freebsd.org>
In-Reply-To: <201108160925.20568.jhb@freebsd.org>
References: <4E3CC033.6070604@rawbw.com> <4E3D808F.1030101@rawbw.com> <201108160925.20568.jhb@freebsd.org>

On Tuesday, August 16, 2011 9:25:20 am John Baldwin wrote:
> On Saturday, August 06, 2011 1:57:35 pm Yuri wrote:
> > On 08/06/2011 02:11, Alexander Best wrote:
> > > On Fri Aug 5 11, Yuri wrote:
> > >> I have a process that first runs in 3 threads, but later two active
> > >> threads exit.
> > >>
> > >> top(1) shows this moment this way (1-sec intervals):
> > >> 30833 yuri   3  76  0  4729M  4225M nanslp  4  0:32  88.62% app
> > >> 30833 yuri   3  76  0  4729M  4225M nanslp  6  0:34  90.92% app
> > >> 30833 yuri   1  96  0  4729M  4225M CPU1    1  0:03   1.17% app
> > >> 30833 yuri   1  98  0  4729M  4226M CPU1    1  0:04  12.89% app
> > >>
> > >> The process time goes down: 0:34 -> 0:03. WCPU also goes down, 90.92% ->
> > >> 1.17%, even though this process is CPU bound and does intense work
> > >> right after the threads exit.
> > >>
> > >> getrusage(2), called in the process, shows the correct user time, though.
> > >>
> > >> I think this is a major bug in the process time accounting.
> > > Could you check whether kern/128177 or kern/140892 describes your situation?
> >
> > I have the ULE scheduler. kern/128177 talks about a single thread with the
> > ULE scheduler, and my issue is with threads, so I am not sure if it is
> > related. There has been no motion on kern/128177 since Feb 9, 2009.
> > kern/140892 is probably the same as mine.
> >
> > In any case, both of these PRs have to be fixed since they are very user
> > visible, not just some obscure issues.

I am not able to reproduce this issue on 8-stable (this is without my suggested
patch).  I wrote a simple program to fire off 4 CPU-bound threads that were
signalled to exit when I hit a key on stdin.  After that the main thread
started doing CPU-intensive work.
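The exact program was not posted in this thread; a minimal sketch along the
lines described (pthreads, with illustrative names such as worker, stop, and
sink) could look like this:

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define NTHREADS        4

    static volatile int stop;               /* set when workers should exit */
    static volatile unsigned long sink;     /* keeps the busy loops from being optimized away */

    static void *
    worker(void *arg)
    {
            (void)arg;
            while (!stop)
                    sink++;                 /* burn CPU until told to stop */
            return (NULL);
    }

    int
    main(void)
    {
            pthread_t threads[NTHREADS];
            int i;

            for (i = 0; i < NTHREADS; i++) {
                    if (pthread_create(&threads[i], NULL, worker, NULL) != 0) {
                            perror("pthread_create");
                            exit(1);
                    }
            }

            (void)getchar();                /* main thread idles here while workers spin */
            stop = 1;
            for (i = 0; i < NTHREADS; i++)
                    pthread_join(threads[i], NULL);

            /* Workers are gone; now the main thread does CPU-intensive work itself. */
            for (;;)
                    sink++;

            /* NOTREACHED */
            return (0);
    }

Built with cc -pthread and watched under top(1) at 1-second intervals, the
accumulated TIME for the process should keep increasing across the point where
the worker threads exit.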
While my CPU % dropped (the main thread was idle in getchar() while the worker
threads ran), I did not see a similar drop in the process time:

 7644 jhb   5  83  0 10664K 1492K CPU3  3  0:18 150.29% threadtime
 7644 jhb   5  85  0 10664K 1492K CPU3  3  0:22 175.05% threadtime
 7644 jhb   5  86  0 10664K 1492K CPU3  3  0:26 197.46% threadtime
 7644 jhb   5  88  0 10664K 1492K CPU3  3  0:30 217.97% threadtime
 7644 jhb   5  89  0 10664K 1492K CPU3  3  0:34 236.72% threadtime
 7644 jhb   5  91  0 10664K 1492K CPU3  3  0:38 253.52% threadtime
 7644 jhb   5  92  0 10664K 1492K CPU3  3  0:42 269.14% threadtime
 7644 jhb   1  73  0 10664K 1432K CPU0  0  0:45   6.40% threadtime
 7644 jhb   1  76  0 10664K 1432K CPU4  0  0:46  15.28% threadtime
 7644 jhb   1  78  0 10664K 1432K CPU0  0  0:46  23.39% threadtime
 7644 jhb   1  81  0 10664K 1432K CPU0  0  0:47  30.76% threadtime
 7644 jhb   1  83  0 10664K 1432K CPU0  0  0:48  37.50% threadtime
 7644 jhb   1  85  0 10664K 1432K CPU0  0  0:49  43.55% threadtime
 7644 jhb   1  86  0 10664K 1432K CPU0  0  0:50  49.07% threadtime

Can you come up with a simple test case that shows the bug for you?  Also, can
you try the patch I posted earlier?

-- 
John Baldwin