From owner-freebsd-hackers@FreeBSD.ORG Tue Jun 12 20:30:18 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 7C9CE106566B for ; Tue, 12 Jun 2012 20:30:18 +0000 (UTC) (envelope-from andrey@zonov.org) Received: from mail-lb0-f182.google.com (mail-lb0-f182.google.com [209.85.217.182]) by mx1.freebsd.org (Postfix) with ESMTP id E62B38FC1C for ; Tue, 12 Jun 2012 20:30:17 +0000 (UTC) Received: by lbon10 with SMTP id n10so886153lbo.13 for ; Tue, 12 Jun 2012 13:30:16 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:content-type:content-transfer-encoding :x-gm-message-state; bh=RQVP3zGTQVRCimjQN8fugAicEby7N1IZY3I8WP7Hql4=; b=LAxmEAcWAq1r29jqtStMQz+W9Pm+UzQ8y38/m8zHjtZVA4xm/aIBvhVs1UwtIK7in8 +YbWz0mJPKaUja8spTod5PrB/wfItWSfsz+Z4TatGsgDh2d8dCC8V8SXoPQwe3bxerxw G0q3PCPscSWXYJtb4J3G8ZUzv1jtG3SmKVEvry0pnHBobgcFAmMq/73XJYEZvFakhit8 N9QRTuDUc0z4l2zC9X0JSltIchhIXmwEN/ZeppR4zGTn4WFAX43jfTrKe0phKeWbpST9 dz4rpLtP/yK82rTRmRU5cnyL5vC9EXNhrrn41hTS5xnNG+nDPu3pniHsaHWEeiFPzABT TCiA== Received: by 10.152.147.33 with SMTP id th1mr22299124lab.9.1339533015829; Tue, 12 Jun 2012 13:30:15 -0700 (PDT) Received: from zont-osx.local (ppp95-165-137-174.pppoe.spdop.ru. [95.165.137.174]) by mx.google.com with ESMTPS id lv13sm30908337lab.8.2012.06.12.13.30.09 (version=SSLv3 cipher=OTHER); Tue, 12 Jun 2012 13:30:10 -0700 (PDT) Message-ID: <4FD7A6D0.1080002@zonov.org> Date: Wed, 13 Jun 2012 00:30:08 +0400 From: Andrey Zonov User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:12.0) Gecko/20120428 Thunderbird/12.0.1 MIME-Version: 1.0 To: Eric van Gyzen References: <4FC6748B.5030708@zonov.org> <4FC71F13.6040008@zonov.org> <4FD60FC2.3060206@vangyzen.net> In-Reply-To: <4FD60FC2.3060206@vangyzen.net> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Gm-Message-State: ALoCoQnnasoFQemilVzfKfmIxA8uMNHuNpcFVJ0SU3p2EnuLqS0dby9cqnyJeVUKJ5nF6HGsJQnA Cc: freebsd-hackers@freebsd.org Subject: Re: usertime stale at about 371k seconds X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 12 Jun 2012 20:30:18 -0000 On 6/11/12 7:33 PM, Eric van Gyzen wrote: > On 05/31/2012 02:34, Andrey Zonov wrote: >> On 5/30/12 11:27 PM, Andrey Zonov wrote: >>> Hi, >>> >>> I have long running process for which `ps -o usertime -p $pid' shows >>> always the same time - 6190:07.65, `ps -o cputime -p $pid' for the same >>> process continue to grow and now it's 21538:53.61. It looks like >>> overflow in resource usage code or something. >>> >> >> I reproduced that problem with attached program. I ran it with 23 >> threads on machine with 24 CPUs and after night I see this: >> >> $ ps -o usertime,time -p 24134 && sleep 60 && ps -o usertime,time -p >> 24134 >> USERTIME TIME >> 6351:24.74 14977:35.19 >> USERTIME TIME >> 6351:24.74 15000:34.53 >> >> Per thread user-time counts correct: >> >> $ ps -H -o usertime,time -p 24134 >> USERTIME TIME >> 0:00.00 0:00.00 >> 652:35.84 652:38.59 >> 652:34.75 652:37.97 >> 652:50.46 652:51.97 >> 652:38.93 652:43.08 >> 652:39.73 652:43.36 >> 652:44.09 652:47.36 >> 652:56.49 652:57.94 >> 652:51.84 652:54.41 >> 652:37.48 652:41.57 >> 652:36.61 652:40.90 >> 652:39.41 652:42.52 >> 653:03.72 653:06.72 >> 652:49.96 652:53.25 >> 652:45.92 652:49.03 >> 652:40.33 652:42.05 >> 652:46.53 652:49.31 >> 652:44.77 652:47.33 >> 653:00.54 653:02.24 >> 652:33.31 652:36.13 >> 652:51.03 652:52.91 >> 652:50.73 652:52.71 >> 652:41.32 652:44.64 >> 652:59.86 653:03.25 >> >> (kgdb) p $my->p_rux >> $14 = {rux_runtime = 2171421985692826, rux_uticks = 114886093, >> rux_sticks = 8353, rux_iticks = 0, rux_uu = 381084736784, rux_su = >> 65773652, rux_tu = 904571706136} >> (kgdb) p $my->p_rux >> $15 = {rux_runtime = 2191831516209186, rux_uticks = 115966087, >> rux_sticks = 8444, rux_iticks = 0, rux_uu = 381084736784, rux_su = >> 66458587, rux_tu = 913099969825} >> >> As you can see rux_uu stale, but rux_uticks still ticks. I think the >> problem is in calcru1(). This expression >> >> uu = (tu * ut) / tt >> >> overflows. >> >> I applied the following patch: >> >> Index: /usr/src/sys/kern/kern_resource.c >> =================================================================== >> --- /usr/src/sys/kern/kern_resource.c (revision 235394) >> +++ /usr/src/sys/kern/kern_resource.c (working copy) >> @@ -885,7 +885,7 @@ calcru1(struct proc *p, struct rusage_ext *ruxp, s >> struct timeval *sp) >> { >> /* {user, system, interrupt, total} {ticks, usec}: */ >> - uint64_t ut, uu, st, su, it, tt, tu; >> + uint64_t ut, uu, st, su, it, tt, tu, tmp; >> >> ut = ruxp->rux_uticks; >> st = ruxp->rux_sticks; >> @@ -909,10 +909,20 @@ calcru1(struct proc *p, struct rusage_ext *ruxp, s >> * The normal case, time increased. >> * Enforce monotonicity of bucketed numbers. >> */ >> - uu = (tu * ut) / tt; >> + if (ut == 0) >> + uu = 0; >> + else { >> + tmp = tt / ut; >> + uu = tmp ? tu / tmp : 0; >> + } >> if (uu < ruxp->rux_uu) >> uu = ruxp->rux_uu; >> >> and now ran test again. > > This looks related to, and possibly identical to, PR kern/76972: > > http://www.freebsd.org/cgi/query-pr.cgi?pr=76972 Yes, that's the same. > > If you filed a PR, please submit a follow-up to both PRs so they > reference each other. No, I didn't. I want to fix the problem not just file a PR and wait for years. > > Thanks, > Thank you. > Eric -- Andrey Zonov