Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 13 Jun 2012 00:30:08 +0400
From:      Andrey Zonov <andrey@zonov.org>
To:        Eric van Gyzen <eric@vangyzen.net>
Cc:        freebsd-hackers@freebsd.org
Subject:   Re: usertime stale at about 371k seconds
Message-ID:  <4FD7A6D0.1080002@zonov.org>
In-Reply-To: <4FD60FC2.3060206@vangyzen.net>
References:  <4FC6748B.5030708@zonov.org> <4FC71F13.6040008@zonov.org> <4FD60FC2.3060206@vangyzen.net>

next in thread | previous in thread | raw e-mail | index | archive | help
On 6/11/12 7:33 PM, Eric van Gyzen wrote:
> On 05/31/2012 02:34, Andrey Zonov wrote:
>> On 5/30/12 11:27 PM, Andrey Zonov wrote:
>>> Hi,
>>>
>>> I have long running process for which `ps -o usertime -p $pid' shows
>>> always the same time - 6190:07.65, `ps -o cputime -p $pid' for the same
>>> process continue to grow and now it's 21538:53.61. It looks like
>>> overflow in resource usage code or something.
>>>
>>
>> I reproduced that problem with attached program. I ran it with 23
>> threads on machine with 24 CPUs and after night I see this:
>>
>> $ ps -o usertime,time -p 24134 && sleep 60 && ps -o usertime,time -p
>> 24134
>> USERTIME TIME
>> 6351:24.74 14977:35.19
>> USERTIME TIME
>> 6351:24.74 15000:34.53
>>
>> Per thread user-time counts correct:
>>
>> $ ps -H -o usertime,time -p 24134
>> USERTIME TIME
>> 0:00.00 0:00.00
>> 652:35.84 652:38.59
>> 652:34.75 652:37.97
>> 652:50.46 652:51.97
>> 652:38.93 652:43.08
>> 652:39.73 652:43.36
>> 652:44.09 652:47.36
>> 652:56.49 652:57.94
>> 652:51.84 652:54.41
>> 652:37.48 652:41.57
>> 652:36.61 652:40.90
>> 652:39.41 652:42.52
>> 653:03.72 653:06.72
>> 652:49.96 652:53.25
>> 652:45.92 652:49.03
>> 652:40.33 652:42.05
>> 652:46.53 652:49.31
>> 652:44.77 652:47.33
>> 653:00.54 653:02.24
>> 652:33.31 652:36.13
>> 652:51.03 652:52.91
>> 652:50.73 652:52.71
>> 652:41.32 652:44.64
>> 652:59.86 653:03.25
>>
>> (kgdb) p $my->p_rux
>> $14 = {rux_runtime = 2171421985692826, rux_uticks = 114886093,
>> rux_sticks = 8353, rux_iticks = 0, rux_uu = 381084736784, rux_su =
>> 65773652, rux_tu = 904571706136}
>> (kgdb) p $my->p_rux
>> $15 = {rux_runtime = 2191831516209186, rux_uticks = 115966087,
>> rux_sticks = 8444, rux_iticks = 0, rux_uu = 381084736784, rux_su =
>> 66458587, rux_tu = 913099969825}
>>
>> As you can see rux_uu stale, but rux_uticks still ticks. I think the
>> problem is in calcru1(). This expression
>>
>> uu = (tu * ut) / tt
>>
>> overflows.
>>
>> I applied the following patch:
>>
>> Index: /usr/src/sys/kern/kern_resource.c
>> ===================================================================
>> --- /usr/src/sys/kern/kern_resource.c (revision 235394)
>> +++ /usr/src/sys/kern/kern_resource.c (working copy)
>> @@ -885,7 +885,7 @@ calcru1(struct proc *p, struct rusage_ext *ruxp, s
>> struct timeval *sp)
>> {
>> /* {user, system, interrupt, total} {ticks, usec}: */
>> - uint64_t ut, uu, st, su, it, tt, tu;
>> + uint64_t ut, uu, st, su, it, tt, tu, tmp;
>>
>> ut = ruxp->rux_uticks;
>> st = ruxp->rux_sticks;
>> @@ -909,10 +909,20 @@ calcru1(struct proc *p, struct rusage_ext *ruxp, s
>> * The normal case, time increased.
>> * Enforce monotonicity of bucketed numbers.
>> */
>> - uu = (tu * ut) / tt;
>> + if (ut == 0)
>> + uu = 0;
>> + else {
>> + tmp = tt / ut;
>> + uu = tmp ? tu / tmp : 0;
>> + }
>> if (uu < ruxp->rux_uu)
>> uu = ruxp->rux_uu;
>>
>> and now ran test again.
>
> This looks related to, and possibly identical to, PR kern/76972:
>
> http://www.freebsd.org/cgi/query-pr.cgi?pr=76972

Yes, that's the same.

>
> If you filed a PR, please submit a follow-up to both PRs so they
> reference each other.

No, I didn't.  I want to fix the problem not just file a PR and wait for 
years.

>
> Thanks,
>

Thank you.

> Eric


-- 
Andrey Zonov



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?4FD7A6D0.1080002>