Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 11 Jun 2012 10:33:22 -0500
From:      Eric van Gyzen <eric@vangyzen.net>
To:        Andrey Zonov <andrey@zonov.org>
Cc:        freebsd-hackers@freebsd.org
Subject:   Re: usertime stale at about 371k seconds
Message-ID:  <4FD60FC2.3060206@vangyzen.net>
In-Reply-To: <4FC71F13.6040008@zonov.org>
References:  <4FC6748B.5030708@zonov.org> <4FC71F13.6040008@zonov.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On 05/31/2012 02:34, Andrey Zonov wrote:
> On 5/30/12 11:27 PM, Andrey Zonov wrote:
>> Hi,
>>
>> I have long running process for which `ps -o usertime -p $pid' shows
>> always the same time - 6190:07.65, `ps -o cputime -p $pid' for the same
>> process continue to grow and now it's 21538:53.61. It looks like
>> overflow in resource usage code or something.
>>
>
> I reproduced that problem with attached program. I ran it with 23
> threads on machine with 24 CPUs and after night I see this:
>
> $ ps -o usertime,time -p 24134 && sleep 60 && ps -o usertime,time -p 24134
> USERTIME TIME
> 6351:24.74 14977:35.19
> USERTIME TIME
> 6351:24.74 15000:34.53
>
> Per thread user-time counts correct:
>
> $ ps -H -o usertime,time -p 24134
> USERTIME TIME
> 0:00.00 0:00.00
> 652:35.84 652:38.59
> 652:34.75 652:37.97
> 652:50.46 652:51.97
> 652:38.93 652:43.08
> 652:39.73 652:43.36
> 652:44.09 652:47.36
> 652:56.49 652:57.94
> 652:51.84 652:54.41
> 652:37.48 652:41.57
> 652:36.61 652:40.90
> 652:39.41 652:42.52
> 653:03.72 653:06.72
> 652:49.96 652:53.25
> 652:45.92 652:49.03
> 652:40.33 652:42.05
> 652:46.53 652:49.31
> 652:44.77 652:47.33
> 653:00.54 653:02.24
> 652:33.31 652:36.13
> 652:51.03 652:52.91
> 652:50.73 652:52.71
> 652:41.32 652:44.64
> 652:59.86 653:03.25
>
> (kgdb) p $my->p_rux
> $14 = {rux_runtime = 2171421985692826, rux_uticks = 114886093,
> rux_sticks = 8353, rux_iticks = 0, rux_uu = 381084736784, rux_su =
> 65773652, rux_tu = 904571706136}
> (kgdb) p $my->p_rux
> $15 = {rux_runtime = 2191831516209186, rux_uticks = 115966087,
> rux_sticks = 8444, rux_iticks = 0, rux_uu = 381084736784, rux_su =
> 66458587, rux_tu = 913099969825}
>
> As you can see rux_uu stale, but rux_uticks still ticks. I think the
> problem is in calcru1(). This expression
>
> uu = (tu * ut) / tt
>
> overflows.
>
> I applied the following patch:
>
> Index: /usr/src/sys/kern/kern_resource.c
> ===================================================================
> --- /usr/src/sys/kern/kern_resource.c (revision 235394)
> +++ /usr/src/sys/kern/kern_resource.c (working copy)
> @@ -885,7 +885,7 @@ calcru1(struct proc *p, struct rusage_ext *ruxp, s
> struct timeval *sp)
> {
> /* {user, system, interrupt, total} {ticks, usec}: */
> - uint64_t ut, uu, st, su, it, tt, tu;
> + uint64_t ut, uu, st, su, it, tt, tu, tmp;
>
> ut = ruxp->rux_uticks;
> st = ruxp->rux_sticks;
> @@ -909,10 +909,20 @@ calcru1(struct proc *p, struct rusage_ext *ruxp, s
> * The normal case, time increased.
> * Enforce monotonicity of bucketed numbers.
> */
> - uu = (tu * ut) / tt;
> + if (ut == 0)
> + uu = 0;
> + else {
> + tmp = tt / ut;
> + uu = tmp ? tu / tmp : 0;
> + }
> if (uu < ruxp->rux_uu)
> uu = ruxp->rux_uu;
>
> and now ran test again.

This looks related to, and possibly identical to, PR kern/76972:

	http://www.freebsd.org/cgi/query-pr.cgi?pr=76972

If you filed a PR, please submit a follow-up to both PRs so they 
reference each other.

Thanks,

Eric



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?4FD60FC2.3060206>