Date: Mon, 11 Jun 2012 10:33:22 -0500 From: Eric van Gyzen <eric@vangyzen.net> To: Andrey Zonov <andrey@zonov.org> Cc: freebsd-hackers@freebsd.org Subject: Re: usertime stale at about 371k seconds Message-ID: <4FD60FC2.3060206@vangyzen.net> In-Reply-To: <4FC71F13.6040008@zonov.org> References: <4FC6748B.5030708@zonov.org> <4FC71F13.6040008@zonov.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On 05/31/2012 02:34, Andrey Zonov wrote: > On 5/30/12 11:27 PM, Andrey Zonov wrote: >> Hi, >> >> I have long running process for which `ps -o usertime -p $pid' shows >> always the same time - 6190:07.65, `ps -o cputime -p $pid' for the same >> process continue to grow and now it's 21538:53.61. It looks like >> overflow in resource usage code or something. >> > > I reproduced that problem with attached program. I ran it with 23 > threads on machine with 24 CPUs and after night I see this: > > $ ps -o usertime,time -p 24134 && sleep 60 && ps -o usertime,time -p 24134 > USERTIME TIME > 6351:24.74 14977:35.19 > USERTIME TIME > 6351:24.74 15000:34.53 > > Per thread user-time counts correct: > > $ ps -H -o usertime,time -p 24134 > USERTIME TIME > 0:00.00 0:00.00 > 652:35.84 652:38.59 > 652:34.75 652:37.97 > 652:50.46 652:51.97 > 652:38.93 652:43.08 > 652:39.73 652:43.36 > 652:44.09 652:47.36 > 652:56.49 652:57.94 > 652:51.84 652:54.41 > 652:37.48 652:41.57 > 652:36.61 652:40.90 > 652:39.41 652:42.52 > 653:03.72 653:06.72 > 652:49.96 652:53.25 > 652:45.92 652:49.03 > 652:40.33 652:42.05 > 652:46.53 652:49.31 > 652:44.77 652:47.33 > 653:00.54 653:02.24 > 652:33.31 652:36.13 > 652:51.03 652:52.91 > 652:50.73 652:52.71 > 652:41.32 652:44.64 > 652:59.86 653:03.25 > > (kgdb) p $my->p_rux > $14 = {rux_runtime = 2171421985692826, rux_uticks = 114886093, > rux_sticks = 8353, rux_iticks = 0, rux_uu = 381084736784, rux_su = > 65773652, rux_tu = 904571706136} > (kgdb) p $my->p_rux > $15 = {rux_runtime = 2191831516209186, rux_uticks = 115966087, > rux_sticks = 8444, rux_iticks = 0, rux_uu = 381084736784, rux_su = > 66458587, rux_tu = 913099969825} > > As you can see rux_uu stale, but rux_uticks still ticks. I think the > problem is in calcru1(). This expression > > uu = (tu * ut) / tt > > overflows. > > I applied the following patch: > > Index: /usr/src/sys/kern/kern_resource.c > =================================================================== > --- /usr/src/sys/kern/kern_resource.c (revision 235394) > +++ /usr/src/sys/kern/kern_resource.c (working copy) > @@ -885,7 +885,7 @@ calcru1(struct proc *p, struct rusage_ext *ruxp, s > struct timeval *sp) > { > /* {user, system, interrupt, total} {ticks, usec}: */ > - uint64_t ut, uu, st, su, it, tt, tu; > + uint64_t ut, uu, st, su, it, tt, tu, tmp; > > ut = ruxp->rux_uticks; > st = ruxp->rux_sticks; > @@ -909,10 +909,20 @@ calcru1(struct proc *p, struct rusage_ext *ruxp, s > * The normal case, time increased. > * Enforce monotonicity of bucketed numbers. > */ > - uu = (tu * ut) / tt; > + if (ut == 0) > + uu = 0; > + else { > + tmp = tt / ut; > + uu = tmp ? tu / tmp : 0; > + } > if (uu < ruxp->rux_uu) > uu = ruxp->rux_uu; > > and now ran test again. This looks related to, and possibly identical to, PR kern/76972: http://www.freebsd.org/cgi/query-pr.cgi?pr=76972 If you filed a PR, please submit a follow-up to both PRs so they reference each other. Thanks, Eric
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?4FD60FC2.3060206>