Date: Thu, 31 May 2012 11:34:43 +0400 From: Andrey Zonov <andrey@zonov.org> To: freebsd-hackers@freebsd.org Subject: Re: usertime stale at about 371k seconds Message-ID: <4FC71F13.6040008@zonov.org> In-Reply-To: <4FC6748B.5030708@zonov.org> References: <4FC6748B.5030708@zonov.org>
next in thread | previous in thread | raw e-mail | index | archive | help
This is a multi-part message in MIME format. --------------080506020903050202040100 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit On 5/30/12 11:27 PM, Andrey Zonov wrote: > Hi, > > I have long running process for which `ps -o usertime -p $pid' shows > always the same time - 6190:07.65, `ps -o cputime -p $pid' for the same > process continue to grow and now it's 21538:53.61. It looks like > overflow in resource usage code or something. > I reproduced that problem with attached program. I ran it with 23 threads on machine with 24 CPUs and after night I see this: $ ps -o usertime,time -p 24134 && sleep 60 && ps -o usertime,time -p 24134 USERTIME TIME 6351:24.74 14977:35.19 USERTIME TIME 6351:24.74 15000:34.53 Per thread user-time counts correct: $ ps -H -o usertime,time -p 24134 USERTIME TIME 0:00.00 0:00.00 652:35.84 652:38.59 652:34.75 652:37.97 652:50.46 652:51.97 652:38.93 652:43.08 652:39.73 652:43.36 652:44.09 652:47.36 652:56.49 652:57.94 652:51.84 652:54.41 652:37.48 652:41.57 652:36.61 652:40.90 652:39.41 652:42.52 653:03.72 653:06.72 652:49.96 652:53.25 652:45.92 652:49.03 652:40.33 652:42.05 652:46.53 652:49.31 652:44.77 652:47.33 653:00.54 653:02.24 652:33.31 652:36.13 652:51.03 652:52.91 652:50.73 652:52.71 652:41.32 652:44.64 652:59.86 653:03.25 (kgdb) p $my->p_rux $14 = {rux_runtime = 2171421985692826, rux_uticks = 114886093, rux_sticks = 8353, rux_iticks = 0, rux_uu = 381084736784, rux_su = 65773652, rux_tu = 904571706136} (kgdb) p $my->p_rux $15 = {rux_runtime = 2191831516209186, rux_uticks = 115966087, rux_sticks = 8444, rux_iticks = 0, rux_uu = 381084736784, rux_su = 66458587, rux_tu = 913099969825} As you can see rux_uu stale, but rux_uticks still ticks. I think the problem is in calcru1(). This expression uu = (tu * ut) / tt overflows. I applied the following patch: Index: /usr/src/sys/kern/kern_resource.c =================================================================== --- /usr/src/sys/kern/kern_resource.c (revision 235394) +++ /usr/src/sys/kern/kern_resource.c (working copy) @@ -885,7 +885,7 @@ calcru1(struct proc *p, struct rusage_ext *ruxp, s struct timeval *sp) { /* {user, system, interrupt, total} {ticks, usec}: */ - uint64_t ut, uu, st, su, it, tt, tu; + uint64_t ut, uu, st, su, it, tt, tu, tmp; ut = ruxp->rux_uticks; st = ruxp->rux_sticks; @@ -909,10 +909,20 @@ calcru1(struct proc *p, struct rusage_ext *ruxp, s * The normal case, time increased. * Enforce monotonicity of bucketed numbers. */ - uu = (tu * ut) / tt; + if (ut == 0) + uu = 0; + else { + tmp = tt / ut; + uu = tmp ? tu / tmp : 0; + } if (uu < ruxp->rux_uu) uu = ruxp->rux_uu; and now ran test again. -- Andrey Zonov --------------080506020903050202040100 Content-Type: text/plain; charset=UTF-8; x-mac-type="0"; x-mac-creator="0"; name="usertime.c" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="usertime.c" /* * Andrey Zonov (c) 2012 */ #include <err.h> #include <pthread.h> #include <stdlib.h> void *func(void *arg); int main(int argc, char **argv) { int i; int threads; int *tid; pthread_t *tds; if (argc != 2) errx(1, "usage: usertime <threads>"); threads = atoi(argv[1]); tid = malloc(sizeof(int) * threads); tds = malloc(sizeof(pthread_t) * threads); for (i = 0; i < threads; i++) { tid[i] = i; if (pthread_create(&tds[i], NULL, func, &tid[i]) != 0) err(1, "pthread_create(%d)", i); } for (i = 0; i < threads; i++) if (pthread_join(tds[i], NULL) != 0) err(1, "pthread_join(%d)", i); exit(0); } void * func(void *arg __unused) { int i; #define MAX (1<<20) for (i = 0; i < MAX; i++) { if ((i % (MAX - 1)) == 0) { i = 0; /*usleep(1);*/ } } pthread_exit(NULL); } --------------080506020903050202040100--
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?4FC71F13.6040008>