Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 31 May 2012 11:34:43 +0400
From:      Andrey Zonov <andrey@zonov.org>
To:        freebsd-hackers@freebsd.org
Subject:   Re: usertime stale at about 371k seconds
Message-ID:  <4FC71F13.6040008@zonov.org>
In-Reply-To: <4FC6748B.5030708@zonov.org>
References:  <4FC6748B.5030708@zonov.org>

next in thread | previous in thread | raw e-mail | index | archive | help
This is a multi-part message in MIME format.
--------------080506020903050202040100
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

On 5/30/12 11:27 PM, Andrey Zonov wrote:
> Hi,
>
> I have long running process for which `ps -o usertime -p $pid' shows
> always the same time - 6190:07.65, `ps -o cputime -p $pid' for the same
> process continue to grow and now it's 21538:53.61. It looks like
> overflow in resource usage code or something.
>

I reproduced that problem with attached program.  I ran it with 23 
threads on machine with 24 CPUs and after night I see this:

$ ps -o usertime,time -p 24134 && sleep 60 && ps -o usertime,time -p 24134
   USERTIME        TIME
6351:24.74 14977:35.19
   USERTIME        TIME
6351:24.74 15000:34.53

Per thread user-time counts correct:

$ ps -H -o usertime,time -p 24134
  USERTIME      TIME
   0:00.00   0:00.00
652:35.84 652:38.59
652:34.75 652:37.97
652:50.46 652:51.97
652:38.93 652:43.08
652:39.73 652:43.36
652:44.09 652:47.36
652:56.49 652:57.94
652:51.84 652:54.41
652:37.48 652:41.57
652:36.61 652:40.90
652:39.41 652:42.52
653:03.72 653:06.72
652:49.96 652:53.25
652:45.92 652:49.03
652:40.33 652:42.05
652:46.53 652:49.31
652:44.77 652:47.33
653:00.54 653:02.24
652:33.31 652:36.13
652:51.03 652:52.91
652:50.73 652:52.71
652:41.32 652:44.64
652:59.86 653:03.25

(kgdb) p $my->p_rux
$14 = {rux_runtime = 2171421985692826, rux_uticks = 114886093, 
rux_sticks = 8353, rux_iticks = 0, rux_uu = 381084736784, rux_su = 
65773652, rux_tu = 904571706136}
(kgdb) p $my->p_rux
$15 = {rux_runtime = 2191831516209186, rux_uticks = 115966087, 
rux_sticks = 8444, rux_iticks = 0, rux_uu = 381084736784, rux_su = 
66458587, rux_tu = 913099969825}

As you can see rux_uu stale, but rux_uticks still ticks.  I think the 
problem is in calcru1().  This expression

uu = (tu * ut) / tt

overflows.

I applied the following patch:

Index: /usr/src/sys/kern/kern_resource.c
===================================================================
--- /usr/src/sys/kern/kern_resource.c	(revision 235394)
+++ /usr/src/sys/kern/kern_resource.c	(working copy)
@@ -885,7 +885,7 @@ calcru1(struct proc *p, struct rusage_ext *ruxp, s
      struct timeval *sp)
  {
  	/* {user, system, interrupt, total} {ticks, usec}: */
-	uint64_t ut, uu, st, su, it, tt, tu;
+	uint64_t ut, uu, st, su, it, tt, tu, tmp;

  	ut = ruxp->rux_uticks;
  	st = ruxp->rux_sticks;
@@ -909,10 +909,20 @@ calcru1(struct proc *p, struct rusage_ext *ruxp, s
  		 * The normal case, time increased.
  		 * Enforce monotonicity of bucketed numbers.
  		 */
-		uu = (tu * ut) / tt;
+		if (ut == 0)
+			uu = 0;
+		else {
+			tmp = tt / ut;
+			uu = tmp ? tu / tmp : 0;
+		}
  		if (uu < ruxp->rux_uu)
  			uu = ruxp->rux_uu;

and now ran test again.

-- 
Andrey Zonov

--------------080506020903050202040100
Content-Type: text/plain; charset=UTF-8; x-mac-type="0"; x-mac-creator="0";
	name="usertime.c"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="usertime.c"

/*
 * Andrey Zonov (c) 2012
 */

#include <err.h>
#include <pthread.h>
#include <stdlib.h>

void *func(void *arg);

int
main(int argc, char **argv)
{
	int i;
	int threads;
	int *tid;
	pthread_t *tds;

	if (argc != 2)
		errx(1, "usage: usertime <threads>");

	threads = atoi(argv[1]);
	tid = malloc(sizeof(int) * threads);
	tds = malloc(sizeof(pthread_t) * threads);

	for (i = 0; i < threads; i++) {
		tid[i] = i;
		if (pthread_create(&tds[i], NULL, func, &tid[i]) != 0)
			err(1, "pthread_create(%d)", i);
	}

	for (i = 0; i < threads; i++)
		if (pthread_join(tds[i], NULL) != 0)
			err(1, "pthread_join(%d)", i);

	exit(0);
}

void *
func(void *arg __unused)
{
	int i;

#define	MAX	(1<<20)

	for (i = 0; i < MAX; i++) {
		if ((i % (MAX - 1)) == 0) {
			i = 0;
			/*usleep(1);*/
		}
	}

	pthread_exit(NULL);
}

--------------080506020903050202040100--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?4FC71F13.6040008>