From owner-freebsd-threads@FreeBSD.ORG  Mon Jun 21 13:38:11 2004
Return-Path: <owner-freebsd-threads@FreeBSD.ORG>
Delivered-To: freebsd-threads@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id D4C7916A4CE; Mon, 21 Jun 2004 13:38:11 +0000 (GMT)
Received: from mailout1.pacific.net.au (mailout1.pacific.net.au [61.8.0.84])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 5AD2543D1D; Mon, 21 Jun 2004 13:38:11 +0000 (GMT)
	(envelope-from bde@zeta.org.au)
Received: from mailproxy2.pacific.net.au (mailproxy2.pacific.net.au
	[61.8.0.87])i5LDcA4u019202;	Mon, 21 Jun 2004 23:38:10 +1000
Received: from gamplex.bde.org (katana.zip.com.au [61.8.7.246])
	i5LDc7nl002033;	Mon, 21 Jun 2004 23:38:08 +1000
Date: Mon, 21 Jun 2004 23:38:06 +1000 (EST)
From: Bruce Evans <bde@zeta.org.au>
X-X-Sender: bde@gamplex.bde.org
To: Don Lewis <truckman@FreeBSD.org>
In-Reply-To: <20040621220455.T9194@gamplex.bde.org>
Message-ID: <20040621232654.S873@gamplex.bde.org>
References: <200406211023.i5LANcmF048049@gw.catspoiler.org>
 <20040621220455.T9194@gamplex.bde.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: threads@FreeBSD.org
cc: rwatson@FreeBSD.org
cc: current@FreeBSD.org
Subject: Re: calcru: negative time ... followed by freeze
X-BeenThere: freebsd-threads@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Threading on FreeBSD <freebsd-threads.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-threads>,
	<mailto:freebsd-threads-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-threads>
List-Post: <mailto:freebsd-threads@freebsd.org>
List-Help: <mailto:freebsd-threads-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-threads>,
	<mailto:freebsd-threads-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 21 Jun 2004 13:38:12 -0000

On Mon, 21 Jun 2004, Bruce Evans wrote:

> On Mon, 21 Jun 2004, Don Lewis wrote:
> > It looks like the bug is in the exit code.  I tweaked the printf() in
> > calcru() to print out p_state, p_flag, and p_sflag in addition to the
> > other info.  In all cases, the processes that trigger the printf were
> > zombies, and show up as [running] in ttyinfo() on a uniprocessor box.
>
> There is a PR about this (#52490).  The oops in my test fixes it for bogus
> reasons.  I'm currently adding similar printfs to help figure out what is
> going wrong.
>
> > Jun 21 03:17:03 hairball kernel: calcru: negative time of 179 usec for
> > pid 4543 (sh) p_state=0x2 p_flag=0x2002 p_sflag=0x1
> >
> > load: 0.71  cmd: sh 4543 [running] 0.00u 0.00s 3% 0k
> >
> > p_runtime only gets updated in mi_switch(), and it appears that it never
> > gets updated after the calcru() call in exit1().
>
> That explains the problem.  The calcru() values sets up to date (final)
> values for the components of ptu.  When we look at the process after it
> has become a zombie, we use only p_runtime since the process is not
> running, but p_runtime is stale.

I happened to have fixed this already without really noticing.  (My kernel
doesn't call calcru() in exit1() or in wait1(); it accumulates p_runtime
and tick counts (instead of those values converted to timevals by calcru())
in the child stats, so it has to get the final p_runtime right.)

%%%
Index: kern_exit.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/kern_exit.c,v
retrieving revision 1.236
diff -u -2 -r1.236 kern_exit.c
--- kern_exit.c	18 Jun 2004 11:13:49 -0000	1.236
+++ kern_exit.c	21 Jun 2004 13:04:31 -0000
@@ -104,4 +104,5 @@
 exit1(struct thread *td, int rv)
 {
+ 	struct bintime new_switchtime;
 	struct proc *p, *nq, *q;
 	struct tty *tp;
@@ -518,8 +519,14 @@
 	mtx_lock_spin(&sched_lock);
 	critical_exit();
-	cnt.v_swtch++;
-	binuptime(PCPU_PTR(switchtime));
+
+	/* Do the same timestamp bookkeeping that mi_switch() would do. */
+	binuptime(&new_switchtime);
+	bintime_add(&p->p_runtime, &new_switchtime);
+	bintime_sub(&p->p_runtime, PCPU_PTR(switchtime));
+	PCPU_SET(switchtime, new_switchtime);
 	PCPU_SET(switchticks, ticks);

+	cnt.v_swtch++;
+
 	/*
 	 * Allow the scheduler to adjust the priority of the
%%%

I will commit this soon.

Workaround for the main problem:

%%%
Index: kern_resource.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/kern_resource.c,v
retrieving revision 1.139
diff -u -2 -r1.139 kern_resource.c
--- kern_resource.c	16 Jun 2004 00:26:29 -0000	1.139
+++ kern_resource.c	21 Jun 2004 12:55:24 -0000
@@ -702,8 +702,10 @@
 	struct timeval *ip;
 {
-	struct bintime bt;
+	struct bintime bt, rt;
 	struct timeval tv;
+	struct thread *td;
 	/* {user, system, interrupt, total} {ticks, usec}; previous tu: */
 	u_int64_t ut, uu, st, su, it, iu, tt, tu, ptu;
+	int problemcase;

 	mtx_assert(&sched_lock, MA_OWNED);
@@ -719,22 +721,40 @@
 		tt = 1;
 	}
-	if (p == curthread->td_proc) {
+	rt = p->p_runtime;
+	problemcase = 0;
+	FOREACH_THREAD_IN_PROC(p, td) {
 		/*
 		 * Adjust for the current time slice.  This is actually fairly
 		 * important since the error here is on the order of a time
 		 * quantum, which is much greater than the sampling error.
-		 * XXXKSE use a different test due to threads on other
-		 * processors also being 'current'.
 		 */
-		binuptime(&bt);
-		bintime_sub(&bt, PCPU_PTR(switchtime));
-		bintime_add(&bt, &p->p_runtime);
-	} else
-		bt = p->p_runtime;
-	bintime2timeval(&bt, &tv);
+		if (td == curthread) {
+			binuptime(&bt);
+			bintime_sub(&bt, PCPU_PTR(switchtime));
+			bintime_add(&rt, &bt);
+		} else if (TD_IS_RUNNING(td)) {
+			/*
+			 * This case should add the current time less the
+			 * switch time as above, but the switch time is
+			 * inaccessible.  So we might end up with rt too
+			 * small and then the monotonicity check might detect
+			 * the problem.  Just set a flag to avoid warning
+			 * about this known problem.
+			 */
+			problemcase = 1;
+		}
+	}
+	bintime2timeval(&rt, &tv);
 	tu = (u_int64_t)tv.tv_sec * 1000000 + tv.tv_usec;
 	ptu = p->p_uu + p->p_su + p->p_iu;
-	if (tu < ptu || (int64_t)tu < 0) {
-		printf("calcru: negative time of %jd usec for pid %d (%s)\n",
+	if (tu < ptu) {
+		if (!problemcase)
+			printf(
+"calcru: runtime went backwards from %ju usec to %ju usec for pid %d (%s)\n",
+			    (uintmax_t)ptu, (uintmax_t)tu, p->p_pid, p->p_comm);
+		tu = ptu;
+	}
+	if ((int64_t)tu < 0) {
+		printf("calcru: negative runtime of %jd usec for pid %d (%s)\n",
 		    (intmax_t)tu, p->p_pid, p->p_comm);
 		tu = ptu;
%%%

Bruce