From owner-freebsd-threads@FreeBSD.ORG  Mon Jun 21 10:24:02 2004
Return-Path: <owner-freebsd-threads@FreeBSD.ORG>
Delivered-To: freebsd-threads@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 0092E16A4CE; Mon, 21 Jun 2004 10:24:02 +0000 (GMT)
Received: from gw.catspoiler.org (217-ip-163.nccn.net [209.79.217.163])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 6F9BC43D53; Mon, 21 Jun 2004 10:24:01 +0000 (GMT)
	(envelope-from truckman@FreeBSD.org)
Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2])
	by gw.catspoiler.org (8.12.11/8.12.11) with ESMTP id i5LANcmF048049;
	Mon, 21 Jun 2004 03:23:43 -0700 (PDT)
	(envelope-from truckman@FreeBSD.org)
Message-Id: <200406211023.i5LANcmF048049@gw.catspoiler.org>
Date: Mon, 21 Jun 2004 03:23:38 -0700 (PDT)
From: Don Lewis <truckman@FreeBSD.org>
To: bde@zeta.org.au
In-Reply-To: <200406210801.i5L81K5u047553@gw.catspoiler.org>
MIME-Version: 1.0
Content-Type: TEXT/plain; charset=us-ascii
cc: threads@FreeBSD.org
cc: rwatson@FreeBSD.org
cc: current@FreeBSD.org
Subject: Re: calcru: negative time ... followed by freeze
X-BeenThere: freebsd-threads@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Threading on FreeBSD <freebsd-threads.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-threads>,
	<mailto:freebsd-threads-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-threads>
List-Post: <mailto:freebsd-threads@freebsd.org>
List-Help: <mailto:freebsd-threads-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-threads>,
	<mailto:freebsd-threads-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 21 Jun 2004 10:24:02 -0000

On 21 Jun, Don Lewis wrote:
> On 21 Jun, Bruce Evans wrote:
>> On Sun, 20 Jun 2004, Don Lewis wrote:
>> 
>>> On 20 Jun, Robert Watson wrote:
>>> > FYI, this is a Xeon box with two physical processors, each with two
>>> > logical processors, and the problem could well be SMP-related.
>>>
>>> The hang might be SMP-related, but I just started getting the calcru
>>> messages on my UP Athlon XP box.  I just upgraded to today's
>>> -CURRENT from a week-old version, and my console got spammed by a bunch
>>> of these messages while I was running portupgrade.
>>>
>>> calcru: negative time of 0 usec for pid 21261 (sh)
>>> calcru: negative time of 0 usec for pid 21261 (sh)
>>> ...
>>> calcru: negative time of 3917 usec for pid 49504 (sh)
>>> calcru: negative time of 3917 usec for pid 49504 (sh)
>> 
>> Hmm, these are nonnegative times.
>> 
>> The extra messages in -current might be caused by fill_kinfo() now calling
>> calcru().
>> 
>> Misreporting of negative times is by the following too-simple dianostic:
>> 
>> % 	tu = (u_int64_t)tv.tv_sec * 1000000 + tv.tv_usec;
>> % 	ptu = p->p_uu + p->p_su + p->p_iu;
>> % 	if (tu < ptu || (int64_t)tu < 0) {
>> % 		printf("calcru: negative time of %jd usec for pid %d (%s)\n",
>> % 		    (intmax_t)tu, p->p_pid, p->p_comm);
>> % 		tu = ptu;
>> % 	}
>> 
>> The message is also printed for the tu < ptu case, which is what you are
>> getting.
>> 
>> I fixed the messages when I got a lot of them due to a local bug.  The local
>> bug was from double rounding for calcru() on child times (which -current
>> doesn't do).  I can't see how this could be the problem in -current, since
>> the components of ptu are rounded down and there is a KASSERT that they
>> added up to less than tu in the previous call.
>> 
>> Ah, here is a likely cause of the bug in -current:
>> 
>> % 	if (p == curthread->td_proc) {
>> % 		/*
>> % 		 * Adjust for the current time slice.  This is actually fairly
>> % 		 * important since the error here is on the order of a time
>> % 		 * quantum, which is much greater than the sampling error.
>> % 		 * XXXKSE use a different test due to threads on other
>> % 		 * processors also being 'current'.
>> % 		 */
>> % 		binuptime(&bt);
>> % 		bintime_sub(&bt, PCPU_PTR(switchtime));
>> % 		bintime_add(&bt, &p->p_runtime);
>> % 	} else
>> % 		bt = p->p_runtime;
>> 
>> The XXXKSE comment is correct that this might be broken.  If the (p
>> != curthread->td_proc) case happens at all for a running process, then
>> it gives a wrong (out of date) timestamp in bt.  This wrongness will
>> be detected if calcru() is was called called earlier in the current
>> timeslice and took the other path here.
>> 
>> The recent change to fill_kinfo() is quite likely to trigger detection
>> of this bug.  fill_kinfo() is often used to iterate over all processes
>> for ps, so it will call calcru() with (p != curthread->td_proc) for
>> all processes other than the one running it, and give a bt that is out
>> of date for all such processes that are actually running.  Since there
>> can be at most one running process per CPU, this bug only affects SMP.
>> 
>> The call to calcru() from ttyinfo() may be the only other trigger.
>> ttyinfo() picks a process and should rarely or never pick the ithread
>> running it, so it will almost always take the (p != curthread->td_proc)
>> path.  Again, this is only a problem for the SMP case since in the !SMP
>> case the picked process must have been switched away from to run the
>> ithread, so it cannot be running.

It looks like another way to trigger this in the SMP case would be to
have two threads of the same process running at the same time, and for
the second thread to call calcru() to have been running for a shorter
period of time than when the first thread called calcru().

In the SMP cases, it probably makes sense to just silently to do
	if (tu < ptu)
		tu = ptu
because of the complications of attempting to do an accurate
calculation.


> There must be some !SMP trigger for this as well.  I just checked and I
> was able to trigger this on my Pentium-M laptop as well by leaning on
> the ^T key while I was logged on via ssh and running 'portupgrade -aP'.
> 
> Jun 21 00:41:31 hairball kernel: calcru: negative time of 23169 usec for pid 44653 (sh)
> Jun 21 00:41:32 hairball kernel: calcru: negative time of 21990 usec for pid 44665 (sh)
> 
> I didn't use ^T on my Athlon box.  I might have had top running, though.
> 
> It's interesting that this bug only seems to get triggered on /bin/sh.
> Maybe it is fork()/exit()/wait() related?

It looks like the bug is in the exit code.  I tweaked the printf() in
calcru() to print out p_state, p_flag, and p_sflag in addition to the
other info.  In all cases, the processes that trigger the printf were
zombies, and show up as [running] in ttyinfo() on a uniprocessor box.

Jun 21 03:17:03 hairball kernel: calcru: negative time of 179 usec for
pid 4543 (sh) p_state=0x2 p_flag=0x2002 p_sflag=0x1

load: 0.71  cmd: sh 4543 [running] 0.00u 0.00s 3% 0k


p_runtime only gets updated in mi_switch(), and it appears that it never
gets updated after the calcru() call in exit1().

It also looks like a bug that a zombie remains in the [running] state
and thus looks interesting to ttyinfo().