Date: Mon, 21 Jun 2004 00:44:09 -0700 (PDT)
From: Julian Elischer
To: Bruce Evans
cc: threads@freebsd.org
cc: Don Lewis
cc: rwatson@freebsd.org
cc: current@freebsd.org
Subject: Re: calcru: negative time ... followed by freeze
In-Reply-To: <20040621132119.Q8596@gamplex.bde.org>

On Mon, 21 Jun 2004, Bruce Evans wrote:

> Ah, here is a likely cause of the bug in -current:
>
> %     if (p == curthread->td_proc) {
> %         /*
> %          * Adjust for the current time slice.  This is actually fairly
> %          * important since the error here is on the order of a time
> %          * quantum, which is much greater than the sampling error.
> %          * XXXKSE use a different test due to threads on other
> %          * processors also being 'current'.
> %          */
> %         binuptime(&bt);
> %         bintime_sub(&bt, PCPU_PTR(switchtime));
> %         bintime_add(&bt, &p->p_runtime);
> %     } else
> %         bt = p->p_runtime;
>
> The XXXKSE comment is correct that this might be broken.  If the (p
> != curthread->td_proc) case happens at all for a running process, then
> it gives a wrong (out of date) timestamp in bt.  This wrongness will
> be detected if calcru() was called earlier in the current timeslice
> and took the other path here.

It should be fairly easy to fix, as there is now a thread state that
indicates that a thread is actually running.

>
> The recent change to fill_kinfo() is quite likely to trigger detection
> of this bug.  fill_kinfo() is often used to iterate over all processes
> for ps, so it will call calcru() with (p != curthread->td_proc) for
> all processes other than the one running it, and give a bt that is out
> of date for all such processes that are actually running.  Since there
> can be at most one running process per CPU, this bug only affects SMP.
>
> The call to calcru() from ttyinfo() may be the only other trigger.
> ttyinfo() picks a process and should rarely or never pick the ithread
> running it, so it will almost always take the (p != curthread->td_proc)
> path.  Again, this is only a problem for the SMP case since in the !SMP
> case the picked process must have been switched away from to run the
> ithread, so it cannot be running.
>
> Bruce
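
To make the failure mode above concrete, here is a small user-space
sketch in C.  It is not kernel code: struct sim_cpu, struct sim_proc,
runtime_caller_only() and runtime_any_cpu() are hypothetical names made
up for illustration, times are plain integer ticks rather than struct
bintime, and each process is assumed to have a single thread.  The
first function mirrors the quoted (p == curthread->td_proc) test; the
second adjusts for the current slice whenever the process is running on
any CPU, which is roughly what a "thread is actually running" check
would allow.  A real fix would also have to deal with locking, which
this sketch ignores.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for per-CPU state (not a FreeBSD structure). */
struct sim_cpu {
	uint64_t switchtime;	/* tick at which the current slice began */
	int	 running_pid;	/* pid running on this CPU, -1 if idle */
};

/* Hypothetical stand-in for struct proc. */
struct sim_proc {
	int	 pid;
	uint64_t runtime;	/* only updated at context switch */
};

/*
 * Mirrors the quoted test: the current slice is added only when the
 * caller (the process on CPU 'curcpu') is the process being measured.
 */
static uint64_t
runtime_caller_only(const struct sim_proc *p, const struct sim_cpu *cpus,
    int ncpu, int curcpu, uint64_t now)
{
	assert(curcpu >= 0 && curcpu < ncpu);
	if (cpus[curcpu].running_pid == p->pid)
		return (p->runtime + (now - cpus[curcpu].switchtime));
	return (p->runtime);	/* stale if p is running on another CPU */
}

/*
 * Sketch of the alternative: add the current slice if the process is
 * running on any CPU, using that CPU's switch time.
 */
static uint64_t
runtime_any_cpu(const struct sim_proc *p, const struct sim_cpu *cpus,
    int ncpu, uint64_t now)
{
	for (int i = 0; i < ncpu; i++) {
		if (cpus[i].running_pid == p->pid)
			return (p->runtime + (now - cpus[i].switchtime));
	}
	return (p->runtime);
}
```

With pid 7 running on CPU 1 since tick 100 and 50 ticks of accumulated
runtime, a call made by pid 7 itself at tick 130 yields 80 from
runtime_caller_only(), but a later call at tick 140 from a ps-like
process on CPU 0 yields 50, so the reported time goes backwards.
runtime_any_cpu() yields 90 for that second call.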