From owner-freebsd-current  Thu Aug 20 14:44:23 1998
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Received: (from majordom@localhost)
          by hub.freebsd.org (8.8.8/8.8.8) id OAA00480
          for freebsd-current-outgoing; Thu, 20 Aug 1998 14:44:23 -0700 (PDT)
          (envelope-from owner-freebsd-current@FreeBSD.ORG)
Received: from awfulhak.org (awfulhak.force9.co.uk [195.166.136.63])
          by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id OAA00372
          for <freebsd-current@FreeBSD.ORG>; Thu, 20 Aug 1998 14:44:09 -0700 (PDT)
          (envelope-from brian@Awfulhak.org)
Received: from gate.lan.awfulhak.org (brian@localhost [127.0.0.1])
	by awfulhak.org (8.8.8/8.8.8) with ESMTP id WAA03270;
	Thu, 20 Aug 1998 22:37:35 +0100 (BST)
	(envelope-from brian@gate.lan.awfulhak.org)
Message-Id: <199808202137.WAA03270@awfulhak.org>
X-Mailer: exmh version 2.0.2 2/24/98
To: Brian Feldman <green@unixhelp.org>
cc: Poul-Henning Kamp <phk@critter.freebsd.dk>,
        Terry Lambert <tlambert@primenet.com>, bde@zeta.org.au,
        freebsd-current@FreeBSD.ORG, jwd@unx.sas.com
Subject: Re: 13 months of user time? 
In-reply-to: Your message of "Thu, 20 Aug 1998 02:08:08 EDT."
             <Pine.BSF.4.02.9808200203190.24018-100000@zone.syracuse.net> 
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Thu, 20 Aug 1998 22:37:34 +0100
From: Brian Somers <brian@Awfulhak.org>
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> Okay, how about we try out Mike's idea? Someone who experiences the
> SIGXCPU kill problem could try putting the following in kern/kern_synch.c
> line 638:
> if (switchtime.tv_usec < p->p_switchtime.tv_usec ||
>     switchtime.tv_sec < p->p_switchtime.tv_sec)
> 	panic("bogus microuptime twiddling");

I had an ``if I'm about to SIGXCPU, print the above values'' 
diagnostic in my kernel, and in every case switchtime.tv_usec was 
less than p->p_switchtime.tv_usec (tv_sec was the same in both). 
Also (just for the record), the tv_usec values were *never* >1000000.

From what I can see, and given that the tv_sec values != 0 (which my 
diagnostics confirmed), p->p_switchtime is being copied from 
switchtime in mi_switch(), and then being compared at a later point 
(also in mi_switch()).  ``switchtime'' at this point HAS GONE 
BACKWARDS.  This means that successive calls to microuptime() are 
filling the passed variables with non-increasing values.  This is 
backed up by the only other call to microuptime() in /sys/kern: 
others are seeing the ``calcru: negative time...'' error, which is 
impossible if microuptime() only ever increases (isn't it?).
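
To make the failure mode concrete, here is a minimal C sketch of an 
elapsed-time calculation between two samples.  It assumes the run 
time is accumulated as a signed microsecond difference added to an 
unsigned 64-bit total, which is an assumption rather than a quote 
from mi_switch().  struct tv_sketch, elapsed_usec() and accumulate() 
are hypothetical names, not kernel code.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct timeval, so the sketch is self-contained. */
struct tv_sketch {
	int64_t tv_sec;
	int64_t tv_usec;
};

/* Signed elapsed microseconds from 'then' to 'now'; negative if time went backwards. */
static int64_t
elapsed_usec(const struct tv_sketch *then, const struct tv_sketch *now)
{
	return (now->tv_sec - then->tv_sec) * INT64_C(1000000) +
	    (now->tv_usec - then->tv_usec);
}

/* Adds a signed delta to an unsigned running total (assumed accumulator shape). */
static uint64_t
accumulate(uint64_t runtime, int64_t delta)
{
	/* A negative delta wraps around to a huge unsigned value. */
	return runtime + (uint64_t)delta;
}
```

With the same tv_sec and a tv_usec that is 10 lower, elapsed_usec() 
gives -10, and adding that to an unsigned total of 0 wraps to just 
under 2^64 microseconds, which would look like an absurd amount of 
CPU time.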

*If* microuptime() is returning non-increasing values under certain 
circumstances, then one of three things is happening.  Either the 
timecounter pointer is being mis-optimised because it's not volatile 
(phk has pooh-poohed that idea though - I'm not sure why, but he's 
probably right, as tc[1] and tc[2] are the only values that *should* 
be getting pointed at as actual time values), *OR* the amount that 
tv_usec is adjusted by is > LONG_MAX or < 0 (I think this is 
impossible as tc_scale_micro is assigned as something divided by 
1000), *OR* tco_delta() is returning non-increasing values... hmm

In /sys/i386/isa/clock.c, should i8254_offset be reset after it's 
added to ``count''?  What happens when i8254_offset wraps?  Might 
this be the problem?  Would it only be a problem for machines that 
have an irregular clock heart-beat, sometimes allowing loads of calls 
to i8254_get_timecount() before clkintr() happens?
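
On the wrap question, a small C sketch may help separate the two 
cases.  It assumes the delta between two counter reads is taken as a 
masked unsigned difference; that is only a guess at the general shape 
of tco_delta(), and counter_delta() is a hypothetical name.  Under 
that assumption a plain wrap of the counter is harmless, but a read 
that is genuinely lower than the previous one turns into an enormous 
forward step.

```c
#include <stdint.h>

/*
 * Hypothetical helper: ticks elapsed between two reads of a free-running
 * counter that is 'mask + 1' ticks wide, using unsigned modular arithmetic.
 */
static uint32_t
counter_delta(uint32_t now, uint32_t last, uint32_t mask)
{
	return (now - last) & mask;
}
```

For a 32-bit counter, counter_delta(5, 0xfffffffb, 0xffffffff) is 10, 
so wrapping past zero costs nothing.  counter_delta(90, 100, 
0xffffffff) is 0xfffffff6, so a counter that steps back by 10 ticks 
is indistinguishable from one that ran forward almost a full period.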

I reckon a diagnostic in microuptime() that compares the value 
assigned to *tv with the previous value and moans if they decrease 
may prove informative.... and maybe a similar thing in 
i8254_get_timecount() - the machine I was having problems with was 
running apm, so it used the i8254 timecounter rather than the tsc 
counter.
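
The sort of check meant here could look roughly like the following C 
sketch.  It is user-space code under stated assumptions, not a kernel 
patch: struct tv_sample, tv_went_backwards() and check_monotonic() 
are hypothetical names, and a real diagnostic would printf() or 
panic() where this one just returns a flag.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct timeval. */
struct tv_sample {
	int64_t tv_sec;
	int64_t tv_usec;
};

/* Returns 1 if 'cur' is strictly earlier than 'prev', else 0. */
static int
tv_went_backwards(const struct tv_sample *prev, const struct tv_sample *cur)
{
	/* Compare seconds first so a normal tv_usec roll-over is not flagged. */
	if (cur->tv_sec != prev->tv_sec)
		return cur->tv_sec < prev->tv_sec;
	return cur->tv_usec < prev->tv_usec;
}

/* Compares 'cur' with the remembered sample, then remembers 'cur'. */
static int
check_monotonic(struct tv_sample *last, const struct tv_sample *cur)
{
	int bad = tv_went_backwards(last, cur);

	*last = *cur;
	return bad;
}
```

Comparing tv_sec before tv_usec matters: a test that fires whenever 
tv_usec alone is smaller would also trip on every legitimate 
crossing of a second boundary.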

> And see if we get some nice panics and cores. Is it worth a shot? I've
> never gotten a SIGXCPU out of place, so my machine wouldn't be the one to
> test this on.

Same here.  The machine I had that did this was given back to the 
shop.

> Cheers,
> Brian Feldman
> green@unixhelp.org

-- 
Brian <brian@Awfulhak.org>, <brian@FreeBSD.org>, <brian@OpenBSD.org>
      <http://www.Awfulhak.org>
Don't _EVER_ lose your sense of humour....

