Date: Thu, 20 Aug 1998 22:37:34 +0100
From: Brian Somers <brian@Awfulhak.org>
To: Brian Feldman <green@unixhelp.org>
Cc: Poul-Henning Kamp <phk@critter.freebsd.dk>, Terry Lambert <tlambert@primenet.com>, bde@zeta.org.au, freebsd-current@FreeBSD.ORG, jwd@unx.sas.com
Subject: Re: 13 months of user time?
Message-ID: <199808202137.WAA03270@awfulhak.org>
In-Reply-To: Your message of "Thu, 20 Aug 1998 02:08:08 EDT." <Pine.BSF.4.02.9808200203190.24018-100000@zone.syracuse.net>
> Okay, how about we try out Mike's idea? Someone who experiences the
> SIGXCPU kill problem could try putting the following in kern/kern_synch.c
> line 638:
>     if (switchtime.tv_usec < p->p_switchtime.tv_usec ||
>         switchtime.tv_sec < p->p_switchtime.tv_sec)
>             panic("bogus microuptime twiddling");
I had an ``if I was going to SIGXCPU, output the above values''
diagnostic in my kernel, and in all cases, switchtime.tv_usec was
less than p->p_switchtime.tv_usec (tv_sec was the same for each var).
Also (just for the record), the tv_usec values were *never* >1000000.
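
For reference, a minimal sketch of a ``did the clock go backwards'' test that only looks at tv_usec when the tv_sec fields are equal, which is the case reported above. The helper name tv_went_backwards is made up for illustration and is not anything in the tree; it uses the userland struct timeval rather than anything kernel-specific.

```c
#include <sys/time.h>

/*
 * Hypothetical helper, not kernel code: returns 1 if `now' is strictly
 * earlier than `prev', 0 otherwise.  tv_usec is only compared when the
 * tv_sec fields match, so a normal rollover into the next second
 * (tv_sec up, tv_usec down) is not reported as going backwards.
 */
static int
tv_went_backwards(const struct timeval *prev, const struct timeval *now)
{
	if (now->tv_sec != prev->tv_sec)
		return (now->tv_sec < prev->tv_sec);
	return (now->tv_usec < prev->tv_usec);
}
```

With equal tv_sec and a smaller tv_usec this returns 1. An unconditional tv_usec comparison, as in the quoted check, would also fire on an ordinary second rollover, so it could panic on a healthy machine.
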
From what I can see, and given that the tv_sec values != 0 (which my
diagnostics confirmed), p->p_switchtime is being copied from
switchtime in mi_switch(), and then being compared at a later point
(also in mi_switch()). ``switchtime'' at this point HAS GONE
BACKWARDS. This means that successive calls to microuptime() are
filling the passed variables with non-increasing values. This is
corroborated at the only other call to microuptime() in /sys/kern,
where others are seeing the ``calcru: negative time...'' error, which
should be impossible if microuptime() only ever increases (shouldn't it?).
*If* microuptime() is returning non-increasing values under certain
circumstances, then that means that either the timecounter pointer is
being mis-optimised because it's not volatile (phk has pooh-poohed
that idea though - I'm not sure why, but he's probably right, as
tc[1] and tc[2] are the only values that *should* be getting pointed
at as actual time values), *OR* that the amount that tv_usec
is adjusted by is > LONG_MAX or < 0 (I think this is impossible as
tc_scale_micro is assigned as something divided by 1000) *OR*
tco_delta() is returning non-increasing values...... hmm
In /sys/i386/isa/clock.c, should i8254_offset be reset after it's
added to ``count'' ? What happens when i8254_offset wraps ? Might
this be the problem ? Would it only be a problem for machines that
have an irregular clock heart-beat, sometimes allowing loads of calls
to i8254_get_timecount() before clkintr() happens ??
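
On the wrap question, a small sketch of the arithmetic. It assumes (this is not checked against clock.c) that the count is an unsigned 32-bit value and that elapsed ticks are taken by unsigned subtraction; count_delta is a made-up name. If that assumption holds, an offset that wraps past 2^32 is harmless by itself, and the thing that would hurt is a reading that is genuinely smaller than the one before it.

```c
#include <stdint.h>

/*
 * Illustrative only: elapsed ticks between two readings of a free-running
 * unsigned 32-bit count.  Unsigned subtraction is modulo 2^32, so the
 * result is correct across a wrap as long as fewer than 2^32 ticks
 * separate the two readings.
 */
static uint32_t
count_delta(uint32_t then, uint32_t now)
{
	return (now - then);
}
```

Across a wrap, count_delta(0xFFFFFFF0, 0x10) is 0x20, as it should be. A reading that really did go backwards shows up as a huge delta instead: count_delta(1000, 900) is 4294967196, not -100.
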
I reckon a diagnostic in microuptime() that compares the value
assigned to *tv with the previous value and moans if they decrease
may prove informative.... and maybe a similar thing in
i8254_get_timecount() - the machine I was having problems with was
running apm, so it used the i8254 timecounter rather than the tsc
counter.
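
The diagnostic suggested above could look roughly like this. It is a userland sketch, not a patch: check_monotonic and the static last_tv are invented names, and the caller is assumed to pass in whatever microuptime() has just filled in.

```c
#include <stdio.h>
#include <sys/time.h>

/* Last value seen; starts at zero, so a first non-negative time passes. */
static struct timeval last_tv;

/*
 * Hypothetical diagnostic: remember the previous timestamp and moan on
 * stderr if the new one is earlier.  Returns 1 if time went backwards,
 * 0 otherwise.  The new value is always recorded.
 */
static int
check_monotonic(const struct timeval *tv)
{
	int backwards;

	backwards = tv->tv_sec < last_tv.tv_sec ||
	    (tv->tv_sec == last_tv.tv_sec && tv->tv_usec < last_tv.tv_usec);
	if (backwards)
		fprintf(stderr, "time went backwards: %ld.%06ld -> %ld.%06ld\n",
		    (long)last_tv.tv_sec, (long)last_tv.tv_usec,
		    (long)tv->tv_sec, (long)tv->tv_usec);
	last_tv = *tv;
	return (backwards);
}
```

The same shape would apply to a raw counter value in i8254_get_timecount(), with an unsigned count in place of the timeval. A real kernel version would also have to think about interrupt context and about how often it is allowed to print.
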
> And see if we get some nice panics and cores. Is it worth a shot? I've
> never gotten a SIGXCPU out of place, so my machine wouldn't be the one to
> test this on.
Same here. The machine I had that did this was given back to the
shop.
> Cheers,
> Brian Feldman
> green@unixhelp.org
--
Brian <brian@Awfulhak.org>, <brian@FreeBSD.org>, <brian@OpenBSD.org>
<http://www.Awfulhak.org>
Don't _EVER_ lose your sense of humour....
