Date: Thu, 20 Aug 1998 22:37:34 +0100
From: Brian Somers <brian@Awfulhak.org>
To: Brian Feldman <green@unixhelp.org>
Cc: Poul-Henning Kamp <phk@critter.freebsd.dk>, Terry Lambert <tlambert@primenet.com>, bde@zeta.org.au, freebsd-current@FreeBSD.ORG, jwd@unx.sas.com
Subject: Re: 13 months of user time?
Message-ID: <199808202137.WAA03270@awfulhak.org>
In-Reply-To: Your message of "Thu, 20 Aug 1998 02:08:08 EDT." <Pine.BSF.4.02.9808200203190.24018-100000@zone.syracuse.net>
> Okay, how about we try out Mike's idea? Someone who experiences the
> SIGXCPU kill problem could try putting the following in kern/kern_synch.c
> line 638:
> 	if (switchtime.tv_usec < p->p_switchtime.tv_usec ||
> 	    switchtime.tv_sec < p->p_switchtime.tv_sec)
> 		panic("bogus microuptime twiddling");

I had a ``if I was going to SIGXCPU, output the above values''
diagnostic in my kernel, and in all cases, switchtime.tv_usec was
less than p->p_switchtime.tv_usec (tv_sec was the same for each var).
Also (just for the record), the tv_usec values were *never* >1000000.

From what I can see, and given that the tv_sec values != 0 (which my
diagnostics confirmed), p->p_switchtime is being copied from
switchtime in mi_switch(), and then being compared at a later point
(also in mi_switch()). ``switchtime'' at this point HAS GONE
BACKWARDS.

This means that successive calls to microuptime() are filling the
passed variables with non-increasing values. This is confirmed by
the only other call to microuptime() in /sys/kern as others are seeing
the ``calcru: negative time...'' error which is impossible if
microuptime() only ever increases (isn't it?).

*If* microuptime() is returning non-increasing values under certain
circumstances, then that means that either the timecounter pointer is
being mis-optimised because it's not volatile (phk has pooh-poohed
that idea though - I'm not sure why, but he's probably right, as
tc[1] and tc[2] are the only values that *should* be getting pointed
at as actual time values), *OR* that the amount that tv_usec is
adjusted by is > LONG_MAX or < 0 (I think this is impossible as
tc_scale_micro is assigned as something divided by 1000) *OR*
tco_delta() is returning non-increasing values...... hmm

In /sys/i386/isa/clock.c, should i8254_offset be reset after it's
added to ``count''? What happens when i8254_offset wraps? Might
this be the problem? Would it only be a problem for machines that
have an irregular clock heart-beat, sometimes allowing loads of calls
to i8254_get_timecount() before clkintr() happens??

I reckon a diagnostic in microuptime() that compares the value
assigned to *tv with the previous value and moans if they decrease
may prove informative.... and maybe a similar thing in
i8254_get_timecount() - the machine I was having problems with was
running apm, so it used the i8254 timecounter rather than the tsc
counter.

> And see if we get some nice panics and cores. Is it worth a shot? I've
> never gotten a SIGXCPU out of place, so my machine wouldn't be the one to
> test this on.

Same here. The machine I had that did this was given back to the
shop.

> Cheers,
> Brian Feldman
> green@unixhelp.org

-- 
Brian <brian@Awfulhak.org>, <brian@FreeBSD.org>, <brian@OpenBSD.org>
      <http://www.Awfulhak.org>
Don't _EVER_ lose your sense of humour....

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message
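
[Editorial sketch] The diagnostic proposed in the message (compare each
value written to *tv with the previous one and complain if it decreases)
might look roughly like the following. This is a userland illustration
only, not kernel code: the names tv_went_backwards() and check_monotonic()
are hypothetical, and stderr output stands in for whatever a kernel
diagnostic would actually use. The comparison orders on tv_sec first and
only looks at tv_usec when the seconds are equal, which matches the case
reported above (tv_sec identical, tv_usec smaller).

```c
#include <assert.h>
#include <stdio.h>
#include <sys/time.h>

/*
 * Return nonzero if *now is strictly earlier than *prev.
 * Seconds are compared first; microseconds only break ties.
 */
static int
tv_went_backwards(const struct timeval *prev, const struct timeval *now)
{
	assert(prev != NULL && now != NULL);
	if (now->tv_sec != prev->tv_sec)
		return (now->tv_sec < prev->tv_sec);
	return (now->tv_usec < prev->tv_usec);
}

/*
 * Hypothetical diagnostic: remember the last sample in *last and
 * complain on stderr if the new sample is earlier than it.
 * An all-zero *last is treated as "no sample seen yet".
 * Returns 1 if time appeared to go backwards, 0 otherwise.
 */
static int
check_monotonic(struct timeval *last, const struct timeval *now)
{
	int bad = 0;

	if ((last->tv_sec != 0 || last->tv_usec != 0) &&
	    tv_went_backwards(last, now)) {
		fprintf(stderr,
		    "time went backwards: %ld.%06ld -> %ld.%06ld\n",
		    (long)last->tv_sec, (long)last->tv_usec,
		    (long)now->tv_sec, (long)now->tv_usec);
		bad = 1;
	}
	*last = *now;	/* always track the most recent sample */
	return (bad);
}
```

A caller would keep one struct timeval per clock source being checked
and pass every fresh sample through check_monotonic(); a nonzero return
marks the point where the source stepped backwards.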