Message-Id: <199808202137.WAA03270@awfulhak.org>
To: Brian Feldman
Cc: Poul-Henning Kamp, Terry Lambert, bde@zeta.org.au, freebsd-current@FreeBSD.ORG, jwd@unx.sas.com
Subject: Re: 13 months of user time?
In-reply-to: Your message of "Thu, 20 Aug 1998 02:08:08 EDT."
Date: Thu, 20 Aug 1998 22:37:34 +0100
From: Brian Somers

> Okay, how about we try out Mike's idea? Someone who experiences the
> SIGXCPU kill problem could try putting the following in kern/kern_synch.c
> line 638:
>	if (switchtime.tv_usec < p->p_switchtime.tv_usec ||
>	    switchtime.tv_sec < p->p_switchtime.tv_sec)
>		panic("bogus microuptime twiddling");

I had an ``if I was going to SIGXCPU, output the above values''
diagnostic in my kernel, and in all cases switchtime.tv_usec was less
than p->p_switchtime.tv_usec (tv_sec was the same for each var).  Also
(just for the record), the tv_usec values were *never* >1000000.

From what I can see, and given that the tv_sec values != 0 (which my
diagnostics confirmed), p->p_switchtime is being copied from switchtime
in mi_switch(), and then being compared at a later point (also in
mi_switch()).
``switchtime'' at this point HAS GONE BACKWARDS.  This means that
successive calls to microuptime() are sometimes filling the passed
variables with decreasing values.  This is backed up by the only other
call to microuptime() in /sys/kern: others are seeing the ``calcru:
negative time...'' error, which is impossible if microuptime() only
ever increases (isn't it?).

*If* microuptime() is returning decreasing values under certain
circumstances, then either the timecounter pointer is being
mis-optimised because it's not volatile (phk has pooh-poohed that idea
though - I'm not sure why, but he's probably right, as tc[1] and tc[2]
are the only values that *should* be getting pointed at as actual time
values), *OR* the amount that tv_usec is adjusted by is > LONG_MAX or
< 0 (I think this is impossible as tc_scale_micro is assigned as
something divided by 1000), *OR* tco_delta() is returning decreasing
values... hmm.

In /sys/i386/isa/clock.c, should i8254_offset be reset after it's added
to ``count''?  What happens when i8254_offset wraps?  Might this be the
problem?  Would it only be a problem for machines that have an irregular
clock heart-beat, sometimes allowing loads of calls to
i8254_get_timecount() before clkintr() happens?

I reckon a diagnostic in microuptime() that compares the value assigned
to *tv with the previous value and moans if they decrease may prove
informative, and maybe a similar one in i8254_get_timecount().  The
machine I was having problems with was running apm, so it used the
i8254 timecounter rather than the tsc counter.

> And see if we get some nice panics and cores. Is it worth a shot? I've
> never gotten a SIGXCPU out of place, so my machine wouldn't be the one to
> test this on.

Same here.  The machine I had that did this was given back to the shop.

> Cheers,
> Brian Feldman
> green@unixhelp.org

-- 
Brian
      Don't _EVER_ lose your sense of humour....
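For what it's worth, the check quoted above compares tv_usec and tv_sec
independently, so it would also panic on an ordinary second rollover
(e.g. 4.999000 followed by 5.000100, where tv_usec drops while time
moves forward).  Below is a minimal userland sketch of the microuptime()
diagnostic suggested above, comparing whole timevals the way the BSD
timercmp(a, b, <) macro does.  The names tv_before(), check_monotonic(),
last_uptime and backwards_count are hypothetical; this is not kernel
code, and a real version would be called on the value microuptime() is
about to hand back and would use printf() or panic() instead of
fprintf():

```c
#include <stdio.h>
#include <sys/time.h>

/*
 * Hypothetical helper: lexicographic "a < b" on struct timeval, the
 * same ordering as the BSD timercmp(a, b, <) macro.  Seconds decide
 * first; microseconds only matter when the seconds are equal.
 */
static int
tv_before(const struct timeval *a, const struct timeval *b)
{
	if (a->tv_sec != b->tv_sec)
		return (a->tv_sec < b->tv_sec);
	return (a->tv_usec < b->tv_usec);
}

/*
 * Hypothetical monotonicity check: remember the last value seen and
 * complain (and count) if the new one is earlier.  Userland stand-in
 * only; kernel state would live next to microuptime().
 */
static struct timeval last_uptime;	/* zero-initialised: 0.000000 */
static int backwards_count;

static void
check_monotonic(const struct timeval *tv)
{
	if (tv_before(tv, &last_uptime)) {
		backwards_count++;
		fprintf(stderr, "uptime went backwards: %ld.%06ld -> %ld.%06ld\n",
		    (long)last_uptime.tv_sec, (long)last_uptime.tv_usec,
		    (long)tv->tv_sec, (long)tv->tv_usec);
	}
	last_uptime = *tv;
}
```

Feeding check_monotonic() 4.999000 and then 5.000100 is silent (a
normal rollover), whereas a following 5.000050 bumps backwards_count and
prints a warning, which is the case the kernel diagnostic saw.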