Date: Sun, 8 Nov 1998 19:16:28 +1100
From: Bruce Evans <bde@zeta.org.au>
To: phk@critter.freebsd.dk, wpaul@skynet.ctr.columbia.edu
Cc: current@FreeBSD.ORG
Subject: Re: Grrr... calcru: negative time blah blah blah
Message-ID: <199811080816.TAA18086@godzilla.zeta.org.au>

>> >debug.tc_diag_buffer: 11932 11932 11932 11937 [...]
>> >[...] 11932 11932 49982 50184 50259 11934 11930 [...]
>>
>> This is bad, really really bad. Best case sequence is:
>>
>> 11932: hardclock()
>> 11932: hardclock()
>> 49982: {micro|nano}[run]time()
>> 50184: {micro|nano}[run]time()
>> 50259: hardclock()
>> 11934: hardclock()
>>
>> At least 4 calls to hardclock() are missing here.
>
>Hm...
Timecounter strangeness seems to be only a symptom here. I've seen
similar strangeness caused by ddb masking interrupts. However,
{micro|nano}[run]time() only have obvious problems when tco_delta()
overflows a u_int. Overflow occurs after only about 10 seconds for
a 400MHz tsc timecounter, but only after about an hour for an i8254
timecounter.
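
A rough sketch of the arithmetic behind those two figures, assuming the
delta is held in a 32-bit unsigned value and taking 1193182 Hz as the
nominal i8254 frequency. That frequency and the function names
wrap_seconds() and truncated_delta() are illustrative assumptions, not
kernel code:

```c
#include <stdint.h>

/*
 * Seconds until a 32-bit unsigned delta wraps for a counter running
 * at freq_hz (hypothetical helper, not a kernel function).
 */
double wrap_seconds(double freq_hz)
{
    return 4294967296.0 / freq_hz;      /* 2^32 counts */
}

/*
 * What survives of a 64-bit count once it is squeezed into 32 bits:
 * anything past 2^32 is silently dropped.
 */
uint32_t truncated_delta(uint64_t elapsed_counts)
{
    return (uint32_t)elapsed_counts;
}
```

wrap_seconds(400e6) comes out a little under 11 seconds and
wrap_seconds(1193182.0) a little under 3600 seconds, which matches the
"about 10 seconds" and "about an hour" estimates.
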
>> Does the problem also exist for a !SMP case ?
>
>Okay, I built a kernel with no apm0 device and tried again. It seems
>that now I don't get the calcru error messages and the X server actually
>runs without exploding. No other processes die, at least not during
>the time I had it running. However, the system did get sluggish again
>after the X server started.
>sysctl kern.timecounter shows this:
>kern.timecounter.frequency: 448623175
>kern.timecounter.adjustment: 0
"Lost" (probably actually blocked) interrupts account for the sluggishness,
and the timecounter problems apparently don't occur because the tsc
timecounter advances even when you don't look at it.
>Running sysctl debug _BEFORE_ triggering the problem by running the
>X server shows this:
>...
>debug.tc_diag_maxforward: -692379806
This apparently gets set to a bad value during initialization. There
must be some sign extension bugs for a negative value to be the maximum.
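
For illustration only, and assuming the value is a 32-bit quantity that
gets treated as signed somewhere along the way (I have not checked how
the sysctl is actually declared), this is the unsigned value with the
same bit pattern as the number reported above. as_unsigned() is a
made-up helper:

```c
#include <stdint.h>

/*
 * Reinterpret a signed 32-bit value as unsigned.  The conversion is
 * well defined in C: the result is the value modulo 2^32.
 */
uint32_t as_unsigned(int32_t v)
{
    return (uint32_t)v;
}
```

as_unsigned(-692379806) is 3602587490, i.e. a large positive count
that only looks negative once bit 31 is taken as a sign bit.
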
>Running sysctl debug _AFTER_ running and stopping the X server shows
>this:
>
>debug.elf_trace: 0
>debug.tc_diag_buffer: 24640675 24640697 24640635 24640686 24640672 24640673 24641091 4938 12316 24640342 24640621 24640649 24640659 24640686 19566403 19641525 19642603 19757035 20530799 20631303 20644237 20684791 20703487 24641328 24640255 24640435
>24641077 24640268 24640686 24640690 24640656 24640671 24640659 24640790 24640596 24641023 24640295 24640672 24640673 24640672 24640673 24640686 24640659 24640699 24641102 24640216 24640686 24640694 24640638 24640686 24640672 24640673 24640686 24641077 24640254
>Everything seems to have been divided in half. The machine still runs
I think that's just because one of {micro|nano}[run]time() is now called
about twice per clock tick.
>Isn't there any way I can mask a particular interrupt so the dispatcher
>just ignores it? Not a great fix I grant you, but it would help prove
>the theory.
intr_handler[n] = null_routine;
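
A userland model of that idea, not the real kernel code: the table
size NIRQ, the handler signature, and the helpers dispatch(),
counting_handler() and handled_count() are all assumptions made up for
the sketch. It only shows that once a slot points at a do-nothing
routine, the dispatcher still runs but the device's handler never does:

```c
#include <stddef.h>

#define NIRQ 16                         /* hypothetical table size */

typedef void (*handler_fn)(int irq);    /* hypothetical signature */

static int calls[NIRQ];                 /* per-IRQ "real work" counter */

/* Stand-in for a real device handler: just counts invocations. */
void counting_handler(int irq)
{
    calls[irq]++;
}

/* Does nothing, so the interrupt is effectively ignored. */
void null_routine(int irq)
{
    (void)irq;
}

handler_fn intr_handler[NIRQ];

/* Model dispatcher: call whatever is installed for this IRQ. */
void dispatch(int irq)
{
    if (irq >= 0 && irq < NIRQ && intr_handler[irq] != NULL)
        intr_handler[irq](irq);
}

int handled_count(int irq)
{
    return calls[irq];
}
```

Installing counting_handler in a slot and dispatching bumps the count;
after intr_handler[n] = null_routine the same dispatch leaves it
unchanged.
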
Bruce
