Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 8 Nov 1998 19:16:28 +1100
From:      Bruce Evans <bde@zeta.org.au>
To:        phk@critter.freebsd.dk, wpaul@skynet.ctr.columbia.edu
Cc:        current@FreeBSD.ORG
Subject:   Re: Grrr... calcru: negative time blah blah blah
Message-ID:  <199811080816.TAA18086@godzilla.zeta.org.au>

next in thread | raw e-mail | index | archive | help
>> >debug.tc_diag_buffer: 11932 11932 11932 11937 [...]
>> >[...] 11932 11932 49982 50184 50259 11934 11930 [...]
>> 
>> This is bad, really really bad.  Best case sequence is:
>> 
>> 	11932: hardclock()
>> 	11932: hardclock()
>> 	49982: {micro|nano}[run]time()
>> 	50184: {micro|nano}[run]time()
>> 	50259: hardclock()
>> 	11934: hardclock()
>> 
>> At least 4 calls to hardclock() is missing here.  
>
>Hm...

Timecounter strangeness seems to be only a symptom here.  I've seen
similar strangeness caused by ddb masking interrupts.  However,
{micro|nano}[run]time() only have obvious problems when tco_delta()
overflows a u_int.   Overflow occurs after only about 10 seconds for
a 400MHz tsc timecounter, but not for one hour for an i8254 timecounter.

>> Does the problem also exist for a !SMP case ?
>
>Okay, I built a kernel with no apm0 device and tried again. It seems
>that now I don't get the calcru error messages and the X server actually
>runs without exploding. No other processes die, at least not during
>the time I had it running. However, the system did get sluggish again
>after the X server started.

>sysctl kern.timecounter shows this:
>kern.timecounter.frequency: 448623175
>kern.timecounter.adjustment: 0

"Lost" (probably actually blocked) interrupts account for the sluggishness,
and the the timecounter problems apparently don't occur because the tsc
timecounter advances even when you don't look at it.

>Running sysctl debug _BEFORE_ triggering the problem by running the
>X server shows this:
>...
>debug.tc_diag_maxforward: -692379806

This apparently gets set to a bad value during initialization.  There
must be some sign extension bugs for a negative value to be the maximum.

>Running sysctl debug _AFTER_ running and stopping the X server shows
>this:
>
>debug.elf_trace: 0
>debug.tc_diag_buffer: 24640675 24640697 24640635 24640686 24640672 24640673 24641091 4938 12316 24640342 24640621 24640649 24640659 24640686 19566403 19641525 19642603 19757035 20530799 20631303 20644237 20684791 20703487 24641328 24640255 24640435 246410
>77 24640268 24640686 24640690 24640656 24640671 24640659 24640790 24640596 24641023 24640295 24640672 24640673 24640672 24640673 24640686 24640659 24640699 24641102 24640216 24640686 24640694 24640638 24640686 24640672 24640673 24640686 24641077 24640254 

>Everything seems to have been divided in half. The machine still runs

I think that's just because one of {micro|nano}[run]time() is now called
about twice per clock tick.

>Isn't there any way I can mask a particular interrupt so the dispatcher
>just ignores it? Not a great fix I grant you, but it would help prove
>the theory.

	intr_handler[n] = null_routine;

Bruce

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?199811080816.TAA18086>