Date:      Sun, 22 Nov 1998 15:27:18 +1100
From:      Bruce Evans <bde@zeta.org.au>
To:        bde@zeta.org.au, phk@critter.freebsd.dk
Cc:        current@FreeBSD.ORG, eivind@yes.no, garman@earthling.net, terbart@aye.net
Subject:   Re: more dying daemons
Message-ID:  <199811220427.PAA22732@godzilla.zeta.org.au>

>>>It is as predicted caused by hardclock() interrupts being disabled
>>>for far too long.  This seems to happen on some specific types of
>>>hardware, the PLIP code for the parallel port being the most readily
>>>available.
>>
>>Erm, it is caused by _non_-hardclock() interrupts being disabled for
>>too long.
>
>No, it is caused by hardclock being called with much less than
>1/hz seconds in between.

Not normally.  The normal failure mode is:

	[running at spl0()]
	tc = timecounter;
	use part of tc
		[hardware, non-hardclock interrupt]
		run in interrupt mode
			[hardclock interrupt]
			change timecounter
			on return, check if we can handle pending interrupts;
			perhaps we can, but we can't run softclock
			[perhaps another type of hardware interrupt]
		run some more in interrupt mode
			[hardclock interrupt]
			change timecounter, corrupting tc if NTIMECOUNTER = 2
			...
		finally finish hardware interrupt processing
		handle pending interrupts, including softclock
			[possibly more hardware interrupts]
				[possibly more hardclock interrupts]
		finally finish interrupt processing
	use another, now inconsistent part of tc
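
A toy Python model of the race above may help. It assumes, as the trace implies, that hardclock fills in the next copy in a ring of NTIMECOUNTER timecounter copies and then publishes it. A reader holding an older copy is therefore safe only until the writer wraps back to that copy. The names TimecounterSlot, TimecounterRing and reader_sees_consistent, and the two fields, are hypothetical; the real kern_clock.c structures are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class TimecounterSlot:
    # Hypothetical stand-in for one timecounter copy. The two fields
    # must be updated together for a reader to see a consistent value.
    sec: int = 0
    scaled: int = 0  # derived from sec; stands in for the other fields


class TimecounterRing:
    """Toy model: hardclock writes the next copy in a ring of
    `ntimecounter` copies and then publishes it as current."""

    def __init__(self, ntimecounter: int) -> None:
        self.slots = [TimecounterSlot() for _ in range(ntimecounter)]
        self.current = 0  # index of the published copy ("timecounter")

    def hardclock(self, sec: int) -> None:
        nxt = (self.current + 1) % len(self.slots)
        self.slots[nxt].sec = sec
        self.slots[nxt].scaled = sec * 1000
        self.current = nxt  # publish the freshly written copy


def reader_sees_consistent(ntimecounter: int, ticks_while_reading: int) -> bool:
    """Follow the trace: take tc, let hardclock run
    `ticks_while_reading` times, then read another part of tc."""
    ring = TimecounterRing(ntimecounter)
    ring.hardclock(1)
    tc = ring.slots[ring.current]      # tc = timecounter;
    first = tc.sec                     # use part of tc
    for i in range(ticks_while_reading):
        ring.hardclock(2 + i)          # interrupts delay the reader
    second = tc.scaled                 # use another part of tc
    return second == first * 1000
```

With NTIMECOUNTER = 2, the reader survives one hardclock but not two, as in the trace. In this model the reader's copy stays consistent only while fewer than NTIMECOUNTER hardclocks run during its use.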

This can easily happen without anything being broken.  It just takes a
transient high interrupt load.

Hardclock can only be called soon after the previous call if something
is broken.  E.g., masking hardclock using splhigh() for (N - epsilon)/hz
seconds can cause 2 (not N) hardclock calls separated by about epsilon/hz
seconds. Since the number of calls is limited to 2, this bug can be
stopped from corrupting the timecounter by using NTIMECOUNTER = 3.
However, this form of the bug is unstable -- it is a small step from
running the buggy interrupt handler and hardclock for 1/hz seconds to
running in interrupt mode for 2/hz seconds.
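
The arithmetic in the splhigh() example can be checked with a small Python model. It assumes that clock ticks arrive at k/hz and that hardclock is masked from t = 0 until (N - epsilon)/hz. It also assumes that the interrupt controller keeps at most one clock interrupt pending, so ticks that arrive while masked collapse into a single call at unmask time. hardclock_times() is a hypothetical helper, not FreeBSD code.

```python
from fractions import Fraction


def hardclock_times(hz: int, n: int, eps: Fraction) -> list[Fraction]:
    """Return the times (in seconds) at which hardclock runs during
    (0, n/hz] in a toy model where clock ticks arrive at k/hz and
    hardclock is masked from t = 0 until t = (n - eps)/hz."""
    if not (n >= 1 and 0 < eps < 1):
        raise ValueError("need n >= 1 and 0 < eps < 1")
    unmask = (n - eps) / hz       # when the masking (e.g. splhigh()) ends
    calls: list[Fraction] = []
    pending = False               # assumed: at most one latched clock interrupt
    for k in range(1, n + 1):
        tick = Fraction(k, hz)
        if tick < unmask:
            pending = True        # masked: repeated ticks collapse into one
            continue
        if pending:
            calls.append(unmask)  # the latched tick is delivered at unmask
            pending = False
        calls.append(tick)        # unmasked tick runs on time
    return calls
```

For hz = 100, N = 5 and epsilon = 1/10, the model yields calls at 49/1000 s and 50/1000 s. That is two calls instead of five, separated by epsilon/hz = 1/1000 s. Two back-to-back updates are what a ring with NTIMECOUNTER = 3 can absorb.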

Bruce
