From owner-freebsd-current  Sat Nov 21 20:28:08 1998
Date: Sun, 22 Nov 1998 15:27:18 +1100
From: Bruce Evans <bde@zeta.org.au>
Message-Id: <199811220427.PAA22732@godzilla.zeta.org.au>
To: bde@zeta.org.au, phk@critter.freebsd.dk
Subject: Re: more dying daemons
Cc: current@FreeBSD.ORG, eivind@yes.no, garman@earthling.net, terbart@aye.net
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

>>>It is as predicted caused by hardclock() interrupts being disabled
>>>for far too long.  This seems to happen on some specific types of
>>>hardware, the PLIP code for the parallel port being the most readily
>>>available.
>>
>>Erm, it is caused by _non_-hardclock() interrupts being disabled for
>>too long.
>
>No, it is caused by hardclock being called with much smaller than
>1/hz in between.

Not normally.  The normal failure mode is:

	[running at spl0()]
	tc = timecounter;
	use part of tc
		[hardware, non-hardclock interrupt]
		run in interrupt mode
			[hardclock interrupt]
			change timecounter
			on return, check if we can handle pending interrupts;
			perhaps we can, but we can't run softclock
			[perhaps another type of hardware interrupt]
		run some more in interrupt mode
			[hardclock interrupt]
			change timecounter, corrupting tc if NTIMECOUNTER = 2
			...
		finally finish hardware interrupt processing
		handle pending interrupts, including softclock
			[possibly more hardware interrupts]
				[possibly more hardclock interrupts]
		finally finish interrupt processing
	use another, now inconsistent part of tc

This can easily happen without anything being broken.  It just takes a
transient high interrupt load.
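
The trace above can be reduced to a small user-space model. This is only a
sketch, not the kernel code: it assumes each hardclock call fills in the next
slot of a ring of nslots timecounter structures and then switches the current
pointer to it. The names tc_slot and read_is_torn are made up for the
illustration. The reader takes its pointer once, reads one field, is
interrupted for some number of hardclock calls, and then reads a second field
through the same pointer.

```c
#include <assert.h>

#define MAX_SLOTS 8	/* upper bound for this model only */

/* Two fields that a reader needs to see as a consistent pair. */
struct tc_slot {
	unsigned sec;
	unsigned frac;
};

/*
 * Hypothetical model: return 1 if a reader that is interrupted by
 * ticks_during_read hardclock calls sees an inconsistent pair,
 * 0 otherwise.  nslots plays the role of NTIMECOUNTER.
 */
int
read_is_torn(int nslots, int ticks_during_read)
{
	struct tc_slot ring[MAX_SLOTS];
	const struct tc_slot *tc;
	unsigned sec, frac, tick = 0;
	int cur = 0, i;

	assert(nslots >= 2 && nslots <= MAX_SLOTS);
	for (i = 0; i < nslots; i++)
		ring[i].sec = ring[i].frac = 0;

	tc = &ring[cur];		/* tc = timecounter; */
	sec = tc->sec;			/* use part of tc */

	/* Hardclock calls that arrive before the reader resumes. */
	for (i = 0; i < ticks_during_read; i++) {
		int next = (cur + 1) % nslots;

		tick++;
		ring[next].sec = tick;	/* fill in the next slot ... */
		ring[next].frac = tick;
		cur = next;		/* ... then change timecounter */
	}

	frac = tc->frac;		/* use another part of tc */
	return (sec != frac);
}
```

In this model the reader's slot is overwritten as soon as the number of
hardclock calls during the read reaches nslots, so read_is_torn(2, 2) reports
corruption while read_is_torn(3, 2) does not.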

Hardclock can only be called soon after the previous call if something
is broken.  E.g., masking hardclock using splhigh() for (N - epsilon)/hz
seconds can cause 2 (not N) hardclock calls separated by about epsilon/hz
seconds. Since the number of calls is limited to 2, this bug can be
stopped from corrupting the timecounter by using NTIMECOUNTER = 3.
However, this form of the bug is unstable -- it is a small step from
running the buggy interrupt handler and hardclock for 1/hz seconds to
running in interrupt mode for 2/hz seconds.
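
The arithmetic for the masked case can be sketched as follows. The sketch
assumes that the interrupt hardware latches at most one pending clock
interrupt while hardclock is masked, so every tick that fires during the
masked interval collapses into a single delayed call at unmask time. That
assumption, the microsecond units, and the names mask_outcome and
mask_hardclock are mine, chosen for illustration. Ticks are taken to fire at
exact multiples of period_us.

```c
#include <assert.h>

struct mask_outcome {
	long ticks_fired;	/* ticks up to and including the first one after unmasking */
	long hardclock_calls;	/* hardclock calls actually made for those ticks */
	long gap_us;		/* time from unmasking to the next on-time tick */
};

/*
 * Hypothetical model: hardclock is masked from start_us for len_us.
 * At most one masked tick stays pending; the rest are lost.
 */
struct mask_outcome
mask_hardclock(long period_us, long start_us, long len_us)
{
	struct mask_outcome out;
	long end_us, masked;

	assert(period_us > 0 && start_us >= 0 && len_us >= 0);
	end_us = start_us + len_us;

	/* Ticks that fire in the interval (start_us, end_us]. */
	masked = end_us / period_us - start_us / period_us;

	out.ticks_fired = masked + 1;
	/* One delayed call if anything was pending, plus the next on-time call. */
	out.hardclock_calls = (masked > 0 ? 1 : 0) + 1;
	out.gap_us = (end_us / period_us + 1) * period_us - end_us;
	return (out);
}
```

With hz = 100 (a period of 10000 us), N = 3 and epsilon = 0.2,
mask_hardclock(10000, 0, 28000) gives 3 ticks fired but only 2 hardclock
calls, 2000 us apart, which is epsilon/hz.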

Bruce
