Date: Sun, 22 Nov 1998 15:27:18 +1100
From: Bruce Evans
Message-Id: <199811220427.PAA22732@godzilla.zeta.org.au>
To: bde@zeta.org.au, phk@critter.freebsd.dk
Subject: Re: more dying daemons
Cc: current@FreeBSD.ORG, eivind@yes.no, garman@earthling.net, terbart@aye.net

>>>It is as predicted caused by hardclock() interrupts being disabled
>>>for far too long.  This seems to happen on some specific types of
>>>hardware, the PLIP code for the parallel port being the most readily
>>>available.
>>
>>Erm, it is caused by _non_-hardclock() interrupts being disabled for
>>too long.
>
>No, it is caused by hardclock being called with much smaller than
>1/hz in between.

Not normally.  The normal failure mode is:

    [running at spl0()]
    tc = timecounter;
    use part of tc
    [hardware, non-hardclock interrupt]
        run in interrupt mode
        [hardclock interrupt]
            change timecounter
            on return, check if we can handle pending interrupts;
            perhaps we can, but we can't run softclock
        [perhaps another type of hardware interrupt]
        run some more in interrupt mode
        [hardclock interrupt]
            change timecounter, corrupting tc if NTIMECOUNTER = 2
        ...
        finally finish hardware interrupt processing
        handle pending interrupts, including softclock
        [possibly more hardware interrupts]
        [possibly more hardclock interrupts]
        finally finish interrupt processing
    use another, now inconsistent part of tc

This can easily happen without anything being broken.  It just takes a
transient high interrupt load.

Hardclock can only be called soon after the previous call if something
is broken.  E.g., masking hardclock using splhigh() for (N - epsilon)/hz
seconds can cause 2 (not N) hardclock calls separated by about epsilon/hz
seconds.  Since the number of calls is limited to 2, this bug can be
stopped from corrupting the timecounter by using NTIMECOUNTER = 3.
However, this form of the bug is unstable -- it is a small step from
running the buggy interrupt handler and hardclock for 1/hz seconds to
running in interrupt mode for 2/hz seconds.

Bruce