From owner-freebsd-current  Fri Nov 17 18:50:17 2000
Delivered-To: freebsd-current@freebsd.org
Received: from duke.cs.duke.edu (duke.cs.duke.edu [152.3.140.1])
	by hub.freebsd.org (Postfix) with ESMTP id 7921437B4C5
	for <current@freebsd.org>; Fri, 17 Nov 2000 18:50:14 -0800 (PST)
Received: from grasshopper.cs.duke.edu (grasshopper.cs.duke.edu [152.3.145.30])
	by duke.cs.duke.edu (8.9.3/8.9.3) with ESMTP id VAA21976
	for <current@freebsd.org>; Fri, 17 Nov 2000 21:50:13 -0500 (EST)
Received: (from gallatin@localhost)
	by grasshopper.cs.duke.edu (8.11.1/8.9.1) id eAI2oDe51018;
	Fri, 17 Nov 2000 21:50:13 -0500 (EST)
	(envelope-from gallatin@cs.duke.edu)
From: Andrew Gallatin <gallatin@cs.duke.edu>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Date: Fri, 17 Nov 2000 21:50:13 -0500 (EST)
To: current@freebsd.org
Reply-To: current@freebsd.org
Subject: missing interrupts (was Re: CURRENT is freezing again ...)
In-Reply-To: <Pine.BSF.4.21.0011160945310.3172-100000@valcho.net>
References: <Pine.BSF.4.21.0011160945310.3172-100000@valcho.net>
X-Mailer: VM 6.43 under 20.4 "Emerald" XEmacs  Lucid
Message-ID: <14869.58648.34403.679348@grasshopper.cs.duke.edu>
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


Valentin Chopov writes:
 > Hi,
 > 
 > After last cvsup my machine (Dual PIII, SMP kernel) is freezing again in
 > 10 min after boot...
 > 

I've seen one similar problem on an alpha UP1000 that I'd like some
input about.

The UP1000 is essentially an alpha 21264 stuffed into an AMD Athlon
system.  It has an AMD-751 chipset and handles all device interrupts
via an isa interrupt controller.

I've noticed that under "heavy" load (gdb -k kernel.debug /dev/mem on
an NFS filesystem), the network interface goes away, never to
reappear.  All I see is "fxp0: device timeout" on console.
This started with SMPng.

After a little bit of investigation with ddb, I discovered that
the NIC's irq was pending.  E.g.:

	login: fxp0: device timeout
	Stopped at      siointr1+0x17c: br      zero,siointr1+0x32c     <zero=0x0>
	db> call isa_irq_pending()
	0x410

The fxp interface is at irq 10, so 0x410 (bit 10 set) means there's
an irq 10 pending.
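
To make the mask arithmetic explicit, here is a minimal sketch that
decodes such a value.  It assumes that bit n of the value returned by
isa_irq_pending() corresponds to irq n; pending_irqs() is a
hypothetical helper written for illustration, not a kernel function.
Under that assumption 0x410 has bits 4 and 10 set, so irq 4 shows as
pending as well as irq 10.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of the kernel): collect the irq numbers
 * whose bits are set in a 16-bit pending mask.  Assumes bit n == irq n.
 * Returns how many entries of irqs[] were filled in.
 */
size_t
pending_irqs(unsigned int mask, int irqs[16])
{
	size_t n = 0;

	assert(irqs != NULL);
	for (int irq = 0; irq < 16; irq++) {
		if (mask & (1u << irq))
			irqs[n++] = irq;	/* bit set: this irq is pending */
	}
	return n;
}
```

Calling pending_irqs(0x410, irqs) fills irqs[] with 4 and 10.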

I then wrote a hack which sends an eoi.  If I call my hack from ddb
and send an eoi for irq10, everything goes back to normal and the
network interface is back.
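
For reference, this is roughly what sending an eoi for a given irq
involves on a PC-style pair of cascaded 8259A controllers.  It is a
sketch, not the hack itself: it assumes the standard command ports
(0x20 master, 0xa0 slave), the slave cascaded on master input 2, and
the 8259A "specific EOI" command (0x60 | level).  Whether the UP1000's
interrupt controller matches this layout exactly is an assumption.
The function only computes the port writes; struct pic_write and
specific_eoi_writes() are made-up names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PIC_MASTER_CMD		0x20	/* assumed standard PC port */
#define PIC_SLAVE_CMD		0xa0	/* assumed standard PC port */
#define PIC_CASCADE_IRQ		2	/* slave hangs off master input 2 */
#define OCW2_SPECIFIC_EOI	0x60	/* 8259A specific EOI command */

struct pic_write {
	uint16_t port;
	uint8_t  value;
};

/*
 * Hypothetical helper: fill out[] with the port writes needed to send a
 * specific EOI for irq (0..15) and return how many writes there are.
 */
size_t
specific_eoi_writes(int irq, struct pic_write out[2])
{
	size_t n = 0;

	assert(irq >= 0 && irq < 16);
	if (irq >= 8) {
		/* irq lives on the slave: EOI the slave level first ... */
		out[n].port = PIC_SLAVE_CMD;
		out[n].value = (uint8_t)(OCW2_SPECIFIC_EOI | (irq & 7));
		n++;
		/* ... then EOI the cascade input on the master. */
		out[n].port = PIC_MASTER_CMD;
		out[n].value = (uint8_t)(OCW2_SPECIFIC_EOI | PIC_CASCADE_IRQ);
		n++;
	} else {
		out[n].port = PIC_MASTER_CMD;
		out[n].value = (uint8_t)(OCW2_SPECIFIC_EOI | irq);
		n++;
	}
	return n;
}
```

For irq 10 this yields two writes: 0x62 to port 0xa0, then 0x62 to
port 0x20.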

So, is it a race in the interrupt code, or is it something about how
the code is structured?

On the alpha at least, we get the irq, mask the irq and set the
ithread runnable.  When the (isa) ithread runs, it calls the interrupt
handler and then sends an eoi.  The interrupt is then unmasked.

I've peeked at the Linux code and noticed that they do things
differently.  They first mask the interrupt, and then send the eoi
immediately -- before the handler runs.  They then run the handler
and unmask the interrupt.  They seem to do this both on i386 and
alpha.
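
To compare the two orderings, here is a toy model of a single
8259-style controller in fully nested mode (irq 0 has the highest
priority, and an in-service irq blocks delivery of itself and of
everything with lower priority until it gets an eoi).  It is only a
sketch: the real hardware has two cascaded controllers, and all of the
names below are invented for illustration.  It shows what differs
between the orderings; it does not claim to explain the hang.

```c
#include <assert.h>
#include <stdint.h>

/* Toy controller state: request, in-service and mask registers. */
struct pic {
	uint8_t irr, isr, imr;
};

static uint8_t bit(int irq) { return (uint8_t)(1u << irq); }

void pic_raise(struct pic *p, int irq)  { p->irr |= bit(irq); }
void pic_mask(struct pic *p, int irq)   { p->imr |= bit(irq); }
void pic_unmask(struct pic *p, int irq) { p->imr &= (uint8_t)~bit(irq); }
void pic_eoi(struct pic *p, int irq)    { p->isr &= (uint8_t)~bit(irq); }

/* CPU acknowledges irq: request bit moves to the in-service register. */
void pic_ack(struct pic *p, int irq)
{
	p->irr &= (uint8_t)~bit(irq);
	p->isr |= bit(irq);
}

/* Which irq would be delivered to the CPU right now, or -1 if none. */
int pic_deliverable(const struct pic *p)
{
	for (int irq = 0; irq < 8; irq++) {
		if (p->isr & bit(irq))
			return -1;	/* in service: blocks this and lower */
		if ((p->irr & bit(irq)) && !(p->imr & bit(irq)))
			return irq;
	}
	return -1;
}

/* Ordering described for the alpha: the eoi waits for the ithread. */
void entry_eoi_late(struct pic *p, int irq)
{
	pic_ack(p, irq);
	pic_mask(p, irq);	/* ithread is made runnable here */
}

void ithread_done_eoi_late(struct pic *p, int irq)
{
	/* handler has run */
	pic_eoi(p, irq);
	pic_unmask(p, irq);
}

/* Ordering described for Linux: mask, then eoi before the handler. */
void entry_eoi_early(struct pic *p, int irq)
{
	pic_ack(p, irq);
	pic_mask(p, irq);
	pic_eoi(p, irq);
}

void handler_done_eoi_early(struct pic *p, int irq)
{
	/* handler has run */
	pic_unmask(p, irq);
}
```

In this model, if irq 5 is raised after entry_eoi_late() has run for
irq 2, pic_deliverable() returns -1 until ithread_done_eoi_late() is
called; after entry_eoi_early() it returns 5 straight away.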

Does anybody have any ideas about this?  Does something bad
happen if you don't send an eoi in a reasonable amount of time?


Drew
------------------------------------------------------------------------------
Andrew Gallatin, Sr Systems Programmer	http://www.cs.duke.edu/~gallatin
Duke University				Email: gallatin@cs.duke.edu
Department of Computer Science		Phone: (919) 660-6590

