From: Andrew Gallatin
Date: Fri, 17 Nov 2000 21:50:13 -0500 (EST)
To: current@freebsd.org
Subject: missing interrupts (was Re: CURRENT is freezing again ...)
Message-ID: <14869.58648.34403.679348@grasshopper.cs.duke.edu>

Valentin Chopov writes:
 > Hi,
 >
 > After last cvsup my machine (Dual PIII, SMP kernel) is freezing again in
 > 10 min after boot...
 >

I've seen a similar problem on an alpha UP1000 that I'd like some
input about. The UP1000 is essentially an alpha 21264 stuffed into an
AMD Athlon system. It has an AMD-751 chipset and handles all device
interrupts via an isa interrupt controller.

I've noticed that under "heavy" load (gdb -k kernel.debug /dev/mem on
an NFS filesystem), the network interface goes away, never to reappear.
All I see is "fxp0: device timeout" on the console. This started with
SMPng.

After a little bit of investigation with ddb, I discovered that the
NIC's irq was pending. E.g.:

login: fxp0: device timeout
Stopped at siointr1+0x17c: br zero,siointr1+0x32c
db> call isa_irq_pending()
0x410

The fxp interface is at irq 10, so 0x410 means there's an irq 10 pending.
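
As a small sketch of how that value decodes, assuming isa_irq_pending()
returns a bitmask with bit n set when irq n is pending (which is how the
0x410 above is being read). The helper name pending_irqs is hypothetical,
and this is plain Python for illustration, not kernel code:

```python
def pending_irqs(mask, nirqs=16):
    """Return the irq numbers whose bits are set in a pending-irq bitmask.

    Assumption: bit n of the mask corresponds to irq n, and there are
    16 isa irq lines (0 through 15).
    """
    return [irq for irq in range(nirqs) if mask & (1 << irq)]


# 0x410 == 0x400 | 0x010, i.e. bit 10 and bit 4
print(pending_irqs(0x410))  # prints [4, 10]
```

Bit 4 is set as well. irq 4 is conventionally the first serial port,
which would be consistent with ddb having stopped in siointr1, but that
is an inference rather than something checked here.
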
I then wrote a hack which sends an eoi. If I call my hack from ddb
and send an eoi for irq 10, everything goes back to normal and the
network interface is back.

So, is it a race in the interrupt code, or is it something about how
the code is structured?

On the alpha at least, we get the irq, mask the irq and set the
ithread runnable. When the (isa) ithread runs, it calls the interrupt
handler and then sends an eoi. The interrupt is then unmasked.

I've peeked at the Linux code and noticed that they do things
differently. They first mask the interrupt, and then send the eoi
immediately, before the handler runs. They then run the handler and
unmask the interrupt. They seem to do this both on i386 and alpha.

Does anybody have any ideas about this? Does something bad happen if
you don't send an eoi in a reasonable amount of time?

Drew

------------------------------------------------------------------------------
Andrew Gallatin, Sr Systems Programmer   http://www.cs.duke.edu/~gallatin
Duke University                          Email: gallatin@cs.duke.edu
Department of Computer Science           Phone: (919) 660-6590