From: John Baldwin
To: freebsd-hackers@freebsd.org
Cc: Craig Boston, hackers@freebsd.org
Date: Wed, 30 Nov 2005 10:35:13 -0500
Subject: Re: Weird PCI interrupt delivery problem
Message-Id: <200511301035.14284.jhb@freebsd.org>
In-Reply-To: <20051130020734.GA6577@nowhere>

On Tuesday 29 November 2005 09:07 pm, Craig Boston wrote:
> Hi,
>
> I'm working on getting this laptop up and running and
> need some advice
> from PCI gurus. I am running into a really odd problem with PCI
> interrupts. After a while they simply stop being delivered. ACPI makes
> the problem much worse, but it happens eventually without ACPI too.
>
> The system looks like this:
> pcib0
>   pci0
>     ohci0
>     pcib2
>       pci9
>         cbb0
>         rl0
>         ath0
>
> However, the problem affects ohci0 as well so I don't think the PCI
> bridge is the culprit. Actually, the only PCI device in the system that
> doesn't seem to be affected is the ATA controller, and I think that's
> because it uses ISA interrupts 14-15.
>
> With both ACPI & APIC enabled, it only lasts a few seconds. Each pin on
> the I/O APIC manages about 10-50 interrupts before they simply stop
> coming. The number of interrupts seems to be the deciding factor rather
> than time -- I can wait a minute and ohci0 will work until I move a USB
> mouse around for a while.

You didn't have to futz with the routing in this case?

> With ACPI disabled, the system panics because the mptable is broken.
> However, I was able to hack the kernel to override the mptable and route
> the interrupts to the correct pins (actually it rewrites parts of the
> mptable as it's being parsed). In this configuration, everything works
> fine for a while, but it eventually dies. ath0 is usually the first to
> go since it generates a steady stream of interrupts, but given enough
> time they eventually all stop. Sometimes it happens after 50,000
> sometimes 500,000.

Note that you can override individual routings using just tunables,
without having to hack the table.  Use something like:

hw.pci0.2.INTA.irq=17

to route PCI bus 0, slot 2, pin A# to IRQ 17 (apic 0, intpin 17).
Determining the correct intpins can be tricky, though.

> I also tried ACPI enabled but APIC disabled. The FreeBSD ACPI code
> seems to assume APIC interrupt model for i386, so it took some
> modifications to get this working.
> Everything ends up on IRQ 11, though
> I'm not sure if it's getting reprogrammed to be level triggered or not.
> Symptoms are the same as with APIC on -- after 10-50 interrupts it just
> stops.

The code does not assume APIC at all.  What does an unmolested kernel
actually do with ACPI enabled but APIC disabled?

> The final thing I tried is both APIC & ACPI disabled -- route everything
> through the 8259. In this mode, cbb0 fails to attach (Unable to map
> IRQ). Everything else ends up on IRQ 11, however it does seem to work
> indefinitely.

Do you have a dmesg from this?  Preferably a verbose one, to see if your
$PIR has routing info for cbb0.

> Oh, when APIC is being used, vmstat -i reports the lapic timer interrupt
> happily churning away without problem.

Yes, it's an interrupt internal to the CPU.

> I've checked everything I can think of -- no reports of interrupt
> storms, everything looks normal in verbose boot. I was just going to
> run in PIC mode until I discovered that cardbus didn't work.
>
> Any ideas on things to try to debug this? First thing that comes to
> mind is to see if the IRQ is being intentionally masked for some reason,
> but I can't think of an easy way to check that.

We mask the IRQs in the PIC while their ithread runs.  If your routing is
all screwed up, that might result in the problems you are seeing.  Can you
boot into Windows and jot down the IRQs it uses for each device and then
(if you are up to it) provide verbose dmesgs of an unpatched kernel for
the 4 cases of +ACPI +APIC, -ACPI +APIC, +ACPI -APIC, and -ACPI -APIC?

-- 
John Baldwin <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org
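
For reference, the routing tunable mentioned in the reply has the shape
hw.pci<bus>.<slot>.INT<pin>.irq=<irq>.  Below is a minimal sketch that
builds such a line from its parts, e.g. for pasting into loader.conf.  The
helper name pci_irq_tunable and its range checks are illustrative
assumptions, not something from the thread; it only formats a string and
does not touch the system.

```python
def pci_irq_tunable(bus: int, slot: int, pin: str, irq: int) -> str:
    """Format a tunable line of the form hw.pci<bus>.<slot>.INT<pin>.irq=<irq>.

    Hypothetical helper for illustration only; it validates its arguments
    and returns the text, nothing more.
    """
    pin = pin.upper()
    # PCI defines four interrupt pins per device: INTA# through INTD#.
    if pin not in ("A", "B", "C", "D"):
        raise ValueError("pin must be one of A, B, C, D")
    # A PCI bus has at most 32 device slots (0-31).
    if not 0 <= slot <= 31:
        raise ValueError("slot must be in the range 0-31")
    if bus < 0 or irq < 0:
        raise ValueError("bus and irq must be non-negative")
    return f"hw.pci{bus}.{slot}.INT{pin}.irq={irq}"


if __name__ == "__main__":
    # Reproduces the example from the reply: bus 0, slot 2, pin A -> IRQ 17.
    print(pci_irq_tunable(0, 2, "A", 17))
```

Which IRQ is correct for a given pin still has to be worked out by hand;
the sketch does nothing to help with that part.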