Date:      Wed, 30 Nov 2005 10:35:13 -0500
From:      John Baldwin <jhb@freebsd.org>
To:        freebsd-hackers@freebsd.org
Cc:        Craig Boston <craig@tobuj.gank.org>, hackers@freebsd.org
Subject:   Re: Weird PCI interrupt delivery problem
Message-ID:  <200511301035.14284.jhb@freebsd.org>
In-Reply-To: <20051130020734.GA6577@nowhere>
References:  <20051130020734.GA6577@nowhere>

On Tuesday 29 November 2005 09:07 pm, Craig Boston wrote:
> Hi,
>
> I'm working on getting this laptop up and running and need some advice
> from PCI gurus.  I am running into a really odd problem with PCI
> interrupts.  After a while they simply stop being delivered.  ACPI makes
> the problem much worse, but it happens eventually without ACPI too.
>
> The system looks like this:
> pcib0
>   pci0
>     ohci0
>   pcib2
>     pci9
>       cbb0
>       rl0
>       ath0
>
> However, the problem affects ohci0 as well so I don't think the PCI
> bridge is the culprit.  Actually, the only PCI device in the system that
> doesn't seem to be affected is the ATA controller, and I think that's
> because it uses ISA interrupts 14-15.
>
> With both ACPI & APIC enabled, it only lasts a few seconds.  Each pin on
> the I/O APIC manages about 10-50 interrupts before they simply stop
> coming.  The number of interrupts seems to be the deciding factor rather
> than time -- I can wait a minute and ohci0 will work until I move a USB
> mouse around for a while.

You didn't have to futz with the routing in this case?

> With ACPI disabled, the system panics because the mptable is broken.
> However, I was able to hack the kernel to override the mptable and route
> the interrupts to the correct pins (actually it rewrites parts of the
> mptable as it's being parsed).  In this configuration, everything works
> fine for a while, but it eventually dies.  ath0 is usually the first to
> go since it generates a steady stream of interrupts, but given enough
> time they eventually all stop.  Sometimes it happens after 50,000
> sometimes 500,000.

Note that you can override individual routings just by setting tunables, without
having to hack the table.  Just use something like:

    hw.pci0.2.INTA.irq=17

to route PCI bus 0, slot 2, pin INTA# to IRQ 17 (APIC 0, intpin 17).
Determining the correct intpins can be tricky, though.

> I also tried ACPI enabled but APIC disabled.  The FreeBSD ACPI code
> seems to assume APIC interrupt model for i386, so it took some
> modifications to get this working.  Everything ends up on IRQ 11, though
> I'm not sure if it's getting reprogrammed to be level triggered or not.
> Symptoms are the same as with APIC on -- after 10-50 interrupts it just
> stops.

The code does not assume APIC at all.  What does an unmolested kernel actually 
do with ACPI enabled but APIC disabled?

> The final thing I tried is both APIC & ACPI disabled -- route everything
> through the 8259.  In this mode, cbb0 fails to attach (Unable to map
> IRQ).  Everything else ends up on IRQ 11, however it does seem to work
> indefinitely.

Do you have a dmesg from this?  Preferably a verbose one to see if your $PIR 
has routing info for cbb0.

> Oh, when APIC is being used, vmstat -i reports the lapic timer interrupt
> happily churning away without problem.

Yes, it's an interrupt internal to the CPU (the local APIC timer), so it never
passes through the I/O APIC.

> I've checked everything I can think of -- no reports of interrupt
> storms, everything looks normal in verbose boot.  I was just going to
> run in PIC mode until I discovered that cardbus didn't work.
>
> Any ideas on things to try to debug this?  First thing that comes to
> mind is to see if the IRQ is being intentionally masked for some reason,
> but I can't think of an easy way to check that.

We mask the IRQs in the PIC while their ithread runs.  If your routing is all
screwed up, that might result in the problems you are seeing.  Can you boot
into Windows and jot down the IRQs it uses for each device and then (if you
are up to it) provide verbose dmesgs of an unpatched kernel for these 4 cases?

  + ACPI + APIC
  - ACPI + APIC
  + ACPI - APIC
  - ACPI - APIC

-- 
John Baldwin <jhb@FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org


