Date:      Mon, 7 Jun 2010 09:55:54 -0400
From:      John Baldwin <jhb@freebsd.org>
To:        freebsd-current@freebsd.org
Cc:        Alexander Motin <mav@freebsd.org>
Subject:   Re: ioapic_assign_cpu() on active level-triggered interrupt
Message-ID:  <201006070955.54445.jhb@freebsd.org>
In-Reply-To: <4C094635.40002@FreeBSD.org>
References:  <4C094635.40002@FreeBSD.org>

On Friday 04 June 2010 2:30:13 pm Alexander Motin wrote:
> Hi.
> 
> I am working on a driver for HPET event timers. It mostly works fine,
> except in some cases when ioapic_assign_cpu() is called while the timer
> is active. At an interrupt rate of 10 kHz, a dozen cpuset runs are enough
> to break it (at 1 kHz, a few dozen). When it happens, I can see that the
> timer is still running and the interrupt status register is changing, but
> no interrupts are received.
> 
> The timer uses level-triggered interrupts, so it tolerates lost
> interrupts. I tried not acknowledging some, and they immediately came
> back to me, as expected for level triggering. The timer runs in
> periodic mode, so it doesn't need handling to continue counting.
> 
> I have reproduced it on two different i386 SMP systems: Core2Duo+ICH10
> and Core i5+PCH. With more experiments I have found that I can't trigger
> this issue if the following patch is applied:
> 
> --- io_apic.c.prev      2010-06-02 10:55:56.000000000 +0300
> +++ io_apic.c   2010-06-04 17:45:51.000000000 +0300
> @@ -363,7 +366,10 @@ ioapic_assign_cpu(struct intsrc *isrc, u
>                 printf(") to lapic %u vector %u\n", intpin->io_cpu,
>                     intpin->io_vector);
>         }
> +       ioapic_disable_source(isrc, PIC_NO_EOI);
> +       DELAY(10);
>         ioapic_program_intpin(intpin);
> +       ioapic_enable_source(isrc);
>         /*
>          * Free the old vector after the new one is established.  This is done
>          * to prevent races where we could miss an interrupt.
> 
> It is almost a hack, and the 10us is completely experimental. But it looks
> like changing an interrupt's APIC and vector at certain moments of
> interrupt processing may not be a good idea.
> 
> Can somebody explain this behavior and propose a solution? Has
> anybody seen it with regular PCI devices?

It would probably be best to disable the source; however, you can't just
re-enable it afterwards, as it might already have been disabled when you moved
it.  It is also probably a bug that io_masked can be read without holding the
icu_lock in ioapic_program_intpin().  I think the icu_lock should be pushed up
to the callers of ioapic_program_intpin(), and that you should explicitly do a
simple write to mask the source a bit earlier.  Something like this perhaps:

Index: io_apic.c
===================================================================
--- io_apic.c	(revision 208714)
+++ io_apic.c	(working copy)
@@ -261,16 +261,15 @@
 	 * If a pin is completely invalid or if it is valid but hasn't
 	 * been enabled yet, just ensure that the pin is masked.
 	 */
+	mtx_assert(&icu_lock, MA_OWNED);
 	if (intpin->io_irq == IRQ_DISABLED || (intpin->io_irq < NUM_IO_INTS &&
 	    intpin->io_vector == 0)) {
-		mtx_lock_spin(&icu_lock);
 		low = ioapic_read(io->io_addr,
 		    IOAPIC_REDTBL_LO(intpin->io_intpin));
 		if ((low & IOART_INTMASK) == IOART_INTMCLR)
 			ioapic_write(io->io_addr,
 			    IOAPIC_REDTBL_LO(intpin->io_intpin),
 			    low | IOART_INTMSET);
-		mtx_unlock_spin(&icu_lock);
 		return;
 	}
 
@@ -312,14 +311,12 @@
 	}
 
 	/* Write the values to the APIC. */
-	mtx_lock_spin(&icu_lock);
 	intpin->io_lowreg = low;
 	ioapic_write(io->io_addr, IOAPIC_REDTBL_LO(intpin->io_intpin), low);
 	value = ioapic_read(io->io_addr, IOAPIC_REDTBL_HI(intpin->io_intpin));
 	value &= ~IOART_DEST;
 	value |= high;
 	ioapic_write(io->io_addr, IOAPIC_REDTBL_HI(intpin->io_intpin), value);
-	mtx_unlock_spin(&icu_lock);
 }
 
 static int
@@ -352,6 +349,12 @@
 	if (new_vector == 0)
 		return (ENOSPC);
 
+	/* Mask the old intpin if it is enabled while it is migrated. */
+	mtx_lock_spin(&icu_lock);
+	if (!intpin->io_masked)
+		ioapic_write(io->io_addr, IOAPIC_REDTBL_LO(intpin->io_intpin),
+		    intpin->io_lowreg | IOART_INTMSET);
+
 	intpin->io_cpu = apic_id;
 	intpin->io_vector = new_vector;
 	if (isrc->is_handlers > 0)
@@ -364,6 +367,8 @@
 		    intpin->io_vector);
 	}
 	ioapic_program_intpin(intpin);
+	mtx_unlock_spin(&icu_lock);
+
 	/*
 	 * Free the old vector after the new one is established.  This is done
 	 * to prevent races where we could miss an interrupt.
@@ -399,9 +404,11 @@
 		/* Mask this interrupt pin and free its APIC vector. */
 		vector = intpin->io_vector;
 		apic_disable_vector(intpin->io_cpu, vector);
+		mtx_lock_spin(&icu_lock);
 		intpin->io_masked = 1;
 		intpin->io_vector = 0;
 		ioapic_program_intpin(intpin);
+		mtx_unlock_spin(&icu_lock);
 		apic_free_vector(intpin->io_cpu, vector, intpin->io_irq);
 	}
 }
@@ -443,6 +450,7 @@
 	 * XXX: Should we write to the ELCR if the trigger mode changes for
 	 * an EISA IRQ or an ISA IRQ with the ELCR present?
 	 */
+	mtx_lock_spin(&icu_lock);
 	if (intpin->io_bus == APIC_BUS_EISA)
 		pol = INTR_POLARITY_HIGH;
 	changed = 0;
@@ -464,6 +472,7 @@
 	}
 	if (changed)
 		ioapic_program_intpin(intpin);
+	mtx_unlock_spin(&icu_lock);
 	return (0);
 }
 
@@ -473,8 +482,10 @@
 	struct ioapic *io = (struct ioapic *)pic;
 	int i;
 
+	mtx_lock_spin(&icu_lock);
 	for (i = 0; i < io->io_numintr; i++)
 		ioapic_program_intpin(&io->io_pins[i]);
+	mtx_unlock_spin(&icu_lock);
 }
 
 /*

If you find that you still need the DELAY(10), you could perhaps place it in
the conditional block that masks the interrupt.
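
As a rough illustration of the ordering in the patch above, here is a small
userspace model, not kernel code.  Everything in it (struct pin_model, the
model_* functions, INTMASK_BIT, and the lock_held flag standing in for
icu_lock) is hypothetical and only mimics the idea: mask the pin only if it is
currently unmasked, update the CPU and vector, and let the programming step
restore the mask bit from the software state, so a pin that was already masked
is never re-enabled by the move.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the mask bit (IOART_INTMSET in the patch). */
#define INTMASK_BIT 0x00010000u

/* Simplified model of one I/O APIC pin; field names are illustrative. */
struct pin_model {
	uint32_t hw_low;	/* simulated redirection-table low word */
	uint32_t lowreg;	/* cached low word, in the spirit of io_lowreg */
	bool	 masked;	/* software mask state, in the spirit of io_masked */
	unsigned cpu;
	unsigned vector;
	bool	 lock_held;	/* stands in for holding icu_lock */
};

static void
model_lock(struct pin_model *p)
{
	assert(!p->lock_held);
	p->lock_held = true;
}

static void
model_unlock(struct pin_model *p)
{
	assert(p->lock_held);
	p->lock_held = false;
}

/* The caller must hold the lock, mirroring the mtx_assert() in the patch. */
static void
model_program(struct pin_model *p)
{
	uint32_t low;

	assert(p->lock_held);
	low = p->vector & 0xffu;
	if (p->masked)
		low |= INTMASK_BIT;	/* mask bit follows the software state */
	p->lowreg = low;
	p->hw_low = low;
}

static void
model_assign_cpu(struct pin_model *p, unsigned cpu, unsigned vector)
{
	model_lock(p);
	if (!p->masked) {
		/* Mask the pin while it is migrated. */
		p->hw_low = p->lowreg | INTMASK_BIT;
		/* A DELAY(10)-style wait, if still needed, would go here. */
	}
	p->cpu = cpu;
	p->vector = vector;
	model_program(p);	/* unmasks only if the pin was not masked */
	model_unlock(p);
}
```

In this model, calling model_assign_cpu() on an unmasked pin leaves it unmasked
with the new vector, while calling it on a masked pin leaves the mask bit set.
That is the property a plain disable/enable pair around the reprogramming
would not preserve.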

-- 
John Baldwin


