From owner-freebsd-current@FreeBSD.ORG Mon Jun 7 14:22:42 2010 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5CA461065670; Mon, 7 Jun 2010 14:22:42 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 2F7A88FC14; Mon, 7 Jun 2010 14:22:42 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id D5C7B46B53; Mon, 7 Jun 2010 10:22:41 -0400 (EDT) Received: from jhbbsd.localnet (smtp.hudson-trading.com [209.249.190.9]) by bigwig.baldwin.cx (Postfix) with ESMTPA id D2AD88A021; Mon, 7 Jun 2010 10:22:40 -0400 (EDT) From: John Baldwin To: freebsd-current@freebsd.org Date: Mon, 7 Jun 2010 09:55:54 -0400 User-Agent: KMail/1.12.1 (FreeBSD/7.3-CBSD-20100217; KDE/4.3.1; amd64; ; ) References: <4C094635.40002@FreeBSD.org> In-Reply-To: <4C094635.40002@FreeBSD.org> MIME-Version: 1.0 Content-Type: Text/Plain; charset="koi8-r" Content-Transfer-Encoding: 7bit Message-Id: <201006070955.54445.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.0.1 (bigwig.baldwin.cx); Mon, 07 Jun 2010 10:22:40 -0400 (EDT) X-Virus-Scanned: clamav-milter 0.95.1 at bigwig.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-2.5 required=4.2 tests=AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on bigwig.baldwin.cx Cc: Alexander Motin Subject: Re: ioapic_assign_cpu() on active level-triggered interrupt X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Jun 2010 14:22:42 -0000 On Friday 04 June 2010 2:30:13 pm Alexander Motin wrote: > Hi. > > I am working on driver for HPET event timers. It works mostly fine, > except after some cases when ioapic_assign_cpu() called while timer is > active. Under interrupt rate of 10KHz it is enough a dozen cpuset runs > to break it (with 1KHz - few dozens). When it happens, I can see that > timer is still running, interrupt status register is changing, but no > interrupts received. > > Timer uses level-triggered interrupts, so it is tolerant to interrupt > losses. I have tried to not acknowledge some, and they have immediately > got back to me again, as expected for level-triggering. Timer runs in > periodic mode, so it doesn't need handling to continue counting. > > I have reproduced it on two different i386 SMP systems: Core2Duo+ICH10 > and Core i5+PCH. With more experiments I have found that I can't trigger > this issue if following patch applied: > > --- io_apic.c.prev 2010-06-02 10:55:56.000000000 +0300 > +++ io_apic.c 2010-06-04 17:45:51.000000000 +0300 > @@ -363,7 +366,10 @@ ioapic_assign_cpu(struct intsrc *isrc, u > printf(") to lapic %u vector %u\n", intpin->io_cpu, > intpin->io_vector); > } > + ioapic_disable_source(isrc, PIC_NO_EOI); > + DELAY(10); > ioapic_program_intpin(intpin); > + ioapic_enable_source(isrc); > /* > * Free the old vector after the new one is established. This > is done > * to prevent races where we could miss an interrupt. > > It is is almost a hack and 10us is completely experimental. But it looks > like changing interrupt's APIC and vector in some moments of interrupt > processing may be not a good idea. > > Can somebody explain this behavior and propose some solution? Have > somebody seen it for regular PCI devices? It probably would be best to disable the source, however, you can't just re- enable it as it might already be disabled when you move it. It is also probably a bug that io_masked can be read w/o holding the icu_lock in ioapic_program_intpin(). I think the icu_lock should be pushed up to callers of ioapic_program_intpin(), and that you should explicitly do a simple write to mask the source a bit earlier. Something like this perhaps: Index: io_apic.c =================================================================== --- io_apic.c (revision 208714) +++ io_apic.c (working copy) @@ -261,16 +261,15 @@ * If a pin is completely invalid or if it is valid but hasn't * been enabled yet, just ensure that the pin is masked. */ + mtx_assert(&icu_lock, MA_OWNED); if (intpin->io_irq == IRQ_DISABLED || (intpin->io_irq < NUM_IO_INTS && intpin->io_vector == 0)) { - mtx_lock_spin(&icu_lock); low = ioapic_read(io->io_addr, IOAPIC_REDTBL_LO(intpin->io_intpin)); if ((low & IOART_INTMASK) == IOART_INTMCLR) ioapic_write(io->io_addr, IOAPIC_REDTBL_LO(intpin->io_intpin), low | IOART_INTMSET); - mtx_unlock_spin(&icu_lock); return; } @@ -312,14 +311,12 @@ } /* Write the values to the APIC. */ - mtx_lock_spin(&icu_lock); intpin->io_lowreg = low; ioapic_write(io->io_addr, IOAPIC_REDTBL_LO(intpin->io_intpin), low); value = ioapic_read(io->io_addr, IOAPIC_REDTBL_HI(intpin->io_intpin)); value &= ~IOART_DEST; value |= high; ioapic_write(io->io_addr, IOAPIC_REDTBL_HI(intpin->io_intpin), value); - mtx_unlock_spin(&icu_lock); } static int @@ -352,6 +349,12 @@ if (new_vector == 0) return (ENOSPC); + /* Mask the old intpin if it is enabled while it is migrated. */ + mtx_lock_spin(&icu_lock); + if (!intpin->io_masked) + ioapic_write(io->io_addr, IOAPIC_REDTBL_LO(intpin->io_intpin), + intpin->io_lowreg | IOART_INTMSET); + intpin->io_cpu = apic_id; intpin->io_vector = new_vector; if (isrc->is_handlers > 0) @@ -364,6 +367,8 @@ intpin->io_vector); } ioapic_program_intpin(intpin); + mtx_unlock_spin(&icu_lock); + /* * Free the old vector after the new one is established. This is done * to prevent races where we could miss an interrupt. @@ -399,9 +404,11 @@ /* Mask this interrupt pin and free its APIC vector. */ vector = intpin->io_vector; apic_disable_vector(intpin->io_cpu, vector); + mtx_lock_spin(&icu_lock); intpin->io_masked = 1; intpin->io_vector = 0; ioapic_program_intpin(intpin); + mtx_unlock_spin(&icu_lock); apic_free_vector(intpin->io_cpu, vector, intpin->io_irq); } } @@ -443,6 +450,7 @@ * XXX: Should we write to the ELCR if the trigger mode changes for * an EISA IRQ or an ISA IRQ with the ELCR present? */ + mtx_lock_spin(&icu_lock); if (intpin->io_bus == APIC_BUS_EISA) pol = INTR_POLARITY_HIGH; changed = 0; @@ -464,6 +472,7 @@ } if (changed) ioapic_program_intpin(intpin); + mtx_unlock_spin(&icu_lock); return (0); } @@ -473,8 +482,10 @@ struct ioapic *io = (struct ioapic *)pic; int i; + mtx_lock_spin(&icu_lock); for (i = 0; i < io->io_numintr; i++) ioapic_program_intpin(&io->io_pins[i]); + mtx_unlock_spin(&icu_lock); } /* If you find that you still need the DELAY(10), you could place it in the conditional block that masks the interrupt perhaps. -- John Baldwin