From owner-freebsd-current@FreeBSD.ORG Tue Jun 8 07:08:01 2010 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 32C691065673; Tue, 8 Jun 2010 07:08:01 +0000 (UTC) (envelope-from mavbsd@gmail.com) Received: from mail-fx0-f54.google.com (mail-fx0-f54.google.com [209.85.161.54]) by mx1.freebsd.org (Postfix) with ESMTP id 868958FC17; Tue, 8 Jun 2010 07:08:00 +0000 (UTC) Received: by fxm20 with SMTP id 20so3258203fxm.13 for ; Tue, 08 Jun 2010 00:07:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:sender:message-id:date:from :user-agent:mime-version:to:cc:subject:references:in-reply-to :x-enigmail-version:content-type; bh=KB34dioEtjCmgZYw5Zbx3NYaI+TqNjr9kdCsjdlDzDs=; b=fnGYhIp8iTK8pgqYlklZBX2mXuBOs5tOycnjB2avoN7BeH0WCTxhT2nG1fbxgA6ckh CbNxFv8g5S+XKBT07+XinQi3vpQHjkgKTYJw7J0z4x4ef2B2bW/SpHOMHl2AIIYXXmnr MBPtEJblB6bbatWpnznFypyCcPG9DtgQUlHps= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=sender:message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:x-enigmail-version:content-type; b=eNrb7sY4KkugXgF3b5LQzSJgWIOMCQw1OJu2Rd6KECd928G3J8PTReCZX93vlrezEr qJQQBwzbkE3tRRkb2VtuBOMNhIuqAaIyVegfTYrRI+mPptAVDkZQ0HrH29SjTtB0ZT/8 3vZaC1iPuzHXk65EKIQWCFa9SygcztIxK57Ao= Received: by 10.223.92.152 with SMTP id r24mr603698fam.74.1275980879451; Tue, 08 Jun 2010 00:07:59 -0700 (PDT) Received: from mavbook.mavhome.dp.ua (pc.mavhome.dp.ua [212.86.226.226]) by mx.google.com with ESMTPS id 15sm24072837fad.10.2010.06.08.00.07.57 (version=SSLv3 cipher=RC4-MD5); Tue, 08 Jun 2010 00:07:57 -0700 (PDT) Sender: Alexander Motin Message-ID: <4C0DEC2D.20109@FreeBSD.org> Date: Tue, 08 Jun 2010 10:07:25 +0300 From: Alexander Motin User-Agent: Thunderbird 2.0.0.23 (X11/20091212) MIME-Version: 1.0 To: John Baldwin References: <4C094635.40002@FreeBSD.org> <201006070955.54445.jhb@freebsd.org> In-Reply-To: <201006070955.54445.jhb@freebsd.org> X-Enigmail-Version: 0.96.0 Content-Type: multipart/mixed; boundary="------------070008030807000207020202" Cc: freebsd-current@freebsd.org Subject: Re: ioapic_assign_cpu() on active level-triggered interrupt X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 Jun 2010 07:08:01 -0000 This is a multi-part message in MIME format. --------------070008030807000207020202 Content-Type: text/plain; charset=KOI8-R Content-Transfer-Encoding: 7bit Hi. John Baldwin wrote: > On Friday 04 June 2010 2:30:13 pm Alexander Motin wrote: >> I am working on driver for HPET event timers. It works mostly fine, >> except after some cases when ioapic_assign_cpu() called while timer is >> active. Under interrupt rate of 10KHz it is enough a dozen cpuset runs >> to break it (with 1KHz - few dozens). When it happens, I can see that >> timer is still running, interrupt status register is changing, but no >> interrupts received. >> >> Can somebody explain this behavior and propose some solution? Have >> somebody seen it for regular PCI devices? > > It probably would be best to disable the source, however, you can't just re- > enable it as it might already be disabled when you move it. It is also > probably a bug that io_masked can be read w/o holding the icu_lock in > ioapic_program_intpin(). I think the icu_lock should be pushed up to callers > of ioapic_program_intpin(), and that you should explicitly do a simple write > to mask the source a bit earlier. Something like this perhaps: > > Index: io_apic.c > =================================================================== > --- io_apic.c (revision 208714) > +++ io_apic.c (working copy) > > If you find that you still need the DELAY(10), you could place it in the > conditional block that masks the interrupt perhaps. Thank you! This patch seems to reduce stuck probability for level-triggered interrupts, especially when printf works as DELAY() on bootverbose=1. Without DELAY() and bootverbose I can trigger the issue by few dozens of spuset runs. With DELAY() I still was able to do it twice, but only after several hundreds of tries. Also I have found that unconditional pre-disabling interrupt hurts RTC timer. As I understand, IRQ8 is edge-triggered, and it's loss during switch makes RTC timer stuck. I've tried to not mask edge-triggered interrupts - and it helped. But even so I am able to hang RTC timer with hundred cpusets, so may be we have to just choose between bad and worse. Updated patch attached. PS: For the global timers CPU binding is probably not so important, as interrupts are any way redistributed to other CPUs. So I have found that this issue can be workarounded by binding interrupts to BSP during attach to avoid migration on SMP start. -- Alexander Motin --------------070008030807000207020202 Content-Type: text/plain; name="io_apic.c.patch" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="io_apic.c.patch" --- io_apic.c.prev 2010-06-05 00:53:55.000000000 +0300 +++ io_apic.c 2010-06-08 09:21:56.000000000 +0300 @@ -261,16 +261,15 @@ ioapic_program_intpin(struct ioapic_ints * If a pin is completely invalid or if it is valid but hasn't * been enabled yet, just ensure that the pin is masked. */ + mtx_assert(&icu_lock, MA_OWNED); if (intpin->io_irq == IRQ_DISABLED || (intpin->io_irq < NUM_IO_INTS && intpin->io_vector == 0)) { - mtx_lock_spin(&icu_lock); low = ioapic_read(io->io_addr, IOAPIC_REDTBL_LO(intpin->io_intpin)); if ((low & IOART_INTMASK) == IOART_INTMCLR) ioapic_write(io->io_addr, IOAPIC_REDTBL_LO(intpin->io_intpin), low | IOART_INTMSET); - mtx_unlock_spin(&icu_lock); return; } @@ -312,14 +311,12 @@ ioapic_program_intpin(struct ioapic_ints } /* Write the values to the APIC. */ - mtx_lock_spin(&icu_lock); intpin->io_lowreg = low; ioapic_write(io->io_addr, IOAPIC_REDTBL_LO(intpin->io_intpin), low); value = ioapic_read(io->io_addr, IOAPIC_REDTBL_HI(intpin->io_intpin)); value &= ~IOART_DEST; value |= high; ioapic_write(io->io_addr, IOAPIC_REDTBL_HI(intpin->io_intpin), value); - mtx_unlock_spin(&icu_lock); } static int @@ -352,6 +349,14 @@ ioapic_assign_cpu(struct intsrc *isrc, u if (new_vector == 0) return (ENOSPC); + /* Mask the old intpin if it is enabled while it is migrated. */ + mtx_lock_spin(&icu_lock); + if (!intpin->io_masked && !intpin->io_edgetrigger) { + ioapic_write(io->io_addr, IOAPIC_REDTBL_LO(intpin->io_intpin), + intpin->io_lowreg | IOART_INTMSET); + DELAY(100); + } + intpin->io_cpu = apic_id; intpin->io_vector = new_vector; if (isrc->is_handlers > 0) @@ -364,6 +369,8 @@ ioapic_assign_cpu(struct intsrc *isrc, u intpin->io_vector); } ioapic_program_intpin(intpin); + mtx_unlock_spin(&icu_lock); + /* * Free the old vector after the new one is established. This is done * to prevent races where we could miss an interrupt. @@ -399,9 +406,11 @@ ioapic_disable_intr(struct intsrc *isrc) /* Mask this interrupt pin and free its APIC vector. */ vector = intpin->io_vector; apic_disable_vector(intpin->io_cpu, vector); + mtx_lock_spin(&icu_lock); intpin->io_masked = 1; intpin->io_vector = 0; ioapic_program_intpin(intpin); + mtx_unlock_spin(&icu_lock); apic_free_vector(intpin->io_cpu, vector, intpin->io_irq); } } @@ -443,6 +452,7 @@ ioapic_config_intr(struct intsrc *isrc, * XXX: Should we write to the ELCR if the trigger mode changes for * an EISA IRQ or an ISA IRQ with the ELCR present? */ + mtx_lock_spin(&icu_lock); if (intpin->io_bus == APIC_BUS_EISA) pol = INTR_POLARITY_HIGH; changed = 0; @@ -464,6 +474,7 @@ ioapic_config_intr(struct intsrc *isrc, } if (changed) ioapic_program_intpin(intpin); + mtx_unlock_spin(&icu_lock); return (0); } @@ -473,8 +484,10 @@ ioapic_resume(struct pic *pic) struct ioapic *io = (struct ioapic *)pic; int i; + mtx_lock_spin(&icu_lock); for (i = 0; i < io->io_numintr; i++) ioapic_program_intpin(&io->io_pins[i]); + mtx_unlock_spin(&icu_lock); } /* --------------070008030807000207020202--