Date: Thu, 16 Oct 2003 08:48:57 -0700
From: Luigi Rizzo
To: Michael Marchetti
cc: "'stable@freebsd.org'"
cc: "'hackers@freebsd.org'"
Subject: Re: hardclock interrupt deadlock
Message-ID: <20031016084857.A26357@xorpc.icir.org>

On Thu, Oct 16, 2003 at 11:17:50AM -0400, Michael Marchetti wrote:
> Hi,
>
> We have encountered a problem where the system hangs.  We are running a 4.7
> SMP kernel using kernel polling on a Dual Xeon with hyperthreading enabled

puzzled on what you mean by "kernel polling" ... DEVICE_POLLING,
if that is what you mean, cannot work with SMP -- it should not
even build unless you manually disabled the check.

	luigi

> (essentially a 4 processor system).
> As a result, the only HW interrupts in
> the system are hardclock (8254), the rtc, serial console and scsi.  The
> synchronous interrupts are the 8254 and the rtc.  When the system is hung,
> I have found that the ipending and iactive bits for the 8254 and rtc are
> set (meaning the interrupt is pending and active) although the giant lock
> is not held and all processors are idle (and halted).  This led me to
> believe that somehow the ipending bit was set "just before" the last
> interrupt returned.  The only way the system could run that interrupt
> again is if another interrupt ran and noticed that ipending was set; the
> pending interrupt would then run (and an interrupt delay would be seen).
> In a non-polling system, I imagine the ethernet interrupts would wake it
> up.  I believe I found a potential hole where this could happen.
>
> In i386/isa/ipl.s:
>
> #ifdef SMP
>         cli                             /* early to prevent INT deadlock */
> doreti_next2:
> #endif
>         movl    %eax,%ecx
>         notl    %ecx                    /* set bit = unmasked level */
> #ifndef SMP
>         cli
> #endif
>         andl    _ipending,%ecx          /* set bit = unmasked pending INT */
>         jne     doreti_unpend
>         movl    %eax,_cpl
>
> My concern is the case where ipending is checked and found not to be set,
> but another interrupt occurs just afterwards and sets ipending.  Because
> the CPL is not yet unmasked, that interrupt is not forwarded.  In
> particular, in i386/isa/apic_vector.s:
>
> 3: ;                    /* other cpu has isr lock */                    \
>         APIC_ITRACE(apic_itrace_noisrlock, irq_num, APIC_ITRACE_NOISRLOCK) ;\
>         lock ;                                                          \
>         orl     $IRQ_BIT(irq_num), _ipending ;                          \
>         testl   $IRQ_BIT(irq_num), _cpl ;                               \
>         jne     4f ;            /* this INT masked */                   \
>         call    forward_irq ;   /* forward irq to lock holder */        \
>         POP_FRAME ;             /* and return */                        \
>         iret ;                                                          \
>         ALIGN_TEXT ;                                                    \
>
> The test of _cpl occurs right after ipending is set, which opens a
> potential race between checking and modifying the cpl.
>
> One quick solution that I thought might correct this would be, in ipl.s,
> to recheck ipending right after modifying the cpl to see if it changed,
> such as:
>
> #ifdef SMP
>         cli                             /* early to prevent INT deadlock */
> doreti_next2:
> #endif
>         movl    %eax,%ecx
>         notl    %ecx                    /* set bit = unmasked level */
> #ifndef SMP
>         cli
> #endif
>         andl    _ipending,%ecx          /* set bit = unmasked pending INT */
>         jne     doreti_unpend
>         movl    %eax,_cpl
>         andl    _ipending,%ecx          /* set bit = unmasked pending INT */
>         jne     doreti_unpend
>
> Any opinions/insight?
>
> thanks.
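
For anyone following the thread, here is a minimal sketch of the window
described in the quoted message, written as a single-threaded C model
rather than the kernel's assembly.  Everything in it is an assumption made
for illustration: the function names and the late_irq parameter are
hypothetical, the other CPU's "orl" into _ipending is simulated by a plain
OR placed between the first test and the store to _cpl, and locking,
forward_irq, iactive and memory ordering between CPUs are left out.  The
model also suggests that the recheck as quoted cannot fire: when the first
jne falls through, %ecx is already zero, so the second andl can only yield
zero again.  The second function rebuilds the mask from the new cpl before
testing a second time.

```c
#include <stdint.h>

/*
 * Hypothetical single-threaded model, not kernel code.
 *   pending  : value of _ipending on entry
 *   new_cpl  : value of %eax (the cpl about to be restored)
 *   late_irq : bit another CPU ORs into _ipending inside the window
 * Both functions return 1 if the branch to doreti_unpend would be
 * taken and 0 otherwise.
 */

/* The recheck as quoted: the second andl reuses %ecx unchanged. */
int recheck_as_quoted(uint32_t pending, uint32_t new_cpl, uint32_t late_irq)
{
    uint32_t ecx = ~new_cpl;        /* movl %eax,%ecx ; notl %ecx */

    ecx &= pending;                 /* andl _ipending,%ecx */
    if (ecx != 0)
        return 1;                   /* jne doreti_unpend */

    pending |= late_irq;            /* other CPU: orl $IRQ_BIT, _ipending */
    /* movl %eax,_cpl would happen here; it does not modify %ecx */

    ecx &= pending;                 /* second andl: %ecx is still 0 */
    return ecx != 0;                /* second jne, never taken */
}

/* Variant that recomputes the mask before the second test. */
int recheck_with_reload(uint32_t pending, uint32_t new_cpl, uint32_t late_irq)
{
    uint32_t ecx = ~new_cpl & pending;

    if (ecx != 0)
        return 1;                   /* first jne doreti_unpend */

    pending |= late_irq;            /* same simulated late interrupt */

    ecx = ~new_cpl & pending;       /* rebuild the mask, then andl again */
    return ecx != 0;
}
```

With pending = 0, new_cpl = 0 and late_irq = 1, recheck_as_quoted()
returns 0 while recheck_with_reload() returns 1.  If the late bit is still
masked by the new cpl (new_cpl = 1, late_irq = 1), both return 0, which is
the expected behaviour for a masked interrupt.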