Date: Thu, 16 Oct 2003 11:17:50 -0400
From: Michael Marchetti <mmarchetti@sandvine.com>
To: "'hackers@freebsd.org'" <hackers@freebsd.org>, "'stable@freebsd.org'" <stable@freebsd.org>
Subject: hardclock interrupt deadlock
Message-ID: <FE045D4D9F7AED4CBFF1B3B813C8533701ED5EC9@mail.sandvine.com>
Hi,

We have encountered a problem where the system hangs. We are running a 4.7 SMP kernel using kernel polling on a dual Xeon with hyperthreading enabled (essentially a 4 processor system). As a result, the only HW interrupts in the system are hardclock (8254), the rtc, serial console and scsi. The synchronous interrupts are the 8254 and the rtc.

When the system is hung, I have found that the ipending and iactive bits for the 8254 and rtc are set (meaning the interrupt is pending and active), although the giant lock is not held and all processors are idle (and halted). This led me to believe that somehow the ipending bit was set "just before" the last interrupt returned. The only way the system would be able to run that interrupt again is if another interrupt ran, noticed that ipending was set, and ran the pending handler (an interrupt delay would be seen). In a non-polling system, I imagine the ethernet interrupts would wake it up.

I believe I found a potential hole where this could happen. In i386/isa/ipl.s:

    #ifdef SMP
            cli                     /* early to prevent INT deadlock */
    doreti_next2:
    #endif
            movl    %eax,%ecx
            notl    %ecx            /* set bit = unmasked level */
    #ifndef SMP
            cli
    #endif
            andl    _ipending,%ecx  /* set bit = unmasked pending INT */
            jne     doreti_unpend
            movl    %eax,_cpl

I'm concerned about the case where ipending is checked and found to be clear, but another interrupt arrives just afterwards and sets ipending. Because cpl is not yet unmasked at that point, that interrupt is not forwarded. In particular, in i386/isa/apic_vector.s:

    3: ;                    /* other cpu has isr lock */            \
            APIC_ITRACE(apic_itrace_noisrlock, irq_num, APIC_ITRACE_NOISRLOCK) ;\
            lock ;                                                  \
            orl     $IRQ_BIT(irq_num), _ipending ;                  \
            testl   $IRQ_BIT(irq_num), _cpl ;                       \
            jne     4f ;            /* this INT masked */           \
            call    forward_irq ;   /* forward irq to lock holder */ \
            POP_FRAME ;             /* and return */                \
            iret ;                                                  \
            ALIGN_TEXT ;                                            \

The check of _cpl occurs right after ipending is set, so there is a potential race between this path testing the cpl and doreti modifying it.
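The suspected interleaving can be replayed in a small single-threaded C model. This is only a sketch under stated assumptions, not kernel code: `struct intr_state`, `other_cpu_irq` and `doreti_strands_irq` are hypothetical names, `cpl` and `ipending` stand in for `_cpl` and `_ipending`, and the "other CPU" is simply called at the point where the race window would be.

```c
#include <stdbool.h>
#include <stdint.h>

#define IRQ_BIT(n) (1u << (n))

/* Hypothetical model state; fields stand in for _cpl and _ipending. */
struct intr_state {
    uint32_t cpl;       /* set bit = masked level */
    uint32_t ipending;  /* set bit = pending interrupt */
    bool forwarded;     /* true if the other CPU reached forward_irq */
};

/* Models the "3:" path in apic_vector.s: the CPU taking the interrupt
 * does not hold the ISR lock, so it marks the IRQ pending and forwards
 * it only if the level is currently unmasked in cpl. */
static void other_cpu_irq(struct intr_state *s, unsigned irq)
{
    s->ipending |= IRQ_BIT(irq);           /* lock; orl $IRQ_BIT,_ipending */
    if ((s->cpl & IRQ_BIT(irq)) == 0)      /* testl $IRQ_BIT,_cpl */
        s->forwarded = true;               /* call forward_irq */
    /* masked: return and rely on doreti noticing ipending */
}

/* Models the doreti exit path on the lock holder. If inject_in_window is
 * true, the IRQ arrives on another CPU after the ipending check but
 * before cpl is stored. Returns true if the IRQ ends up pending and
 * unmasked with nobody scheduled to run it. */
static bool doreti_strands_irq(unsigned irq, bool inject_in_window)
{
    struct intr_state s = { .cpl = IRQ_BIT(irq), .ipending = 0,
                            .forwarded = false };
    uint32_t new_cpl = 0;                        /* %eax: level to restore */

    uint32_t unmasked_pending = ~new_cpl & s.ipending; /* movl/notl/andl */
    if (unmasked_pending != 0)
        return false;                            /* jne doreti_unpend */

    if (inject_in_window)
        other_cpu_irq(&s, irq);                  /* sees the old, masked cpl */

    s.cpl = new_cpl;                             /* movl %eax,_cpl */

    return (s.ipending & ~s.cpl) != 0 && !s.forwarded;
}
```

Calling `doreti_strands_irq(0, true)` returns true in this model: the bit is set in `ipending`, the level is unmasked, and no forward was issued. With `inject_in_window` false nothing is left pending.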
One quick solution that I thought might correct this would be, in ipl.s, to recheck ipending right after modifying the cpl to see if it changed, such as:

    #ifdef SMP
            cli                     /* early to prevent INT deadlock */
    doreti_next2:
    #endif
            movl    %eax,%ecx
            notl    %ecx            /* set bit = unmasked level */
    #ifndef SMP
            cli
    #endif
            andl    _ipending,%ecx  /* set bit = unmasked pending INT */
            jne     doreti_unpend
            movl    %eax,_cpl
            andl    _ipending,%ecx  /* set bit = unmasked pending INT */
            jne     doreti_unpend

Any opinions/insight?

Thanks.
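The effect of such a recheck can be explored with a small single-threaded C model that places the interrupt at each point around the cpl store. This is a sketch under stated assumptions, not kernel code: `struct intr_state`, `other_cpu_irq`, `enum arrival` and `irq_stranded` are hypothetical names. One detail is assumed rather than taken from the listing: at the second `andl`, `%ecx` is already zero (the preceding `jne` was not taken), so the model recomputes the unmasked mask from the new cpl before testing `ipending`, which appears to be the intent. Whether entering doreti_unpend after cpl has been stored is otherwise safe is not examined here.

```c
#include <stdbool.h>
#include <stdint.h>

#define IRQ_BIT(n) (1u << (n))

/* Hypothetical model state; fields stand in for _cpl and _ipending. */
struct intr_state {
    uint32_t cpl;       /* set bit = masked level */
    uint32_t ipending;  /* set bit = pending interrupt */
    bool forwarded;     /* true if the other CPU reached forward_irq */
};

/* Models the "3:" path in apic_vector.s on a CPU without the ISR lock. */
static void other_cpu_irq(struct intr_state *s, unsigned irq)
{
    s->ipending |= IRQ_BIT(irq);
    if ((s->cpl & IRQ_BIT(irq)) == 0)
        s->forwarded = true;
}

/* Where the interrupt arrives relative to the doreti steps. */
enum arrival { BEFORE_CHECK, IN_WINDOW, AFTER_STORE, AFTER_RECHECK };

/* Returns true if the IRQ ends up pending and unmasked with nothing
 * scheduled to run it. 'recheck' enables the proposed second test. */
static bool irq_stranded(unsigned irq, enum arrival when, bool recheck)
{
    struct intr_state s = { .cpl = IRQ_BIT(irq), .ipending = 0,
                            .forwarded = false };
    uint32_t new_cpl = 0;                  /* level being restored */
    bool caught_by_recheck = false;

    if (when == BEFORE_CHECK)
        other_cpu_irq(&s, irq);
    if ((~new_cpl & s.ipending) != 0)
        return false;                      /* first check: doreti_unpend */

    if (when == IN_WINDOW)
        other_cpu_irq(&s, irq);            /* sees old cpl, no forward */
    s.cpl = new_cpl;                       /* movl %eax,_cpl */

    if (when == AFTER_STORE)
        other_cpu_irq(&s, irq);            /* sees new cpl, forwards */
    if (recheck)                           /* assumed: mask recomputed */
        caught_by_recheck = (~new_cpl & s.ipending) != 0;

    if (when == AFTER_RECHECK)
        other_cpu_irq(&s, irq);            /* sees new cpl, forwards */

    return (s.ipending & ~s.cpl) != 0 && !caught_by_recheck && !s.forwarded;
}
```

In this model only `IN_WINDOW` without the recheck leaves the interrupt stranded; with the recheck enabled, every arrival point is either caught by one of the two tests or forwarded by the other CPU.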