From owner-freebsd-smp Sat Apr 19 06:20:48 1997
Return-Path:
Received: (from root@localhost) by freefall.freebsd.org (8.8.5/8.8.5) id GAA16376 for smp-outgoing; Sat, 19 Apr 1997 06:20:48 -0700 (PDT)
Received: from corona.jcmax.com (corona.jcmax.com [204.69.248.2]) by freefall.freebsd.org (8.8.5/8.8.5) with SMTP id GAA16367 for ; Sat, 19 Apr 1997 06:20:45 -0700 (PDT)
Received: by corona.jcmax.com (5.65/2.49G/4.1.3_U1) id AA18511; Sat, 19 Apr 97 09:20:39 -0400
Date: Sat, 19 Apr 97 09:20:39 -0400
From: cr@jcmax.com (Cyrus Rahman)
Message-Id: <9704191320.AA18511@corona.jcmax.com>
To: smp@csn.net, smp@freebsd.org
Subject: SMP kernel deadlocks
Sender: owner-smp@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

I've previously described a situation in which the FreeBSD SMP kernel appeared to deadlock under heavy load. I finally got another chunk of time to look into the problem.

****
Problem summary (in Steve's words):

Summary of the problem:

 code:
   3-0.970209-SNAP, -current SMP src
   APIC_IO and all recommended options for same.

 symptom:
   heavily loaded system (ie lots of INTs happening) "freezes"

 reason:
   cpu0 is trying to service an INT, spin-locks attempting to get the mp_lock, which evidently is permanently held by some process on cpu1. the lock count that is being held is usually 2, but sometimes only 1.

 reproducing the problem:
   although I have never seen this before, I can easily reproduce it by disabling the loprio code by changing TEST_LOPRIO to TEST_LOPRIO_NOT in smptests.h. The effect of this is to cause ALL INTs to be serviced by cpu0.
****

At the time there was some question about whether there was a true deadlock. As it turns out, there is. The trouble arises when a page fault occurs on one processor and, during a critical interval while that page fault is being serviced, an interrupt occurs on the other processor. Defining TEST_LOPRIO decreases the frequency with which this happens, but does not eliminate the problem.

The details:

During the page fault, it generally happens that at some point smp_invltlb() gets called to flush the TLB on the other CPUs. smp_invltlb() calls allButSelfIPI(), which sends an IPI to the other processor. Unfortunately, that processor is sometimes already processing an interrupt of a higher priority. Its interrupt routine now spends its time trying to obtain the mp_lock spin lock so it can enter the kernel, but the processor which holds this lock is itself in a spin loop in apicIPI(), waiting for the IPI to be delivered. Neither processor can make progress.

Clearly the solution we originally considered, routing the stalled interrupt to the processor with the mp_lock, isn't going to work here.

I haven't had time to think through any of the other ways to get around the problem (and since I need to be in Baltimore in a few hours I probably shouldn't start), but I'd be very interested in any ideas.

Cyrus