From owner-freebsd-smp  Sat Apr 19 06:20:48 1997
Return-Path: <owner-smp>
Received: (from root@localhost)
          by freefall.freebsd.org (8.8.5/8.8.5) id GAA16376
          for smp-outgoing; Sat, 19 Apr 1997 06:20:48 -0700 (PDT)
Received: from corona.jcmax.com (corona.jcmax.com [204.69.248.2])
          by freefall.freebsd.org (8.8.5/8.8.5) with SMTP id GAA16367
          for <smp@freebsd.org>; Sat, 19 Apr 1997 06:20:45 -0700 (PDT)
Received: by corona.jcmax.com (5.65/2.49G/4.1.3_U1)
	id AA18511; Sat, 19 Apr 97 09:20:39 -0400
Date: Sat, 19 Apr 97 09:20:39 -0400
From: cr@jcmax.com (Cyrus Rahman)
Message-Id: <9704191320.AA18511@corona.jcmax.com>
To: smp@csn.net, smp@freebsd.org
Subject: SMP kernel deadlocks
Sender: owner-smp@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

I've previously described a situation in which the FreeBSD SMP kernel
appeared to deadlock under heavy load.  I finally got another chunk of time
to look into the problem.

****

Problem summary (in Steve's words):

Summary of the problem:

 code:
        3-0.970209-SNAP, -current SMP src
        APIC_IO and all recommended options for same.

 symptom:
        heavily loaded system (ie lots of INTs happening) "freezes"
 
 reason:
        cpu0 is trying to service an INT, spin-locks attempting to get the
        mp_lock, which evidently is permanently held by some process on cpu1.
        the lock count that is being held is usually 2, but sometimes only 1.

 reproducing the problem:
        although I have never seen this before, I can easily reproduce it
        by disabling the loprio code by changing TEST_LOPRIO to TEST_LOPRIO_NOT
        in smptests.h.  The effect of this is to cause ALL INTs to be serviced
        by cpu0.


****

At the time there was some question about whether there was a true deadlock.
As it turns out, there is.

The trouble arises when one processor takes a page fault and, during a
critical interval while that fault is being serviced, an interrupt
arrives on the other processor.  Defining TEST_LOPRIO makes this happen
less often, but does not eliminate the problem.

The details:

	While the page fault is being serviced, smp_invltlb() generally
	gets called at some point to flush the TLBs on the other CPUs.
	smp_invltlb() calls allButSelfIPI(), which sends an IPI to the other
	processor.  Unfortunately, that processor is sometimes already
	servicing an interrupt of higher priority.  Its interrupt routine
	is spinning on the mp_lock spin lock so it can enter the kernel,
	but the processor that holds the lock is itself in a spin loop in
	apicIPI(), waiting for the IPI to be delivered.  Neither processor
	can make progress.
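
The circular wait can be sketched as a tiny state model.  This is not
kernel code: struct cpu, step() and model_deadlocks() are hypothetical
names made up for illustration, and the model assumes that an IPI is
accepted only when the target has no higher-priority interrupt in
service, and that the sender keeps mp_lock until its IPI is delivered.

```c
#include <stdbool.h>

/* Hypothetical two-CPU model; illustrative names, not FreeBSD code. */
struct cpu {
	bool holds_mp_lock;	/* owns the kernel lock */
	bool in_intr;		/* servicing a higher-priority interrupt */
	bool wants_mp_lock;	/* spinning to acquire mp_lock */
	bool awaiting_ipi;	/* spinning until its IPI is delivered */
	bool ipi_pending;	/* an IPI is queued for this CPU */
};

/*
 * Advance the model by one step.  Returns true if either CPU changed
 * state, false if both are only spinning.
 */
static bool
step(struct cpu *sender, struct cpu *target)
{
	bool progress = false;

	/* Assumption: the IPI is accepted only outside the interrupt. */
	if (target->ipi_pending && !target->in_intr) {
		target->ipi_pending = false;
		sender->awaiting_ipi = false;
		progress = true;
	}
	/* The sender drops mp_lock once its IPI has been delivered. */
	if (sender->holds_mp_lock && !sender->awaiting_ipi) {
		sender->holds_mp_lock = false;
		progress = true;
	}
	/* The target's handler finishes only after it gets mp_lock. */
	if (target->wants_mp_lock && !sender->holds_mp_lock) {
		target->wants_mp_lock = false;
		target->in_intr = false;
		progress = true;
	}
	return progress;
}

/*
 * Returns true if the model ends in a deadlock.  target_in_intr says
 * whether the other CPU is already inside an interrupt handler,
 * spinning for mp_lock, when the IPI is sent.
 */
bool
model_deadlocks(bool target_in_intr)
{
	struct cpu sender = { .holds_mp_lock = true, .awaiting_ipi = true };
	struct cpu target = { .in_intr = target_in_intr,
			      .wants_mp_lock = target_in_intr,
			      .ipi_pending = true };

	/* Each step can only clear flags, so this loop terminates. */
	while (step(&sender, &target))
		;
	return sender.holds_mp_lock || target.wants_mp_lock;
}
```

Calling model_deadlocks(true) leaves the sender holding mp_lock and the
target still spinning for it, which is the situation described above;
model_deadlocks(false) runs to completion because the IPI is accepted
right away.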


Clearly the solution we originally considered, routing the stalled interrupt
to the processor with the mp_lock, isn't going to work here.  I haven't
had time to think through any of the other ways to get around the problem
(and since I need to be in Baltimore in a few hours I probably shouldn't
start), but I'd be very interested in any ideas.

Cyrus