Date:      Sun, 06 Apr 1997 01:20:40 +0800
From:      Peter Wemm <peter@spinner.dialix.com>
To:        cr@jcmax.com (Cyrus Rahman)
Cc:        smp@freebsd.org
Subject:   Re: Deadlocking in SMP kernel 
Message-ID:  <199704051720.BAA18561@spinner.DIALix.COM>
In-Reply-To: Your message of "Sat, 05 Apr 1997 11:33:04 EST." <9704051633.AA05399@corona.jcmax.com> 

Cyrus Rahman wrote:
> There appears to be a situation in which the SMP kernel deadlocks on
> mp_lock.  With much help from Steve Passe, I've come up with the following
> (still tentative) scenario:
> 
>  A process, running on cpu1, enters the kernel and obtains a lock.  While it
>  has the lock, but before interrupts are redirected to cpu1 (or any time, if
>  TEST_LOPRIO isn't defined), an interrupt goes to cpu0, blocking (until it
>  obtains the lock) all lower priority interrupts.

No..  Lower priority interrupts will still get into the kernel because the
kernel entry lock is recursive.  cpu#0 will receive and see the interrupt.
If it's masked at the time, it will be recorded for later.  As cpu#0 lowers
the masking, if the interrupt then becomes visible it will be serviced.
cpu#0 will not release the lock until all known interrupts are unmasked and
serviced.
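
To make "recursive" concrete, here is a minimal single-threaded sketch of an
entry lock that records an owner and a nesting depth.  The struct and function
names are made up for illustration and are not the real mp_lock layout or API;
a real implementation would need an atomic test-and-set and would spin rather
than return 0.

```c
#include <assert.h>

#define NO_OWNER (-1)

/* Hypothetical model of a recursive kernel entry lock (not the real mp_lock). */
struct entry_lock {
    int owner;  /* cpu id that holds the lock, or NO_OWNER */
    int depth;  /* how many times the owner has entered */
};

/* Returns 1 if `cpu` took or re-entered the lock, 0 if it would have to spin. */
static int entry_lock_try(struct entry_lock *l, int cpu)
{
    if (l->owner == NO_OWNER) {
        l->owner = cpu;
        l->depth = 1;
        return 1;
    }
    if (l->owner == cpu) {
        l->depth++;         /* nested entry, e.g. an interrupt on the owner */
        return 1;
    }
    return 0;               /* held by the other cpu */
}

/* Drops one level of nesting; the lock is freed only when depth reaches 0. */
static void entry_lock_release(struct entry_lock *l, int cpu)
{
    assert(l->owner == cpu && l->depth > 0);
    if (--l->depth == 0)
        l->owner = NO_OWNER;
}
```

In this model an interrupt arriving on the cpu that already owns the lock just
bumps the depth, while the other cpu has to wait until the depth drops back to
zero.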

If the interrupt is switched to cpu#1, there will be a problem since it will
block waiting for cpu#0 to finish.  This means that cpu#0 could service an
interrupt of lower priority before the one that cpu#1 is aware of.

It's been a while since I worked on the smp kernel, but from memory the
lowpri mode was meant to make the cpu that holds the kernel lock the
preferred one to receive interrupts from the apic[s], in order to get
better irq latency.  I don't remember if it was ever finished; I seem to
recall Steve telling me that there was a major flaw in what we had in mind
for some reason.

>  If for some reason the kernel now waits for an interrupt, there will be a
>  deadlock.

I don't know of anywhere that this happens, but yes, it could be a problem
if it does.  What normally happens is that the kernel will sleep and
switch out to another process on return to user mode.

> Are there any places where the kernel waits for an interrupt to occur?
> There are three places I found where software interrupts are generated by
> the kernel - but I don't think any of them are relevant (two in icu.s, one
> in locore.s).
> 
> I suspect that understanding my previous question about why mp_lock needs
> to be stored during cpu_switch() might be helpful - for there's clearly some
> reason why mp_lock isn't always 1 in that routine, but I can't figure it out.
> 
> For some reason the deadlock only seems to occur with APIC_IO defined, if
> that provides any additional clues.

Hmm..  several possibilities spring to mind...

1:  There's a race somehow that we've missed in the apic masking code. This
is not exceptionally unlikely since there is lazy masking happening and the
i386 icu code is built very much around the 8259 pic and doesn't really map
to the apic very well.

2: There's a problem with having two cpus taking an IRQ at very close
intervals.

3: There are other cases where the enter/leave kernel locking is botched
(e.g. the fpu one that was missed up until a week or so ago).

4: You're using floating point..  I have my doubts about the fpu context
switching and operating mode control, but others seem to have it working in
spite of my grim expectations.. :-]
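
For possibility 1, the "recorded for later" behaviour described earlier can be
pictured with the rough model below: an interrupt that arrives while masked
only sets a pending bit, and lowering the mask services whatever has become
visible.  The names (irq_state, irq_arrive, irq_set_mask) are invented for
this sketch and are not the real icu/apic code; the `serviced` field exists
only so the demo can be checked.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software view of interrupt masking (not the real icu code). */
struct irq_state {
    uint32_t masked;    /* bit n set: irq n is currently blocked */
    uint32_t pending;   /* bit n set: irq n arrived while blocked */
    uint32_t serviced;  /* bit n set: handler for irq n has run (demo only) */
};

/* An interrupt arrives: run it now if unmasked, otherwise record it. */
static void irq_arrive(struct irq_state *s, int irq)
{
    uint32_t bit = UINT32_C(1) << irq;

    if (s->masked & bit)
        s->pending |= bit;      /* recorded for later */
    else
        s->serviced |= bit;     /* stands in for calling the handler */
}

/* Change the mask, then service anything pending that is now visible. */
static void irq_set_mask(struct irq_state *s, uint32_t new_mask)
{
    uint32_t runnable;

    s->masked = new_mask;
    runnable = s->pending & ~new_mask;
    s->pending &= ~runnable;
    s->serviced |= runnable;
}
```

On real hardware an interrupt can arrive between the mask update and the
pending check in irq_set_mask, so those steps would have to be protected
against interrupt delivery; that window is the sort of place where a missed
race could hide.  This single-threaded model does not show it.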

> Cyrus

Cheers,
-Peter




