Date:      Thu, 16 Jan 2003 06:34:32 +1100 (EST)
From:      Bruce Evans <bde@zeta.org.au>
To:        smp@freebsd.org
Subject:   a deadlock stopping cpus
Message-ID:  <20030116055505.A292-100000@gamplex.bde.org>

Suppose that one CPU calls stop_cpus() while sched_lock is held.  This
happens 100% of the time if a CPU that holds sched_lock is in ddb.
Suppose that another CPU begins handling an interrupt while sched_lock
is held but before the IPI to stop the second CPU has been delivered.
Then the second CPU spins endlessly (*) in mtx_lock_spin() called from
ithread_schedule() because all its interrupts are masked (at least on
i386's) so the IPI can't be delivered, and the first CPU spins endlessly
(**) in stop_cpus() because the second CPU can't be stopped.
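
The circular wait above can be sketched as a small lock-step model in
plain C.  This is not kernel code: model_stop_cpu1(), its parameters and
all the state variables are hypothetical names invented for illustration,
and each loop iteration stands in for one spin on each CPU.

```c
#include <assert.h>
#include <stdbool.h>

#define CPU0   0
#define NOBODY (-1)

/*
 * Hypothetical model, not FreeBSD code.  CPU 0 holds sched_lock and has
 * sent a stop IPI to CPU 1; CPU 1 is in an interrupt handler spinning for
 * sched_lock.  Returns true if CPU 0 sees CPU 1 stop within max_spins.
 */
bool
model_stop_cpu1(bool cpu1_masks_ipi, int max_spins)
{
	int  sched_lock_owner = CPU0;	/* CPU 0 keeps the lock, e.g. in ddb */
	bool ipi_pending = true;	/* stop IPI sent, not yet taken */
	bool cpu1_stopped = false;
	bool cpu1_in_handler = true;	/* CPU 1 wants sched_lock */

	assert(max_spins > 0);
	for (int i = 0; i < max_spins; i++) {
		/* CPU 1: take the IPI only if it is not masked. */
		if (ipi_pending && !(cpu1_in_handler && cpu1_masks_ipi)) {
			ipi_pending = false;
			cpu1_stopped = true;
		} else if (cpu1_in_handler && sched_lock_owner == NOBODY) {
			/* Never reached here: CPU 0 never drops the lock. */
			cpu1_in_handler = false;
		}
		/* CPU 0: one iteration of waiting for CPU 1 to stop. */
		if (cpu1_stopped)
			return true;
	}
	return false;	/* neither CPU made progress */
}
```

With cpu1_masks_ipi set the model never makes progress and returns false
for any spin budget; with it clear (the IPI can still get through while
CPU 1 spins) the stop is acknowledged on the first iteration.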

The fix probably involves not masking all interrupts while handling
ordinary interrupts.  Masking hardware interrupts in the APIC only
should be sufficient on i386's.  ISTR that old versions attempted to
do this but had bugs like allowing software interrupts to preempt
hardware interrupts (not just the hardware interrupt handler but its
scheduling).  Hopefully we now need only sched_lock for the scheduling.
Spinlocks no longer mask all interrupts on at least i386's.

(*) The spin actually times out after too long.  It panics with a bogus
message about timing out after > 5 seconds when the timeout is more
like 500 seconds.

(**) The spin actually times out after just 100000 atomic_load_acq_int()s
in the DIAGNOSTIC case.
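
A bounded spin of the kind described in (**) might look roughly like the
following.  This is a userland sketch using C11 atomics rather than the
kernel's atomic_load_acq_int(); wait_for_stopped() and STOP_SPIN_LIMIT
are hypothetical names, and only the iteration count comes from the text
above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define STOP_SPIN_LIMIT 100000	/* iteration count quoted above */

/*
 * Hypothetical helper: spin until every bit in 'want' is set in *stopped,
 * giving up after STOP_SPIN_LIMIT acquire loads.  Returns true if the
 * CPUs named by 'want' reported themselves stopped in time.
 */
bool
wait_for_stopped(atomic_uint *stopped, unsigned int want)
{
	assert(want != 0);
	for (int i = 0; i < STOP_SPIN_LIMIT; i++) {
		unsigned int seen =
		    atomic_load_explicit(stopped, memory_order_acquire);

		if ((seen & want) == want)
			return true;
	}
	return false;	/* timed out; the caller decides how to report it */
}
```

Counting iterations rather than elapsed time is what makes such a limit
hard to relate to seconds: the real duration depends on how fast the CPU
gets through the loop.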

Bruce





