Date: Thu, 16 Jan 2003 06:34:32 +1100 (EST)
From: Bruce Evans <bde@zeta.org.au>
To: smp@freebsd.org
Subject: a deadlock stopping cpus
Message-ID: <20030116055505.A292-100000@gamplex.bde.org>
Suppose that one CPU calls stop_cpus() while sched_lock is held. This happens 100% of the time if a CPU that holds sched_lock is in ddb. Suppose that another CPU begins handling an interrupt while sched_lock is held but the IPI to stop the second CPU has not been delivered. Then the second CPU spins endlessly (*) in mtx_lock_spin() called from ithread_schedule() because all its interrupts are masked (at least on i386's) so the IPI can't be delivered, and the first CPU spins endlessly (**) in stop_cpus() because the second CPU can't be stopped.

The fix probably involves not masking all interrupts while handling ordinary interrupts. Masking hardware interrupts in the APIC only should be sufficient on i386's. ISTR that old versions attempted to do this but had bugs like allowing software interrupts to preempt hardware interrupts (not just the hardware interrupt handler but its scheduling). Hopefully we now need only sched_lock for the scheduling. Spinlocks no longer mask all interrupts on at least i386's.

(*) The spin actually times out after too long. It panics with a bogus message about timing out after > 5 seconds when the timeout is more like 500 seconds.

(**) The spin actually times out after just 100000 atomic_load_acq_int()s in the DIAGNOSTIC case.

Bruce
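
The scenario above can be sketched as a small single-threaded model. This is not FreeBSD code: `struct cpu`, `enum outcome` and `simulate()` are hypothetical names invented for illustration, and the model reduces each CPU to a few flags. It only shows why the stop IPI is never taken while every interrupt is masked, and why leaving the IPI deliverable (the fix suggested above) breaks the cycle.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of the deadlock; none of this is kernel code. */
enum outcome { STOPPED_OK, DEADLOCK };

struct cpu {
    bool holds_sched_lock; /* models ownership of sched_lock */
    bool intr_masked;      /* true: all interrupts masked, IPIs included */
    bool ipi_pending;      /* a stop IPI has been sent but not yet taken */
    bool stopped;          /* the CPU has acknowledged the stop request */
};

/*
 * cpu0 holds sched_lock and asks cpu1 to stop, then polls for the
 * acknowledgement. cpu1 has just taken an ordinary interrupt and spins
 * waiting for sched_lock.
 *
 * mask_all == true  models handling the interrupt with everything masked.
 * mask_all == false models masking only ordinary hardware interrupts,
 *                   so the IPI can still be delivered.
 * max_steps bounds the polling loop, standing in for a spin timeout.
 */
enum outcome
simulate(bool mask_all, int max_steps)
{
    struct cpu cpu0 = { .holds_sched_lock = true };
    struct cpu cpu1 = { .intr_masked = mask_all };

    cpu1.ipi_pending = true;            /* cpu0 sends the stop IPI */

    for (int step = 0; step < max_steps; step++) {
        /* cpu1: the IPI is taken only if it is not masked. */
        if (cpu1.ipi_pending && !cpu1.intr_masked) {
            cpu1.ipi_pending = false;
            cpu1.stopped = true;
        } else if (!cpu0.holds_sched_lock) {
            /* Unreachable here: cpu0 never drops the lock while waiting. */
            cpu1.holds_sched_lock = true;
        }

        /* cpu0: poll for the acknowledgement while still holding the lock. */
        assert(cpu0.holds_sched_lock);
        if (cpu1.stopped)
            return STOPPED_OK;
    }
    return DEADLOCK;                    /* both CPUs spun until the bound */
}
```

Calling `simulate(true, 100000)` returns `DEADLOCK` and `simulate(false, 100000)` returns `STOPPED_OK`. The bound of 100000 merely echoes the DIAGNOSTIC iteration count mentioned in the second footnote; any positive bound gives the same result in this model.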