Date:      Tue, 23 Nov 2010 08:33:29 +0200
From:      Andriy Gapon <avg@freebsd.org>
To:        freebsd-hackers@freebsd.org, "Robert N. M. Watson" <rwatson@freebsd.org>
Subject:   dtrace/cyclic deadlock
Message-ID:  <4CEB6039.2040700@freebsd.org>

I think that I've run into the known dtrace/cyclic deadlock issue.
I would just like to run my understanding and ideas by you.

The problem is that the cyclic_fire() callback is executed in interrupt filter
context (and thus with interrupts disabled) and it tries to acquire a spin
mutex in the cyclic code.
At the same time another CPU may be executing a thread that holds that spin
mutex and uses smp_rendezvous_cpus() to perform a synchronous function
invocation on the first CPU.
So, CPU #1 cannot make forward progress because it is spinning on the spin lock,
and CPU #2 cannot make forward progress because it cannot interrupt CPU #1.
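
To make the cycle concrete, here is a minimal userspace model of the two
wait-for edges described above. It is only a sketch: struct cpu_model and
model_deadlocked() are hypothetical names invented for illustration, and
nothing here is actual kernel or cyclic code.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; this type does not exist in the kernel. */
struct cpu_model {
	bool intr_disabled;          /* in interrupt filter context */
	bool spinning_on_lock;       /* spinning on the cyclic spin mutex */
	bool holds_lock;             /* owns the cyclic spin mutex */
	bool waiting_for_rendezvous; /* in smp_rendezvous_cpus(), waiting on peer */
};

/*
 * Returns true when 'a' and 'b' wait on each other:
 *   - 'a' spins on a lock that 'b' holds, and
 *   - 'b' needs 'a' to service an interrupt, but 'a' has interrupts disabled.
 */
static bool
model_deadlocked(const struct cpu_model *a, const struct cpu_model *b)
{
	bool a_waits_on_b = a->spinning_on_lock && b->holds_lock;
	bool b_waits_on_a = b->waiting_for_rendezvous && a->intr_disabled;

	return (a_waits_on_b && b_waits_on_a);
}
```

With CPU #1 modelled as { intr_disabled, spinning_on_lock } and CPU #2 as
{ holds_lock, waiting_for_rendezvous }, both edges are present and the model
reports a deadlock; clearing either edge breaks the cycle.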

I think that the problem was introduced during the porting of the code.
On (Open)Solaris there are no spin locks in this code; all data structures are
per-CPU, and data coherency is ensured by (1) accessing the data only from the
CPU to which it belongs; and (2) using some modern-day spl*() equivalent[?] to
block interrupts.

I think that this is quite similar to what we do for per-CPU caches in UMA and
so the same approach should work here.
That is, as in (Open)Solaris, the data should be accessed only from the owning
CPU and spinlock_enter()/spinlock_exit() should be used to prevent races between
non-interrupt code and nested interrupt code.
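
A rough userspace sketch of that access pattern follows. Everything prefixed
with model_ is a hypothetical stand-in: the stubs only count nesting depth,
whereas the kernel's spinlock_enter()/spinlock_exit() take no arguments and
disable/restore interrupts on the current CPU. The per-CPU structure and its
single field are also assumptions, not the real cyclic data layout.

```c
#include <assert.h>

#define MODEL_NCPU 2

/* Stand-in for "interrupts blocked on this CPU": a nesting counter. */
static int model_intr_depth[MODEL_NCPU];

static void
model_spinlock_enter(int cpu)
{
	model_intr_depth[cpu]++;
}

static void
model_spinlock_exit(int cpu)
{
	assert(model_intr_depth[cpu] > 0);
	model_intr_depth[cpu]--;
}

/* Hypothetical per-CPU cyclic state; only touched from its owning CPU. */
struct model_cyc_cpu {
	int pending;
};
static struct model_cyc_cpu model_cyc[MODEL_NCPU];

/*
 * Non-interrupt path, assumed to run on 'cpu'. Instead of taking a spin
 * mutex it blocks local interrupts so the fire path cannot nest inside.
 */
static void
model_cyclic_add(int cpu)
{
	model_spinlock_enter(cpu);
	model_cyc[cpu].pending++;
	model_spinlock_exit(cpu);
}

/*
 * Interrupt path: already runs with interrupts disabled on 'cpu', so it
 * touches the per-CPU data without any lock and never spins on a peer.
 */
static int
model_cyclic_fire(int cpu)
{
	int fired = model_cyc[cpu].pending;

	model_cyc[cpu].pending = 0;
	return (fired);
}
```

Since neither path waits on state owned by another CPU, a rendezvous
initiated elsewhere is delayed at most for the short interrupts-blocked
window, rather than indefinitely.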

What do you think?
Thanks!
-- 
Andriy Gapon


