Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 28 Oct 2011 10:07:13 -0700
From:      mdf@FreeBSD.org
To:        Ryan Stone <rysto32@gmail.com>
Cc:        FreeBSD Current <freebsd-current@freebsd.org>, mlaier@freebsd.org
Subject:   Re: smp_rendezvous runs with interrupts and preemption enabled on unicore systems
Message-ID:  <CAMBSHm9fxL23Bydnw6_Fned1sNPCq1aYr7As4=a91vVKcW3u1w@mail.gmail.com>
In-Reply-To: <CAFMmRNwWBE6idGJCsNPago4-N1ohAaCD02_D-LCMzDg8pbjE1A@mail.gmail.com>
References:  <CAFMmRNwWBE6idGJCsNPago4-N1ohAaCD02_D-LCMzDg8pbjE1A@mail.gmail.com>

next in thread | previous in thread | raw e-mail | index | archive | help
On Fri, Oct 28, 2011 at 8:37 AM, Ryan Stone <rysto32@gmail.com> wrote:
> I'm seeing issues on a unicore systems running a derivative of FreeBSD
> 8.2-RELEASE if something calls mem_range_attr_set. =A0It turns out that
> the root cause is a bug in smp_rendezvous_cpus. =A0The first part of
> smp_rendezvous_cpus attempts to short-circuit the non-SMP case(note
> that smp_started is never set to 1 on a unicore system):
>
> =A0 =A0 =A0 =A0if (!smp_started) {
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0if (setup_func !=3D NULL)
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0setup_func(arg);
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0if (action_func !=3D NULL)
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0action_func(arg);
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0if (teardown_func !=3D NULL)
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0teardown_func(arg);
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0return;
> =A0 =A0 =A0 =A0}
>
> The problem is that this runs with interrupts enabled, outside of a
> critical section. =A0My system runs with device_polling enabled with hz
> set to 2500, so its quite easy to wedge the system by having a thread
> run mem_range_attr_set. =A0That has to do a smp_rendezvous, and if a
> timer interrupt happens to go off half-way through the action_func and
> preempt this thread, the system ends up deadlocked(although once it's
> wedged, typing at the serial console stands a good chance of unwedging
> the system. =A0Go figure).
>
> I know that smp_rendezvous was reworked substantially on HEAD, but by
> inspection it looks like the bug is still present, as the
> short-circuit behaviour is still there.
>
> I am not entirely sure of the best way to fix this. =A0Is it as simple
> as doing a spinlock_enter before setup_func and a spinlock_exit after
> teardown_func? =A0It seems to boot fine, but I'm not at all confident
> that I understand the nuances of smp_rendezvous to be sure that there
> aren't side effects that I don't know about.

Looks like Max didn't have time to upstream this fix:

 struct mtx smp_ipi_mtx;
+MTX_SYSINIT(smp_ipi_mtx, &smp_ipi_mtx, "smp rendezvous", MTX_SPIN);

...

 static void
 mp_start(void *dummy)
 {

-	mtx_init(&smp_ipi_mtx, "smp rendezvous", NULL, MTX_SPIN);

...

 	if (!smp_started) {
+		mtx_lock_spin(&smp_ipi_mtx);
 		if (setup_func !=3D NULL)
 			setup_func(arg);
 		if (action_func !=3D NULL)
 			action_func(arg);
 		if (teardown_func !=3D NULL)
 			teardown_func(arg);
+		mtx_unlock_spin(&smp_ipi_mtx);
 		return;
 	}

Cheers,
matthew



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAMBSHm9fxL23Bydnw6_Fned1sNPCq1aYr7As4=a91vVKcW3u1w>