Date:      Thu, 12 May 2011 14:05:25 +0300
From:      Andriy Gapon <avg@FreeBSD.org>
To:        John Baldwin <jhb@FreeBSD.org>
Cc:        src-committers@FreeBSD.org, neel@FreeBSD.org, svn-src-all@FreeBSD.org, Stanislav Sedov <stas@FreeBSD.org>, svn-src-head@FreeBSD.org, Jung-uk Kim <jkim@FreeBSD.org>
Subject:   Re: svn commit: r221703 - in head/sys: amd64/include i386/include x86/isa x86/x86
Message-ID:  <4DCBBEF5.4090004@FreeBSD.org>
In-Reply-To: <4DCBBCBE.5020004@FreeBSD.org>
References:  <201105091734.p49HY0P3006180@svn.freebsd.org>	<20110512024956.996cd973.stas@FreeBSD.org>	<4DCBB9EE.8070809@FreeBSD.org> <20110512035522.e42b379c.stas@FreeBSD.org> <4DCBBCBE.5020004@FreeBSD.org>

on 12/05/2011 13:55 John Baldwin said the following:
> On 5/12/11 6:55 AM, Stanislav Sedov wrote:
>> On Thu, 12 May 2011 13:43:58 +0300
>> Andriy Gapon<avg@FreeBSD.org>  mentioned:
>>
>>>
>>> Theory:
>>> - smp_rv_waiters[2] becomes equal to smp_rv_ncpus
>>> - [at least] one slave CPU is still in the last call to cpu_spinwait() in
>>> smp_rendezvous_action()
>>> - master CPU notices that the condition is true, exits smp_rendezvous_cpus() and
>>> calls it again
>>> - the slave CPU is still in spinwait
>>> - the master CPU resets smp_rv_waiters[2] to zero
>>> - the slave CPU exits spinwait, sees smp_rv_waiters[2] with zero value
>>> - endless loop
>>>
>>
>> That might explain it.
>> Do you have a patch for me to try?
>>
>> Thanks!
>>
> 
> The NetApp folks working on BHyVe also ran into this.  They have a fix that I
> think sounds reasonable which is to add a generation count to the smp rendezvous
> "structure" and have waiting CPUs stop waiting if the generation count changes.
> 
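
For illustration only, the generation-count idea can be modelled in plain
userland C as a single-threaded replay of the bad interleaving. This is a
rough sketch and not the NetApp/BHyVe change: struct rv_gen, rv_gen_setup(),
rv_gen_must_spin() and replay_with_generation() are hypothetical names, and
exit_count merely plays the role of smp_rv_waiters[2].

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userland model of a rendezvous with a generation count. */
struct rv_gen {
	int      ncpus;       /* CPUs taking part in the rendezvous */
	int      exit_count;  /* plays the role of smp_rv_waiters[2] */
	unsigned generation;  /* bumped whenever a new rendezvous is set up */
};

/* Master-side setup of a new rendezvous: reset counter, bump generation. */
static void
rv_gen_setup(struct rv_gen *rv)
{
	rv->exit_count = 0;
	rv->generation++;
}

/*
 * One poll of a slave's exit spin loop.  my_gen is the generation the
 * slave sampled when it entered the rendezvous.
 */
static bool
rv_gen_must_spin(const struct rv_gen *rv, unsigned my_gen)
{
	if (rv->generation != my_gen)
		return (false);	/* a newer rendezvous started; ours is over */
	return (rv->exit_count < rv->ncpus);
}

/*
 * Replay the interleaving from the theory.  Returns true if the slave
 * would keep spinning after the master has started the next rendezvous.
 */
bool
replay_with_generation(bool check_generation)
{
	struct rv_gen rv = { .ncpus = 2, .exit_count = 0, .generation = 0 };
	unsigned my_gen;

	rv_gen_setup(&rv);
	my_gen = rv.generation;		/* slave samples generation on entry */
	rv.exit_count = rv.ncpus;	/* every CPU reached the exit barrier */
	assert(rv.exit_count >= rv.ncpus);

	/* Master leaves and sets up the next rendezvous; slave still in spinwait. */
	rv_gen_setup(&rv);

	if (check_generation)
		return (rv_gen_must_spin(&rv, my_gen));
	return (rv.exit_count < rv.ncpus);	/* old check: counter only */
}
```

In this model replay_with_generation(false) reports a stuck slave, while
replay_with_generation(true) lets it leave the loop because the generation
has moved on.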

This is an adaptation of my patch from the xcpu branch to head (not tested):

Index: sys/kern/subr_smp.c
===================================================================
--- sys/kern/subr_smp.c	(revision 221521)
+++ sys/kern/subr_smp.c	(working copy)
@@ -110,7 +110,7 @@
 static void (*volatile smp_rv_action_func)(void *arg);
 static void (*volatile smp_rv_teardown_func)(void *arg);
 static void *volatile smp_rv_func_arg;
-static volatile int smp_rv_waiters[3];
+static volatile int smp_rv_waiters[4];

 /*
  * Shared mutex to restrict busywaits between smp_rendezvous() and
@@ -338,11 +338,15 @@

 	/* spin on exit rendezvous */
 	atomic_add_int(&smp_rv_waiters[2], 1);
-	if (local_teardown_func == smp_no_rendevous_barrier)
+	if (local_teardown_func == smp_no_rendevous_barrier) {
+		atomic_add_int(&smp_rv_waiters[3], 1);
                 return;
+	}
 	while (smp_rv_waiters[2] < smp_rv_ncpus)
 		cpu_spinwait();

+	atomic_add_int(&smp_rv_waiters[3], 1);
+
 	/* teardown function */
 	if (local_teardown_func != NULL)
 		local_teardown_func(local_func_arg);
@@ -385,6 +389,7 @@
 	smp_rv_func_arg = arg;
 	smp_rv_waiters[1] = 0;
 	smp_rv_waiters[2] = 0;
+	smp_rv_waiters[3] = 0;
 	atomic_store_rel_int(&smp_rv_waiters[0], 0);

 	/* signal other processors, which will enter the IPI with interrupts off */
@@ -395,7 +400,7 @@
 		smp_rendezvous_action();

 	if (teardown_func == smp_no_rendevous_barrier)
-		while (atomic_load_acq_int(&smp_rv_waiters[2]) < ncpus)
+		while (atomic_load_acq_int(&smp_rv_waiters[3]) < ncpus)
 			cpu_spinwait();

 	/* release lock */
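
To make the intent of the extra counter easier to follow, here is a small
single-threaded userland model that replays the interleaving from the theory
quoted above. It is only a sketch under assumptions: struct rv_state,
slave_must_spin(), master_may_return() and replay_race() are hypothetical
names, and the model lets the master wait on the fourth counter
unconditionally, whereas the diff above shows that wait only in the
smp_no_rendevous_barrier branch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userland model; this is not the kernel code. */
struct rv_state {
	int ncpus;
	int waiters[4];	/* [2]: reached exit barrier, [3]: left exit barrier */
};

/* One poll of the slave's exit spin loop: true if it must keep spinning. */
static bool
slave_must_spin(const struct rv_state *rv)
{
	return (rv->waiters[2] < rv->ncpus);
}

/*
 * May the master return and set up the next rendezvous?  The old scheme
 * looks at waiters[2]; the modelled new scheme looks at waiters[3],
 * which a CPU bumps only after it is past the spin loop.
 */
static bool
master_may_return(const struct rv_state *rv, bool use_fourth_counter)
{
	int idx = use_fourth_counter ? 3 : 2;

	return (rv->waiters[idx] >= rv->ncpus);
}

/* Returns true if the slave ends up spinning forever in this replay. */
bool
replay_race(bool use_fourth_counter)
{
	struct rv_state rv = { .ncpus = 2, .waiters = { 0, 0, 0, 0 } };

	/* Both CPUs have incremented waiters[2]. */
	rv.waiters[2] = 2;
	assert(!slave_must_spin(&rv));

	/* Master is past its own spin loop and records that it has left. */
	if (use_fourth_counter)
		rv.waiters[3]++;

	/* Slave is still inside cpu_spinwait() and has not re-checked yet. */
	if (master_may_return(&rv, use_fourth_counter)) {
		/* Master starts the next rendezvous and resets the counters. */
		rv.waiters[2] = 0;
		rv.waiters[3] = 0;
	}

	/* Slave comes back from cpu_spinwait() and re-checks the condition. */
	return (slave_must_spin(&rv));
}
```

With use_fourth_counter false the replay ends with the slave stuck on a
zeroed smp_rv_waiters[2]; with it true the master cannot reset the counters
until the slave has left the loop, so the slave gets out.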

-- 
Andriy Gapon


