From: Andriy Gapon <avg@FreeBSD.org>
Date: Thu, 12 May 2011 14:05:25 +0300
To: John Baldwin
Cc: src-committers@FreeBSD.org, neel@FreeBSD.org, svn-src-all@FreeBSD.org,
    Stanislav Sedov, svn-src-head@FreeBSD.org, Jung-uk Kim
Subject: Re: svn commit: r221703 - in head/sys: amd64/include i386/include x86/isa x86/x86

on 12/05/2011 13:55 John Baldwin said the following:
> On 5/12/11 6:55 AM, Stanislav Sedov wrote:
>> On Thu, 12 May 2011 13:43:58 +0300
>> Andriy Gapon mentioned:
>>
>>> Theory:
>>> - smp_rv_waiters[2] becomes equal to smp_rv_ncpus
>>> - [at least] one slave CPU is still in the last call to cpu_spinwait() in
>>>   smp_rendezvous_action()
>>> - master CPU notices that the condition is true, exits smp_rendezvous_cpus()
>>>   and calls it again
>>> - the slave CPU is still in spinwait
>>> - the master CPU resets smp_rv_waiters[2] to zero
>>> - the slave CPU exits spinwait, sees smp_rv_waiters[2] with zero value
>>> - endless loop
>>
>> That might explain it.
>> Do you have a patch for me to try?
>>
>> Thanks!
>
> The NetApp folks working on BHyVe also ran into this.  They have a fix that I
> think sounds reasonable, which is to add a generation count to the smp rendezvous
> "structure" and have waiting CPUs stop waiting if the generation count changes.
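
If I understand that idea correctly, it would look roughly like the sketch
below.  This is only an illustration of the approach, not the actual
NetApp/BHyVe patch; the smp_rv_generation name and its placement are my
guesses:

static volatile int smp_rv_generation;	/* hypothetical new global */

	/* Slave side, at the exit rendezvous in smp_rendezvous_action(). */
	int gen;

	gen = smp_rv_generation;
	atomic_add_int(&smp_rv_waiters[2], 1);
	/* Stop waiting as soon as the master has started a new rendezvous. */
	while (smp_rv_waiters[2] < smp_rv_ncpus &&
	    smp_rv_generation == gen)
		cpu_spinwait();

	/* Master side, in smp_rendezvous_cpus(), before the counters are reused. */
	atomic_add_int(&smp_rv_generation, 1);
	smp_rv_waiters[1] = 0;
	smp_rv_waiters[2] = 0;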
This is an adaptation of my patch in the xcpu branch to head (not tested):

Index: sys/kern/subr_smp.c
===================================================================
--- sys/kern/subr_smp.c	(revision 221521)
+++ sys/kern/subr_smp.c	(working copy)
@@ -110,7 +110,7 @@
 static void (*volatile smp_rv_action_func)(void *arg);
 static void (*volatile smp_rv_teardown_func)(void *arg);
 static void *volatile smp_rv_func_arg;
-static volatile int smp_rv_waiters[3];
+static volatile int smp_rv_waiters[4];
 
 /*
  * Shared mutex to restrict busywaits between smp_rendezvous() and
@@ -338,11 +338,15 @@
 	/* spin on exit rendezvous */
 	atomic_add_int(&smp_rv_waiters[2], 1);
-	if (local_teardown_func == smp_no_rendevous_barrier)
+	if (local_teardown_func == smp_no_rendevous_barrier) {
+		atomic_add_int(&smp_rv_waiters[3], 1);
 		return;
+	}
 	while (smp_rv_waiters[2] < smp_rv_ncpus)
 		cpu_spinwait();
 
+	atomic_add_int(&smp_rv_waiters[3], 1);
+
 	/* teardown function */
 	if (local_teardown_func != NULL)
 		local_teardown_func(local_func_arg);
@@ -385,6 +389,7 @@
 	smp_rv_func_arg = arg;
 	smp_rv_waiters[1] = 0;
 	smp_rv_waiters[2] = 0;
+	smp_rv_waiters[3] = 0;
 	atomic_store_rel_int(&smp_rv_waiters[0], 0);
 
 	/* signal other processors, which will enter the IPI with interrupts off */
@@ -395,7 +400,7 @@
 	smp_rendezvous_action();
 
 	if (teardown_func == smp_no_rendevous_barrier)
-		while (atomic_load_acq_int(&smp_rv_waiters[2]) < ncpus)
+		while (atomic_load_acq_int(&smp_rv_waiters[3]) < ncpus)
 			cpu_spinwait();
 
 	/* release lock */

-- 
Andriy Gapon
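
P.S.  To spell out the idea behind the extra counter: smp_rv_waiters[3]
counts the CPUs that are completely done with the exit rendezvous, while
smp_rv_waiters[2] only counts those that have reached it.  For readability,
this is how the end of smp_rendezvous_action() reads with the patch applied
(reconstructed from the diff above; the rest of the function is unchanged):

	/* spin on exit rendezvous */
	atomic_add_int(&smp_rv_waiters[2], 1);
	if (local_teardown_func == smp_no_rendevous_barrier) {
		/* No exit barrier requested: just mark this CPU as done. */
		atomic_add_int(&smp_rv_waiters[3], 1);
		return;
	}
	while (smp_rv_waiters[2] < smp_rv_ncpus)
		cpu_spinwait();

	/*
	 * Counted only after this CPU has left the exit spin, so a
	 * master that waits on smp_rv_waiters[3] knows that nobody is
	 * still spinning on smp_rv_waiters[2] when it is reset.
	 */
	atomic_add_int(&smp_rv_waiters[3], 1);

	/* teardown function */
	if (local_teardown_func != NULL)
		local_teardown_func(local_func_arg);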