Message-ID: <4DCF7C55.3030404@FreeBSD.org>
Date: Sun, 15 May 2011 10:10:13 +0300
From: Andriy Gapon
To: John Baldwin
Cc: FreeBSD current, Peter Grehan
Subject: Re: proposed smp_rendezvous change
References: <4DCD357D.6000109@FreeBSD.org> <4DCE9EF0.3050803@FreeBSD.org>
In-Reply-To: <4DCE9EF0.3050803@FreeBSD.org>
List-Id: Discussions about the use of FreeBSD-current

on 14/05/2011 18:25 John Baldwin said the following:
> On 5/13/11 9:43 AM, Andriy Gapon wrote:
>>
>> This is a change in vein of what I've been doing in the xcpu branch and it's
>> supposed to fix the issue by the recent commit that (probably unintentionally)
>> stress-tests smp_rendezvous in TSC code.
>>
>> Non-essential changes:
>> - ditch initial, and in my opinion useless, pre-setup rendezvous in
>> smp_rendezvous_action()
>
> As long as IPIs ensure all data is up to date (I think this is certainly true on
> x86) that is fine. Presumably sending an IPI has an implicit store barrier on
> all other platforms as well?

Well, one certainly can use IPIs as a memory barrier, but my point was that we
have other ways to get a memory barrier, and using an IPI for that was not
necessary (and a little bit harmful to performance) in this case.

>> Essential changes (the fix):
>> - re-use freed smp_rv_waiters[2] to indicate that a slave/target is really fully
>> done with rendezvous (i.e. it's not going to access any members of smp_rv_*
>> pseudo-structure)
>> - spin on smp_rv_waiters[2] upon _entry_ to smp_rendezvous_cpus() to not re-use
>> the smp_rv_* pseudo-structure too early
>
> Hmmm, so this is not actually sufficient. NetApp ran into a very similar race
> with virtual CPUs in BHyVe. In their case because virtual CPUs are threads that
> can be preempted, they have a chance at a longer race.

Just a quick question - have you noticed that, because of the change above, the
smp_rv_waiters[2] of which I spoke is not the same smp_rv_waiters[2] as in the
original code? Because I "removed" smp_rv_waiters[0], smp_rv_waiters[2] is
actually a new smp_rv_waiters[3].

And, well, I think I described exactly the same scenario as you did in my email
on the svn mailing list. So of course I had it in mind:
http://www.mail-archive.com/svn-src-all@freebsd.org/msg38637.html

My problem: for clarity, I should not have mixed different changes into the same
patch. I should have provided two patches: one that adds smp_rv_waiters[3] and
its handling, and one that "removes" smp_rv_waiters[0].

I would like to see my proposed patch actually tested, if possible, before it's
dismissed :-)

-- 
Andriy Gapon
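
To make the "essential changes" above more concrete, here is a minimal
user-space sketch in C11 of the same idea. It is not the kernel patch and does
not use the real smp_rv_* fields: threads stand in for CPUs, a generation
counter stands in for the IPI, and all names (rv_action, rv_arg, rv_generation,
rv_done, add_value, run_demo) are hypothetical. The only point it tries to show
is that each target bumps a "fully done" counter as its very last access to the
shared state, and the initiator spins on that counter upon entry before it
re-uses the state.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

#define NTARGETS 4      /* hypothetical number of target "CPUs" (threads) */
#define NROUNDS  200    /* back-to-back rendezvous rounds to run */

/* Shared pseudo-structure, loosely modelled on smp_rv_* (names are made up). */
static void (*rv_action)(long);
static long rv_arg;
static atomic_int rv_generation;   /* stands in for sending the IPI */
static atomic_int rv_done;         /* targets that are fully done with rv_* */
static atomic_long total;

static void add_value(long v) { atomic_fetch_add(&total, v); }

static void rendezvous(void (*action)(long), long arg)
{
    /* Spin on entry: do not re-use rv_* until every target of the
     * previous round has signalled that it will not read it again. */
    while (atomic_load_explicit(&rv_done, memory_order_acquire) < NTARGETS)
        sched_yield();
    rv_action = action;
    rv_arg = arg;
    atomic_store_explicit(&rv_done, 0, memory_order_relaxed);
    /* Release ordering publishes the stores above to the targets. */
    atomic_fetch_add_explicit(&rv_generation, 1, memory_order_release);
}

static void *target(void *unused)
{
    int seen = 0;
    (void)unused;
    for (;;) {
        int gen;
        /* Wait for the initiator to publish a new round. */
        while ((gen = atomic_load_explicit(&rv_generation,
            memory_order_acquire)) == seen)
            sched_yield();
        seen = gen;
        void (*action)(long) = rv_action;
        long arg = rv_arg;
        if (action != NULL)
            action(arg);
        /* Very last access to the shared state for this round. */
        atomic_fetch_add_explicit(&rv_done, 1, memory_order_release);
        if (action == NULL)
            return NULL;    /* NULL action means "exit" in this sketch */
    }
}

static long run_demo(void)
{
    pthread_t tid[NTARGETS];

    atomic_store(&total, 0);
    atomic_store(&rv_generation, 0);
    atomic_store(&rv_done, NTARGETS);   /* no round in flight yet */
    for (int i = 0; i < NTARGETS; i++) {
        int rc = pthread_create(&tid[i], NULL, target, NULL);
        assert(rc == 0);
    }
    for (long r = 1; r <= NROUNDS; r++)
        rendezvous(add_value, r);
    rendezvous(NULL, 0);
    for (int i = 0; i < NTARGETS; i++)
        pthread_join(tid[i], NULL);
    return atomic_load(&total);
}
```

Calling run_demo() starts the target threads, runs NROUNDS rendezvous back to
back and returns the accumulated total. If the initiator overwrote rv_action or
rv_arg while a target was still reading them, some target could add the wrong
value or miss a round, so the total is a cheap consistency check. Build with
-pthread where the platform requires it.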