Date:      Sun, 15 May 2011 10:10:13 +0300
From:      Andriy Gapon <avg@FreeBSD.org>
To:        John Baldwin <jhb@FreeBSD.org>
Cc:        FreeBSD current <freebsd-current@FreeBSD.org>, Peter Grehan <grehan@FreeBSD.org>
Subject:   Re: proposed smp_rendezvous change
Message-ID:  <4DCF7C55.3030404@FreeBSD.org>
In-Reply-To: <4DCE9EF0.3050803@FreeBSD.org>
References:  <4DCD357D.6000109@FreeBSD.org> <4DCE9EF0.3050803@FreeBSD.org>

on 14/05/2011 18:25 John Baldwin said the following:
> On 5/13/11 9:43 AM, Andriy Gapon wrote:
>>
>> This is a change in vein of what I've been doing in the xcpu branch and it's
>> supposed to fix the issue by the recent commit that (probably unintentionally)
>> stress-tests smp_rendezvous in TSC code.
>>
>> Non-essential changes:
>> - ditch initial, and in my opinion useless, pre-setup rendezvous in
>> smp_rendezvous_action()
> 
> As long as IPIs ensure all data is up to date (I think this is certainly true on
> x86) that is fine.  Presumably sending an IPI has an implicit store barrier on
> all other platforms as well?

Well, one certainly can use IPIs as a memory barrier, but my point was that we
have other ways to get a memory barrier, and using an IPI for that was not
necessary (and a little bit harmful to performance) in this case.

>> Essential changes (the fix):
>> - re-use freed smp_rv_waiters[2] to indicate that a slave/target is really fully
>> done with rendezvous (i.e. it's not going to access any members of smp_rv_*
>> pseudo-structure)
>> - spin on smp_rv_waiters[2] upon _entry_ to smp_rendezvous_cpus() to not re-use
>> the smp_rv_* pseudo-structure too early
> 
> Hmmm, so this is not actually sufficient.  NetApp ran into a very similar race
> with virtual CPUs in BHyVe.  In their case because virtual CPUs are threads that
> can be preempted, they have a chance at a longer race.

Just a quick question - have you noticed that, because of the change above, the
smp_rv_waiters[2] of which I spoke is not the same smp_rv_waiters[2] as in the
original code?  Because I "removed" smp_rv_waiters[0], my smp_rv_waiters[2] is
effectively a new smp_rv_waiters[3].

And, well, I think that in my email on the svn mailing list I described exactly
the same scenario as you did.  So of course I had it in mind:
http://www.mail-archive.com/svn-src-all@freebsd.org/msg38637.html

My mistake: for clarity, I should not have mixed different changes into the same
patch.  I should have provided two patches: one that adds smp_rv_waiters[3]
and its handling, and one that "removes" smp_rv_waiters[0].
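
To make the first of those two pieces concrete, here is a rough user-space
model of the idea. It is only a sketch, not the actual patch: threads stand in
for CPUs, pthread_create() stands in for the IPI, and all names (rv_func,
rv_action_done, rv_fully_done, rv_master, run_rounds, NCPU, ROUNDS) are
hypothetical stand-ins for the smp_rv_* pseudo-structure. Each participant
bumps a "fully done" counter as its very last access to the shared state, and
the master spins on that counter upon entry before re-using the state.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

enum { NCPU = 4, ROUNDS = 50 };     /* hypothetical sizes for this model */

/* Hypothetical stand-ins for the shared smp_rv_* pseudo-structure. */
static void (*rv_func)(void *);
static void *rv_arg;
static int rv_ncpus;                /* 0 until the first rendezvous */
static atomic_int rv_action_done;   /* CPUs that have run the action */
static atomic_int rv_fully_done;    /* CPUs that will not touch rv_* again */

/* What every participating CPU runs (the master included). */
static void rv_participate(void)
{
    rv_func(rv_arg);
    atomic_fetch_add(&rv_action_done, 1);
    /* Exit barrier: this loop still reads the shared state. */
    while (atomic_load(&rv_action_done) < rv_ncpus)
        sched_yield();
    /* Very last access to rv_* by this CPU. */
    atomic_fetch_add(&rv_fully_done, 1);
}

static void *slave_main(void *unused)
{
    (void)unused;
    rv_participate();
    return NULL;
}

static void rv_master(void (*func)(void *), void *arg, pthread_t *tids)
{
    /* Spin on entry: do not re-use rv_* while a slow CPU from the
     * previous rendezvous may still be reading it. */
    while (atomic_load(&rv_fully_done) < rv_ncpus)
        sched_yield();

    rv_func = func;
    rv_arg = arg;
    rv_ncpus = NCPU;
    atomic_store(&rv_action_done, 0);
    atomic_store(&rv_fully_done, 0);

    /* Thread creation stands in for sending the IPI. */
    for (int i = 0; i < NCPU - 1; i++) {
        int rc = pthread_create(&tids[i], NULL, slave_main, NULL);
        assert(rc == 0);
        (void)rc;
    }
    /* May return while slaves are still inside the exit barrier. */
    rv_participate();
}

static void count_hit(void *arg)
{
    atomic_fetch_add((atomic_int *)arg, 1);
}

/* Runs ROUNDS back-to-back rendezvous; returns the number of action calls. */
static int run_rounds(void)
{
    static pthread_t tids[ROUNDS][NCPU - 1];
    atomic_int hits = 0;

    for (int r = 0; r < ROUNDS; r++)
        rv_master(count_hit, &hits, tids[r]);
    for (int r = 0; r < ROUNDS; r++)
        for (int i = 0; i < NCPU - 1; i++)
            pthread_join(tids[r][i], NULL);
    return atomic_load(&hits);
}
```

In this model, dropping the spin at the top of rv_master() would let the
master reset the counters while a preempted slave is still in its exit
barrier, which is the kind of race being discussed here.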

I would like to see my proposed patch actually tested, if possible, before it's
dismissed :-)

-- 
Andriy Gapon


