From owner-freebsd-current@FreeBSD.ORG Mon Jul 18 16:20:34 2011 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B79A9106566B; Mon, 18 Jul 2011 16:20:34 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 740A88FC19; Mon, 18 Jul 2011 16:20:34 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id 0E9BF46B2A; Mon, 18 Jul 2011 12:20:34 -0400 (EDT) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 9D2348A02C; Mon, 18 Jul 2011 12:20:33 -0400 (EDT) From: John Baldwin To: Andriy Gapon Date: Mon, 18 Jul 2011 12:20:32 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110617; KDE/4.5.5; amd64; ; ) References: <4DCD357D.6000109@FreeBSD.org> <4DCFFE52.1090002@FreeBSD.org> <4E241A33.2000009@FreeBSD.org> In-Reply-To: <4E241A33.2000009@FreeBSD.org> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201107181220.33052.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6 (bigwig.baldwin.cx); Mon, 18 Jul 2011 12:20:33 -0400 (EDT) Cc: Max Laier , FreeBSD current , neel@freebsd.org Subject: Re: proposed smp_rendezvous change X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 18 Jul 2011 16:20:34 -0000 On Monday, July 18, 2011 7:34:11 am Andriy Gapon wrote: > on 15/05/2011 19:24 Andriy Gapon said the following: > > on 15/05/2011 19:09 Max Laier said the following: > >> > >> I don't think we ever intended to synchronize the local teardown part, and I > >> believe that is the correct behavior for this API. > >> > >> This version is sufficiently close to what I have, so I am resonably sure that > >> it will work for us. It seems, however, that if we move to check to after > >> picking up the lock anyway, the generation approach has even less impact and I > >> am starting to prefer that solution. > >> > >> Andriy, is there any reason why you'd prefer your approach over the generation > >> version? > > > > No reason. And I even haven't said that I prefer it :-) > > I just wanted to show and explain it as apparently there was some > > misunderstanding about it. I think that generation count approach could even > > have a little bit better performance while perhaps being a tiny bit less obvious. > > Resurrecting this conversation. > Here's a link to the thread on gmane for your convenience: > http://article.gmane.org/gmane.os.freebsd.current/132682 > > It seems that we haven't fully accounted for the special case of master CPU > behavior in neither my proposed fix that added a new wait count nor in the > committed fix that has added a generation count. > This has been pointed out by kib at the recent mini-DevSummit in Kyiv. > > The problem can be illustrated by the following example: > - the teardown function is a "real" function; that is, not NULL and not > smp_rendevous_no_ barrier (sic) > - the arg parameter is a non-NULL pointer > - the teardown function actually accesses memory pointed to by the arg > - the master CPU deallocates memory pointed by the arg right after > smp_rendezvous() call returns (either via free(9) or by stack management, etc) > > Quoting the code: > MPASS(generation == smp_rv_generation); > atomic_add_int(&smp_rv_waiters[2], 1); > if (local_teardown_func != smp_no_rendevous_barrier) { > while (smp_rv_waiters[2] < smp_rv_ncpus && > generation == smp_rv_generation) > cpu_spinwait(); > > if (local_teardown_func != NULL) > local_teardown_func(local_func_arg); > } > > Generation count helps slave CPUs to not get stuck at the while loop, but it > doesn't prevent stale/different arg being passed to the teardown function. > > So here is a patch, which is a variation of my earlier proposal, that should fix > this issue: > http://people.freebsd.org/~avg/smp_rendezvous.diff > The patch is not tested yet. > > The patch undoes the smp_rv_generation change and introduces a new > smp_rv_waiters[] element. Perhaps the code could be optimized by making waiting > on smp_rv_waiters[3] conditional on some combination of values for the teardown > and the arg, but I decided to follow the KISS principle here to make the code a > little bit more robust. Yes, I prefer the more robust case actually as it is clearer. This is fine with me. It would be good to get Max's feedback and possibly Neel's as well. -- John Baldwin