Date:      Tue, 10 Mar 2020 21:23:26 +0200
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Mateusz Guzik <mjguzik@gmail.com>
Cc:        src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject:   Re: svn commit: r358733 - head/sys/sys
Message-ID:  <20200310192326.GC1992@kib.kiev.ua>
In-Reply-To: <CAGudoHHxRvrABfNThT=9_bv1dXsSExKrnFTScrHBDY1Z6GzGgg@mail.gmail.com>
References:  <202003080022.0280MX9j055805@repo.freebsd.org> <CAGudoHE4hTN5niCAsA4tfggywkEnGxiaoPz=rTX3AHA=fgZfkQ@mail.gmail.com> <20200309213512.GS98340@kib.kiev.ua> <CAGudoHHxRvrABfNThT=9_bv1dXsSExKrnFTScrHBDY1Z6GzGgg@mail.gmail.com>

On Tue, Mar 10, 2020 at 12:22:09AM +0100, Mateusz Guzik wrote:
> On 3/9/20, Konstantin Belousov <kostikbel@gmail.com> wrote:
> > On Mon, Mar 09, 2020 at 01:56:17AM +0100, Mateusz Guzik wrote:
> >> On 3/8/20, Mateusz Guzik <mjg@freebsd.org> wrote:
> >> > Author: mjg
> >> > Date: Sun Mar  8 00:22:32 2020
> >> > New Revision: 358733
> >> > URL: https://svnweb.freebsd.org/changeset/base/358733
> >> >
> >> > Log:
> >> >   seqc: tidy up
> >> >
> >> >   - avoid hand-rolled read
> >> >   - match begin/end in terms of fence style
> >> >
> >>
> >> There were off-list questions about this, so let me clarify.
> >>
> >> The first bit is a cosmetic change, but the second one is not.
> >>
> >> > Modified:
> >> >   head/sys/sys/seqc.h
> >> >
> >> > Modified: head/sys/sys/seqc.h
> >> > ==============================================================================
> >> > --- head/sys/sys/seqc.h	Sat Mar  7 15:37:23 2020	(r358732)
> >> > +++ head/sys/sys/seqc.h	Sun Mar  8 00:22:32 2020	(r358733)
> >> > @@ -66,7 +66,8 @@ static __inline void
> >> >  seqc_write_end(seqc_t *seqcp)
> >> >  {
> >> >
> >> > -	atomic_store_rel_int(seqcp, *seqcp + 1);
> >> > +	atomic_thread_fence_rel();
> >> > +	*seqcp += 1;
> >> >  	MPASS(!seqc_in_modify(*seqcp));
> >> >  	critical_exit();
> >> >  }
> >>
> >> For correct operation the counter has to be modified *before* any work
> >> is done and *after* it is completed.
> >>
> >> Consider a trivial case:
> >> seqc++;
> >> foo[0] = 0;
> >> foo[1] = 1;
> >> seqc++;
> >>
> >> There are 2 ways in which this can be mucked with:
> >> - the compiler can be looking at reordering or reworking the increment
> >> (e.g., to collapse it to just += 2 instead); a compiler barrier
> >> prevents that.
> >> - even with generated machine code which increments in correct places
> >> the cpu can be looking at reordering the operation (which is heavily
> >> dependent on architecture), in particular it could end up issuing
> >> seqc++; foo[1] = 1; which would defeat the point. This is where
> >> release fences come in.
> >>
> >> With the patched code there is:
> >> seqc++;
> >> atomic_thread_fence_rel(); /* make sure seqc++ is issued before foo
> >> starts getting modified */
> >> foo[0] = 0;
> >> foo[1] = 1;
> >> atomic_thread_fence_rel(); /* make sure modifications to foo don't
> >> leak past this spot */
> >> seqc++;
> >>
> >> In comparison, the previous code was:
> >> seqc++;
> >> atomic_thread_fence_rel(); /* make sure seqc++ is issued before foo
> >> starts getting modified */
> >> foo[0] = 0;
> >> foo[1] = 1;
> >> atomic_store_rel_int(seqcp, *seqcp + 1); /* make sure modifications to
> >> foo don't leak past this spot and update seqc immediately */
> >>
> >> There are 2 differences here -- one is an improvement and the second
> >> one does not matter.
> >>
> >> The win: the previous code forces the compiler to compute an
> >> incremented value and then store it. On amd64 this translates to the
> >> following (with rdx holding the address of the counter):
> >>
> >> mov    (%rdx),%eax /* load the value */
> >> add    $0x1,%eax    /* add 1 */
> >> mov    %eax,(%rdx) /* store it */
> >>
> >> On patched kernel this is:
> >> addl   $0x1,(%rdx)
> >>
> >> which is clearly much nicer.
> > I am not sure that the new code on amd64 is much nicer.
> >
> > But the point was that on non-amd64, i.e. armv8 and potentially on
> > powerpc 3.0, the patch replaces the release store with a full barrier.
> > The latter is (much) slower.
> >
> 
> If an arch performs something significantly more expensive here it's
> probably a performance bug in the port.
It is a question of how much hardware support exists for fine-grained
fences.  Relaxed architectures already provide some optimized operations,
but have not (yet) implemented the complete optimized C11 set.
I think they will eventually converge.

Until they do, I think it is more important not to pessimize
soon-to-be Tier 1 arches than to collapse three fast integer
instructions on amd64 into one rw op.

> But perhaps more importantly
> there are significantly more frequent users of
> atomic_thread_fence_rel, like refcount(9). If the issue is real that
> should be looked at.
To remove the release fence from refcount(9), we would need
	atomic_fetchadd_rel_int().

> 
> >>
> >> The non-problem: the code does not issue a release fence after
> >> incrementing the counter the second time, meaning that in principle
> >> the update can happen arbitrarily late, possibly disturbing code
> >> which waits for the counter to become even. This is not a problem
> >> because whoever modifies it is supposed to hold a lock and release
> >> it shortly after, which also posts a release fence and consequently
> >> makes sure the counter update is visible. Also note these routines
> >> come with preemption disablement around the modification -- if the
> >> thread gets preempted as a result of critical_exit from
> >> seqc_write_end, it will also publish the updated counter. If the
> >> updater never does anything which flushes the store buffer, then in
> >> the absolute worst case it will get preempted, at which point once
> >> more we are covered. Note that code which wants more deterministic
> >> behavior will want to use seqc_read_any, test the counter once, and
> >> if it is caught uneven resort to locking.
> >>
> >> > @@ -84,7 +85,7 @@ seqc_read(const seqc_t *seqcp)
> >> >  	seqc_t ret;
> >> >
> >> >  	for (;;) {
> >> > -		ret = atomic_load_acq_int(__DECONST(seqc_t *, seqcp));
> >> > +		ret = seqc_read_any(seqcp);
> >> >  		if (__predict_false(seqc_in_modify(ret))) {
> >> >  			cpu_spinwait();
> >> >  			continue;
> >> >
> >>
> >>
> >> --
> >> Mateusz Guzik <mjguzik gmail.com>
> >
> 
> 
> -- 
> Mateusz Guzik <mjguzik gmail.com>


