Date:        Tue, 10 Mar 2020 00:22:09 +0100
From:        Mateusz Guzik <mjguzik@gmail.com>
To:          Konstantin Belousov <kostikbel@gmail.com>
Cc:          src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject:     Re: svn commit: r358733 - head/sys/sys
Message-ID:  <CAGudoHHxRvrABfNThT=9_bv1dXsSExKrnFTScrHBDY1Z6GzGgg@mail.gmail.com>
In-Reply-To: <20200309213512.GS98340@kib.kiev.ua>
References:  <202003080022.0280MX9j055805@repo.freebsd.org> <CAGudoHE4hTN5niCAsA4tfggywkEnGxiaoPz=rTX3AHA=fgZfkQ@mail.gmail.com> <20200309213512.GS98340@kib.kiev.ua>

On 3/9/20, Konstantin Belousov <kostikbel@gmail.com> wrote:
> On Mon, Mar 09, 2020 at 01:56:17AM +0100, Mateusz Guzik wrote:
>> On 3/8/20, Mateusz Guzik <mjg@freebsd.org> wrote:
>> > Author: mjg
>> > Date: Sun Mar 8 00:22:32 2020
>> > New Revision: 358733
>> > URL: https://svnweb.freebsd.org/changeset/base/358733
>> >
>> > Log:
>> >   seqc: tidy up
>> >
>> >   - avoid hand-rolled read
>> >   - match begin/end in terms of fence style
>> >
>>
>> There were off-list questions about this, so let me clarify.
>>
>> The first bit is a cosmetic change, but the second one is not.
>>
>> > Modified:
>> >   head/sys/sys/seqc.h
>> >
>> > Modified: head/sys/sys/seqc.h
>> > ==============================================================================
>> > --- head/sys/sys/seqc.h	Sat Mar 7 15:37:23 2020	(r358732)
>> > +++ head/sys/sys/seqc.h	Sun Mar 8 00:22:32 2020	(r358733)
>> > @@ -66,7 +66,8 @@ static __inline void
>> >  seqc_write_end(seqc_t *seqcp)
>> >  {
>> >
>> > -	atomic_store_rel_int(seqcp, *seqcp + 1);
>> > +	atomic_thread_fence_rel();
>> > +	*seqcp += 1;
>> >  	MPASS(!seqc_in_modify(*seqcp));
>> >  	critical_exit();
>> >  }
>>
>> For correct operation the counter has to be modified *before* any work
>> is done and *after* it is completed.
>>
>> Consider a trivial case:
>> seqc++;
>> foo[0] = 0;
>> foo[1] = 1;
>> seqc++;
>>
>> There are 2 ways in which this can be mucked with:
>> - the compiler can be looking at reordering or reworking the increment
>>   (e.g., to collapse it to just += 2 instead). A compiler barrier
>>   prevents that.
>> - even with generated machine code which increments in the correct
>>   places, the CPU can be looking at reordering the operations (which is
>>   heavily dependent on architecture); in particular it could end up
>>   issuing seqc++; foo[1] = 1; which would defeat the point. This is
>>   where release fences come in.
>>
>> With the patched code there is:
>> seqc++;
>> atomic_thread_fence_rel(); /* make sure seqc++ is issued before foo
>> starts getting modified */
>> foo[0] = 0;
>> foo[1] = 1;
>> atomic_thread_fence_rel(); /* make sure modifications to foo don't
>> leak past this spot */
>> seqc++;
>>
>> In comparison, the previous code was:
>> seqc++;
>> atomic_thread_fence_rel(); /* make sure seqc++ is issued before foo
>> starts getting modified */
>> foo[0] = 0;
>> foo[1] = 1;
>> atomic_store_rel_int(seqcp, *seqcp + 1); /* make sure modifications to
>> foo don't leak past this spot and update seqc immediately */
>>
>> There are 2 differences here -- one is an improvement and the second
>> one does not matter:
>>
>> The win is: the previous code forces the compiler to compute an
>> incremented value and then store it. On amd64 this translates to the
>> following (with rdx holding the address of the counter):
>>
>> mov    (%rdx),%eax	/* load the value */
>> add    $0x1,%eax	/* add 1 */
>> mov    %eax,(%rdx)	/* store it */
>>
>> On a patched kernel this is:
>> addl   $0x1,(%rdx)
>>
>> which is clearly much nicer.
> I am not sure that the new code on amd64 is much nicer.
>
> But the point was that on non-amd64, i.e. armv8 and potentially on
> powerpc 3.0, the patch substitutes a release store with a full barrier.
> The latter is (much) slower.
>

If an arch performs something significantly more expensive here, it's
probably a performance bug in the port.

But perhaps more importantly, there are significantly more frequent users
of atomic_thread_fence_rel, like refcount(9). If the issue is real, that
should be looked at.
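
To make the above concrete, here is a minimal writer-side sketch of how
the discussed pattern is typically consumed. This is not code from the
tree: the struct, field names and the lock are made up for illustration,
and seqc_write_begin() is assumed as the counterpart of the
seqc_write_end() shown in the diff.

#include <sys/param.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/seqc.h>

struct obj {
	struct mtx	lock;	/* serializes writers */
	seqc_t		seqc;	/* guards foo */
	int		foo[2];
};

static void
obj_update(struct obj *o, int a, int b)
{

	mtx_lock(&o->lock);
	/* counter goes odd; the fence keeps the increment ahead of the stores */
	seqc_write_begin(&o->seqc);
	o->foo[0] = a;
	o->foo[1] = b;
	/* the fence keeps the stores ahead of the increment; counter goes even */
	seqc_write_end(&o->seqc);
	/* unlocking posts a release fence which also publishes the counter */
	mtx_unlock(&o->lock);
}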
>>
>> the not a problem: the code does not issue the release fence after
>> incrementing the counter the second time, meaning that in principle it
>> can happen arbitrarily late, possibly disturbing code which waits for
>> the counter to become even. This is not a problem because whoever
>> modifies it is supposed to hold a lock and release it shortly after,
>> which also posts the release fence and consequently makes sure the
>> counter update is visible. Also note these routines come with
>> preemption disablement around the modification -- if the thread gets
>> preempted as a result of critical_exit from seqc_write_end, it will
>> also publish the updated counter. If the updater never does anything
>> which flushes the store buffer, then in the absolute worst case they
>> will get preempted, at which point once more we are covered. Note that
>> code which wants more deterministic behavior will want to use
>> seqc_read_any, test the counter once and, if it is caught uneven,
>> resort to locking.
>>
>> > @@ -84,7 +85,7 @@ seqc_read(const seqc_t *seqcp)
>> >  	seqc_t ret;
>> >
>> >  	for (;;) {
>> > -		ret = atomic_load_acq_int(__DECONST(seqc_t *, seqcp));
>> > +		ret = seqc_read_any(seqcp);
>> >  		if (__predict_false(seqc_in_modify(ret))) {
>> >  			cpu_spinwait();
>> >  			continue;
>> > _______________________________________________
>> > svn-src-all@freebsd.org mailing list
>> > https://lists.freebsd.org/mailman/listinfo/svn-src-all
>> > To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
>> >
>>
>> --
>> Mateusz Guzik <mjguzik gmail.com>
>

--
Mateusz Guzik <mjguzik gmail.com>
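
As a companion to the writer sketch above, the "test the counter once and
resort to locking" idea mentioned in the quoted text might look roughly
like the following. Again an illustrative sketch rather than code from
the tree; seqc_consistent() is assumed here to re-check the counter with
acquire semantics.

static void
obj_read(struct obj *o, int *a, int *b)
{
	seqc_t seqc;

	/* single acquire read, no spinning */
	seqc = seqc_read_any(&o->seqc);
	if (!seqc_in_modify(seqc)) {
		*a = o->foo[0];
		*b = o->foo[1];
		/* if the counter did not move, the snapshot is consistent */
		if (seqc_consistent(&o->seqc, seqc))
			return;
	}
	/* caught an update in flight (or lost the race); resort to locking */
	mtx_lock(&o->lock);
	*a = o->foo[0];
	*b = o->foo[1];
	mtx_unlock(&o->lock);
}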