Date: Thu, 25 Apr 2019 11:22:22 +0300 From: Konstantin Belousov <kostikbel@gmail.com> To: Wojciech Macek <wma@semihalf.com> Cc: Mark Johnston <markj@freebsd.org>, Wojciech Macek <wma@freebsd.org>, src-committers <src-committers@freebsd.org>, svn-src-all@freebsd.org, svn-src-head@freebsd.org Subject: Re: svn commit: r346593 - head/sys/sys Message-ID: <20190425082222.GJ12936@kib.kiev.ua> In-Reply-To: <CANsEV8ca_y9EGxRQGoi%2BCMbCBn-8cfw_MJcRtujgP7vE0n_JKQ@mail.gmail.com> References: <201904230636.x3N6aWQK057863@repo.freebsd.org> <20190425040817.GA3789@spy> <CANsEV8ca_y9EGxRQGoi%2BCMbCBn-8cfw_MJcRtujgP7vE0n_JKQ@mail.gmail.com>
next in thread | previous in thread | raw e-mail | index | archive | help
On Thu, Apr 25, 2019 at 07:38:21AM +0200, Wojciech Macek wrote: > Intel does not reorder reads against the condition "if" here. I know for > sure that ARM does, but therestill might be some other architectures that > also suffers such behavior - I just don't have any means to verify. > I remember the discussion for rS302292 where we agreed that this kind of > patches should be the least impacting in perfomrance as possible. Adding > unconditional memory barrier causes significant performance drop on Intel, > where in fact, the issue was never seen. > Atomic_thread_fence_acq() is nop on x86, or rather, it is compiler memory barrier. If you need read/read fence on some architectures, I am sure that you need compiler barrier on all. > Wojtek > > czw., 25 kwi 2019 o 06:08 Mark Johnston <markj@freebsd.org> napisaĆ(a): > > > On Tue, Apr 23, 2019 at 06:36:32AM +0000, Wojciech Macek wrote: > > > Author: wma > > > Date: Tue Apr 23 06:36:32 2019 > > > New Revision: 346593 > > > URL: https://svnweb.freebsd.org/changeset/base/346593 > > > > > > Log: > > > This patch offers a workaround to buf_ring reordering > > > visible on armv7 and armv8. Similar issue to rS302292. > > > > > > Obtained from: Semihalf > > > Authored by: Michal Krawczyk <mk@semihalf.com> > > > Approved by: wma > > > Differential Revision: https://reviews.freebsd.org/D19932 > > > > > > Modified: > > > head/sys/sys/buf_ring.h > > > > > > Modified: head/sys/sys/buf_ring.h > > > > > ============================================================================== > > > --- head/sys/sys/buf_ring.h Tue Apr 23 04:06:26 2019 (r346592) > > > +++ head/sys/sys/buf_ring.h Tue Apr 23 06:36:32 2019 (r346593) > > > @@ -310,14 +310,23 @@ buf_ring_peek_clear_sc(struct buf_ring *br) > > > if (!mtx_owned(br->br_lock)) > > > panic("lock not held on single consumer dequeue"); > > > #endif > > > - /* > > > - * I believe it is safe to not have a memory barrier > > > - * here because we control cons and tail is worst case > > > - * a lagging indicator so we worst case we might > > > - * return NULL immediately after a buffer has been enqueued > > > - */ > > > + > > > if (br->br_cons_head == br->br_prod_tail) > > > return (NULL); > > > + > > > +#if defined(__arm__) || defined(__aarch64__) > > > + /* > > > + * The barrier is required there on ARM and ARM64 to ensure, that > > > + * br->br_ring[br->br_cons_head] will not be fetched before the > > above > > > + * condition is checked. > > > + * Without the barrier, it is possible, that buffer will be fetched > > > + * before the enqueue will put mbuf into br, then, in the > > meantime, the > > > + * enqueue will update the array and the br_prod_tail, and the > > > + * conditional check will be true, so we will return previously > > fetched > > > + * (and invalid) buffer. > > > + */ > > > + atomic_thread_fence_acq(); > > > +#endif > > > > Why is it specific to ARM? > >
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20190425082222.GJ12936>