Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 4 Sep 2020 14:36:29 +0100
From:      Jessica Clarke <jrtc27@freebsd.org>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        Marcin Wojtas <mw@freebsd.org>, src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-stable@freebsd.org, svn-src-stable-12@freebsd.org
Subject:   Re: svn commit: r365326 - stable/12/sys/sys
Message-ID:  <B8C7F5D3-E30F-4CD7-838D-62EE2C973733@freebsd.org>
In-Reply-To: <20200904130946.GG94807@kib.kiev.ua>
References:  <202009041122.084BMIYn040628@repo.freebsd.org> <20200904130946.GG94807@kib.kiev.ua>

index | next in thread | previous in thread | raw e-mail

On 4 Sep 2020, at 14:09, Konstantin Belousov <kostikbel@gmail.com> wrote:
> On Fri, Sep 04, 2020 at 11:22:18AM +0000, Marcin Wojtas wrote:
>> Author: mw
>> Date: Fri Sep  4 11:22:18 2020
>> New Revision: 365326
>> URL: https://svnweb.freebsd.org/changeset/base/365326
>> 
>> Log:
>>  MFC: r346593
>> 
>>  Add barrier in buf ring peek function to prevent race in ARM and ARM64.
>> 
>>  Obtained from: Semihalf
>>  Sponsored by: Amazon, Inc.
>> 
>> Modified:
>>  stable/12/sys/sys/buf_ring.h
>> Directory Properties:
>>  stable/12/   (props changed)
>> 
>> Modified: stable/12/sys/sys/buf_ring.h
>> ==============================================================================
>> --- stable/12/sys/sys/buf_ring.h	Fri Sep  4 04:31:56 2020	(r365325)
>> +++ stable/12/sys/sys/buf_ring.h	Fri Sep  4 11:22:18 2020	(r365326)
>> @@ -310,14 +310,23 @@ buf_ring_peek_clear_sc(struct buf_ring *br)
>> 	if (!mtx_owned(br->br_lock))
>> 		panic("lock not held on single consumer dequeue");
>> #endif	
>> -	/*
>> -	 * I believe it is safe to not have a memory barrier
>> -	 * here because we control cons and tail is worst case
>> -	 * a lagging indicator so we worst case we might
>> -	 * return NULL immediately after a buffer has been enqueued
>> -	 */
>> +
>> 	if (br->br_cons_head == br->br_prod_tail)
>> 		return (NULL);
>> +
>> +#if defined(__arm__) || defined(__aarch64__)
>> +	/*
>> +	 * The barrier is required there on ARM and ARM64 to ensure, that
>> +	 * br->br_ring[br->br_cons_head] will not be fetched before the above
>> +	 * condition is checked.
>> +	 * Without the barrier, it is possible, that buffer will be fetched
>> +	 * before the enqueue will put mbuf into br, then, in the meantime, the
>> +	 * enqueue will update the array and the br_prod_tail, and the
>> +	 * conditional check will be true, so we will return previously fetched
>> +	 * (and invalid) buffer.
>> +	 */
>> +	atomic_thread_fence_acq();
>> +#endif
> 
> Putting the semantic of the change aside, why did you added the fence (it is
> a fence, not barrier as stated in the comment) only to arm* ?   If it is
> needed, it is needed for all arches.

Agreed. The code looks fine, though I would have made it an acquire
load of br_prod_tail myself to be able to take advantage load-acquire
instructions when present, and better document what the exact issue is.
I also don't think the comment needs to be quite so extensive
(especially since atomic_load_acq_32 is somewhat self-documenting in
terms of one half of the race); if we had a comment like this for every
fence in the kernel we'd never get anything done.

There's also an ARM-specific fence in buf_ring_dequeue_sc:

> 	/*
> 	 * This is a workaround to allow using buf_ring on ARM and ARM64.
> 	 * ARM64TODO: Fix buf_ring in a generic way.
> 	 * REMARKS: It is suspected that br_cons_head does not require
> 	 *   load_acq operation, but this change was extensively tested
> 	 *   and confirmed it's working. To be reviewed once again in
> 	 *   FreeBSD-12.
> 	 *
> 	 * Preventing following situation:
> 
> 	 * Core(0) - buf_ring_enqueue()                                       Core(1) - buf_ring_dequeue_sc()
> 	 * -----------------------------------------                                       ----------------------------------------------
> 	 *
> 	 *                                                                                cons_head = br->br_cons_head;
> 	 * atomic_cmpset_acq_32(&br->br_prod_head, ...));
> 	 *                                                                                buf = br->br_ring[cons_head];     <see <1>>
> 	 * br->br_ring[prod_head] = buf;
> 	 * atomic_store_rel_32(&br->br_prod_tail, ...);
> 	 *                                                                                prod_tail = br->br_prod_tail;
> 	 *                                                                                if (cons_head == prod_tail) 
> 	 *                                                                                        return (NULL);
> 	 *                                                                                <condition is false and code uses invalid(old) buf>`	
> 	 *
> 	 * <1> Load (on core 1) from br->br_ring[cons_head] can be reordered (speculative readed) by CPU.
> 	 */	
> #if defined(__arm__) || defined(__aarch64__)
> 	cons_head = atomic_load_acq_32(&br->br_cons_head);
> #else
> 	cons_head = br->br_cons_head;
> #endif
> 	prod_tail = atomic_load_acq_32(&br->br_prod_tail);


The comment is completely correct that the ARM-specific fence is a
waste of time. It's the single-consumer path, so such fences are just
synchronising with the current thread and thus pointless. The important
one is the load-acquire of br_prod_tail, as has been discovered (sort
of) in the peek case leading to this comment, which already stops the
reordering in question.

Jess



help

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?B8C7F5D3-E30F-4CD7-838D-62EE2C973733>