Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 15 May 2020 04:29:13 +0000
From:      bugzilla-noreply@freebsd.org
To:        bugs@FreeBSD.org
Subject:   [Bug 246475] Unread entries potentially lost in buf_ring after ABA condition
Message-ID:  <bug-246475-227@https.bugs.freebsd.org/bugzilla/>

index | next in thread | raw e-mail

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246475

            Bug ID: 246475
           Summary: Unread entries potentially lost in buf_ring after ABA
                    condition
           Product: Base System
           Version: 12.1-STABLE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: m@rtin.uy

Created attachment 214511
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=214511&action=edit
bug_ring memory corruption proof-of-concept and execution trace

We [1] found a potential memory corruption bug in the implementation of
FreeBSD's ring-buffer. Under specific scheduling conditions, and assuming a
multiple-producers scenario, it might be possible to break the 'one entry
always empty' invariant and overwrite unread values. Looks to us that release
12.1.0 [2] is affected, and previous releases may be affected as well.

The bug appears to be an ABA problem in the buf_ring_enqueue function. If
br->br_prod_head wraps-around the ring-buffer to have the same value than what
was originally read by a Producer, the CAS will succeed but the 'unnoticed'
activity may have moved cons_tail. In particular, it may have moved cons_tail
to the point that the previous "prod_next == cons_tail" check result is now
invalid. The Producer will move forward instead of checking again. As a result,
a Producer may write the entry supposed to be empty. With no-stopper empty
entry, following producers will start overwriting unread values.

We believe that short ring-buffers in highly concurrent scenarios are the most
likely circumstances in which this could happen. That is because of the easier
prod_head wrap-around.

In regards to fixing this issue, a double-word or wide CAS that includes a
counter as part of the condition should be enough to make the ABA unfeasible.
This would require some support from the architecture, though. Other
ring-buffer implementations such as DPDK [3] take a slightly different
approach, which should provide an equivalent level of security and is easier to
implement on any architecture. Extrapolated to FreeBSD's buf_ring, this would
be to let prod_head take values across the whole uint32 range and use the mask
only when reading or writing entries. For the ABA to occur, the whole uint32
range would need to be wrapped-around -which looks unrealistic-.

Attached to this bug report you will find our proof-of-concept. It is a Linux
user-space port of FreeBSD's ring-buffer. We instrumented buf_ring.h to
generate proper scheduling conditions, and implemented all the scaffolding
around. We tried to mock structures and functions with reasonable Linux x86_64
equivalents. Please note that our proof-of-concept has -for illustration
purposes- more Producers than what is actually needed: 2 Producers and 1
Consumer should be enough. Attached to this bug report you will also find a
trace of execution.

It has not been under the scope of our investigation to trigger this bug from
FreeBSD user-space. It well be the case that the bug is there but there is no
current way of triggering it due to the constraints imposed by the context.
However, we suggest fixing it on any case to avoid a show up in the future.

--
[1] - Andres Lopez, Martin Di Paola and Martin Balao
[2] -
https://svnweb.freebsd.org/base/release/12.1.0/sys/sys/buf_ring.h?view=co
[3] - http://git.dpdk.org/dpdk/tree/lib/librte_ring/rte_ring_c11_mem.h

-- 
You are receiving this mail because:
You are the assignee for the bug.

help

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?bug-246475-227>