Date:      Tue, 25 Jul 2017 07:05:58 +0000
From:      bugzilla-noreply@freebsd.org
To:        freebsd-bugs@FreeBSD.org
Subject:   [Bug 219399] System panics after several hours of 14-threads-compilation orgies using poudriere on AMD Ryzen...
Message-ID:  <bug-219399-8-RmIDm26swj@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-219399-8@https.bugs.freebsd.org/bugzilla/>
References:  <bug-219399-8@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=219399

--- Comment #119 from Don Lewis <truckman@FreeBSD.org> ---
(In reply to Nils Beyer from comment #115)
I have some suspicions about what might be going wrong with ryzen_segv_test,
but I really don't understand the memory fence / serialization stuff well
enough to be sure.

An experiment that I performed to try to get a better idea of where things
might be going off the rails was to add this code:
                if (func_set->func[func_set->offset] != 0x8b) {
                        fprintf(stderr, "First opcode should be 0x8b, but found 0x%x\n",
                            func_set->func[func_set->offset]);
                }
to thread1() in between the
                pf = (func_t)(&func_set->func[ func_set->offset ]);
and
                ret2 = pf(func_set);
to verify that the expected opcode was actually where we plan to jump to.  What
was interesting is that the error never triggered, *but* the frequency of
segfaults went way down.
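
For readers without the test source at hand, here is a minimal standalone
sketch of the same check.  The struct layout, the buffer size, and the helper
name first_opcode_ok() are assumptions made for illustration; only the field
names func and offset come from the snippet above.  Note that the check is
itself a data load from the jump target.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical layout: the real func_set in ryzen_segv_test may differ. */
typedef struct {
    size_t offset;            /* where the copied function currently starts */
    unsigned char func[64];   /* buffer holding the copied machine code */
} func_set_t;

/*
 * Return 1 if the byte at the jump target is the expected opcode,
 * otherwise report the mismatch on stderr and return 0.
 */
static int first_opcode_ok(const func_set_t *set, unsigned char expected)
{
    /* This is an ordinary data load from the address we are about to call. */
    unsigned char found = set->func[set->offset];

    if (found != expected) {
        fprintf(stderr, "First opcode should be 0x%x, but found 0x%x\n",
                (unsigned)expected, (unsigned)found);
        return 0;
    }
    return 1;
}
```

In thread1() this would sit between the pf assignment and the call, for
example as first_opcode_ok(func_set, 0x8b).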

That led me to look at what the mfence instruction actually does:
    Acts as a barrier to force strong memory ordering (serialization) between
    load and store instructions preceding the MFENCE, and load and store
    instructions that follow the MFENCE. A weakly-ordered memory system
    allows the hardware to reorder reads and writes between the processor
    and memory. The MFENCE instruction guarantees that the system completes
    all previous memory accesses before executing subsequent accesses.

    The MFENCE instruction is weakly-ordered with respect to data and
    instruction prefetches.
Note the last sentence!

The mfence() at the end of the threadx() loop should flush all the pending
writes out to cache associated with core A before the lock is unlocked and
thread1() is permitted to do its work.

It looks to me like the mfence() at the top of the thread1() loop would not do
anything since there aren't any interesting loads or stores that might be
outstanding for core B at that point.  One would think that serialize() aka the
cpuid instruction would prevent instruction prefetching of old stale data
before that point, but maybe not.  I just don't understand this well enough ...
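
To make the two helpers concrete, here is one way mfence() and serialize()
could be written with GCC/Clang inline assembly.  These definitions are an
assumption, not copied from ryzen_segv_test, and the non-x86 branch exists only
so the sketch compiles elsewhere; it is not equivalent to CPUID serialization.

```c
/* Assumed definitions; the real helpers in ryzen_segv_test may differ. */
#if defined(__x86_64__) || defined(__i386__)

static inline void mfence(void)
{
    /* The "memory" clobber also keeps the compiler from moving accesses
     * across this point. */
    __asm__ volatile("mfence" ::: "memory");
}

static inline void serialize(void)
{
    unsigned int eax = 0, ebx, ecx = 0, edx;

    /* CPUID is a serializing instruction; leaf 0 is executed only for
     * that side effect and its results are discarded. */
    __asm__ volatile("cpuid"
                     : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx)
                     :
                     : "memory");
    (void)eax; (void)ebx; (void)ecx; (void)edx;
}

#else /* non-x86 fallback so the sketch still builds; NOT equivalent */

static inline void mfence(void)    { __atomic_thread_fence(__ATOMIC_SEQ_CST); }
static inline void serialize(void) { __atomic_thread_fence(__ATOMIC_SEQ_CST); }

#endif
```

Whether either instruction is enough to discard already-prefetched stale
instruction bytes on another core is exactly the open question here; the
sketch only shows what the calls expand to.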

Next I tried moving
                mfence();
                serialize();
from just after lock_enter*() to just before the call into the newly moved
data array:
                ret2 = pf(func_set);
This gives mfence() some memory loads to wait for, which allows the data to be
migrated from the core A cache.  With this change, I no longer get any
segfaults.
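
The resulting order of operations on the consumer side can be sketched
portably as below.  This does not reproduce the bug or execute copied machine
code: struct slot, answer(), and call_after_fence() are hypothetical names, an
ordinary function pointer stands in for the moved code, and a C11 sequentially
consistent fence stands in for the mfence(); serialize(); pair.

```c
#include <stdatomic.h>

typedef int (*func_t)(void *);

/* Hypothetical stand-in for func_set: a call target plus its argument. */
struct slot {
    func_t fn;
    void  *arg;
};

/* Trivial callee used only to make the sketch runnable. */
static int answer(void *arg)
{
    return *(int *)arg;
}

/* Stand-in for mfence(); serialize(); (an assumption: a C11 fence orders
 * memory accesses but does not serialize the instruction stream the way
 * CPUID does). */
static inline void full_fence(void)
{
    atomic_thread_fence(memory_order_seq_cst);
}

/* Consumer side: compute the target first, fence, then call immediately. */
static int call_after_fence(const struct slot *s)
{
    func_t pf = s->fn;   /* load the call target, as thread1() does */

    full_fence();        /* fence placed just before the indirect call */
    return pf(s->arg);
}
```

The point of the placement is that the loads preceding the fence now touch the
data being handed over, rather than the fence running right after the lock is
taken when nothing relevant is outstanding.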

Ryzen bug?  Just more aggressive prefetching?  I don't know ...

-- 
You are receiving this mail because:
You are the assignee for the bug.