Date:      Wed, 23 Aug 2017 01:19:42 +0000
From:      bugzilla-noreply@freebsd.org
To:        freebsd-bugs@FreeBSD.org
Subject:   [Bug 221029] AMD Ryzen: strange compilation failures using poudriere or plain buildkernel/buildworld
Message-ID:  <bug-221029-8-r7gs03Cv8l@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-221029-8@https.bugs.freebsd.org/bugzilla/>
References:  <bug-221029-8@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221029

--- Comment #92 from Don Lewis <truckman@FreeBSD.org> ---
I'm building the same set of ports as I do on my FX-8320E machine so I have a
reasonable idea of what to expect in terms of package build fallout.  Some
amount of core dump messages getting logged is fairly normal.

I've never seen the "Failed to fully fault in a core file segment" message.

The motherboard that I'm using has an igb interface.  The interrupt storm
messages are likely to be specific to that chip, driver, or motherboard.

I've set kern.sched.balance=0 for my testing since I suspect it could have
issues similar to those of kern.sched.steal_idle, and I want to eliminate that
source of noise.

On Ryzen, at the CPU topology level that only includes the SMT threads
belonging to one core, the steal_idle code will only steal a thread from the
other SMT thread if the load on that other SMT thread exceeds steal_thresh
(default 2).  At the other CPU hierarchy levels, the threshold for stealing a
thread is hardwired to 1.  That could sometimes allow a thread to be stolen
from the other SMT thread on the same core even though that was not allowed on
the previous iteration.  Since my last experiment (with steal_thresh=1)
exhibited a lot of random failures, I hacked the code to set the threshold at
the other hierarchy levels to 2.  I also hacked the code to only steal threads
from cores in the same CCX.  The results of this experiment only had two build
failures.  One was the usual ghc SIGBUS, and the other appears to have been a
SIGSEGV in lang/go14.  The latter was a bit different than the usual go build
failures.  All of the ones that I have previously looked at appear to have been
caused by corruption of the internal malloc state.

fatal error: unexpected signal during runtime execution
[signal 0xb code=0x1 addr=0x0 pc=0x49890c]

runtime stack:
runtime.gothrow(0x6fd3f0, 0x2a)
        /usr/local/go14/src/runtime/panic.go:503 +0x8e
runtime.sigpanic()
        /usr/local/go14/src/runtime/sigpanic_unix.go:14 +0x5e
futexsleep()
        /usr/local/go14/src/runtime/os_freebsd.c:72 +0x6c
runtime.onM(0xc208349f50)
        /usr/local/go14/src/runtime/asm_amd64.s:273 +0x9a
runtime.futexsleep(0xc208116ed8, 0xc200000000, 0xffffffffffffffff)
        /usr/local/go14/src/runtime/os_freebsd.c:58 +0x73
runtime.notesleep(0xc208116ed8)
        /usr/local/go14/src/runtime/lock_futex.go:145 +0xae
stopm()
        /usr/local/go14/src/runtime/proc.c:1178 +0x119
exitsyscall0(0xc2082197a0)
        /usr/local/go14/src/runtime/proc.c:2020 +0xd8
runtime.mcall(0x49b4c4)
        /usr/local/go14/src/runtime/asm_amd64.s:186 +0x5a

-- 
You are receiving this mail because:
You are the assignee for the bug.


