Date:      Thu, 13 Jul 2017 23:24:42 +0000
From:      bugzilla-noreply@freebsd.org
To:        freebsd-bugs@FreeBSD.org
Subject:   [Bug 219399] System panics after several hours of 14-threads-compilation orgies using poudriere on AMD Ryzen...
Message-ID:  <bug-219399-8-fQLZvKIIwJ@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-219399-8@https.bugs.freebsd.org/bugzilla/>
References:  <bug-219399-8@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=219399

--- Comment #65 from Don Lewis <truckman@FreeBSD.org> ---
(In reply to SF from comment #63)

The motherboard I'm currently using has six Vcore VRM phases.  Basically the
top of the line for Gigabyte AM4 boards.  The only difference between this
board and the Gigabyte flagship is that this board doesn't have an adjustable
bclk.

I basically didn't see any difference between this board and the B350 board
that I was initially using.  Both crashed or locked up when doing parallel
compiles, but both survived running 16 threads of Prime95 (actually mprime on
FreeBSD because I don't have Windows).

This X370 board has problems with SMT off and half the cores disabled, so
basically only four parallel threads running.  That should hardly stress the
PSU or VRM at all and temperatures should be pretty low.  Even with everything
on, the idle temps in the BIOS look good, so I don't think it's a thermal
problem.  My last crash was early this morning, when the room temperature was a
lot lower than when the machine was running happily last evening.  There are no
VRM knobs in the Gigabyte BIOS other than voltage and LLC.  I would think those
wouldn't be critical at 1/4 load ...

It doesn't appear to be a RAM timing problem.  Cranking the RAM speed down
basically has no effect.  ECC should be working so if a single bit error
cropped up, it should get corrected.  Memtest86 was clean, even the rowhammer
test.

The crashes seem to be fairly random.  Restarting the ports that were building
at the time of a crash is often successful.

The run that I did after upgrading to AGESA 1006 was by far the best.  With all
eight cores enabled but SMT still off, poudriere ran for a bit more than 10
hours.  As I previously mentioned, three ports failed due to the jemalloc
problem, but the machine stayed up.  I restarted poudriere and those ports
built as well as a number of ports that depended on them.  The build ran for a
few hours, but the machine silently rebooted before poudriere finished.  When
I restarted poudriere, all but one of the remaining ports built.  I didn't see
any obvious error in the log for the failing port, but it successfully built
when I ran poudriere another time.

-- 
You are receiving this mail because:
You are the assignee for the bug.


