From: bugzilla-noreply@freebsd.org
To: freebsd-bugs@FreeBSD.org
Subject: [Bug 219399] System panics after several hours of 14-threads-compilation orgies using poudriere on AMD Ryzen...
Date: Thu, 13 Jul 2017 23:24:42 +0000
X-Bugzilla-Product: Base System
X-Bugzilla-Component: kern
X-Bugzilla-Version: 11.0-STABLE
X-Bugzilla-Severity: Affects Only Me
X-Bugzilla-Who: truckman@FreeBSD.org
X-Bugzilla-Status: New

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=219399

--- Comment #65 from Don Lewis ---
(In reply to SF from comment #63)

The motherboard I'm currently using has six Vcore VRM phases. Basically the top of the line for Gigabyte AM4 boards. The only difference between this board and the Gigabyte flagship is that this board doesn't have an adjustable bclk. I basically didn't see any difference between this board and the B350 board that I was initially using. Both crashed or locked up when doing parallel compiles, but both survived running 16 threads of Prime95 (actually mprime on FreeBSD because I don't have Windows).

This X370 board has problems with SMT off and half the cores disabled, so basically only four parallel threads running. That should hardly stress the PSU or VRM at all and temperatures should be pretty low. Even with everything on, the idle temps in the BIOS look good, so I don't think it's a thermal problem.
My last crash was early this morning, when the room temperature was a lot lower than when the machine was running happily last evening. There are no VRM knobs in the Gigabyte BIOS other than voltage and LLC. I would think those wouldn't be critical at 1/4 load ...

It doesn't appear to be a RAM timing problem. Cranking the RAM speed down basically has no effect. ECC should be working, so if a single-bit error cropped up, it should get corrected. Memtest86 was clean, even the rowhammer test.

The crashes seem to be fairly random. Restarting the ports that were building at the time of a crash is often successful.

The run that I did after upgrading to AGESA 1006 was by far the best. With all eight cores enabled but SMT still off, poudriere ran for a bit more than 10 hours. As I previously mentioned, three ports failed due to the jemalloc problem, but the machine stayed up. I restarted poudriere and those ports built, as well as a number of ports that depended on them. The build ran for a few hours, but the machine silently rebooted before poudriere finished. When I restarted poudriere, all but one of the remaining ports built. I didn't see any obvious error in the log for the failing port, but it successfully built when I ran poudriere another time.