Date:      Tue, 13 Feb 2018 08:41:54 -0500
From:      Mike Tancsa <mike@sentex.net>
To:        Mark Millard <marklmi26-fbsd@yahoo.com>, nimrodl@gmail.com, michaelp@bsquare.com, FreeBSD-STABLE Mailing List <freebsd-stable@freebsd.org>
Subject:   Re: Ryzen issues on FreeBSD ? (summary of 4 issues) (seemingly solved!)
Message-ID:  <fd4e0c67-a48a-9009-8992-6944a8ef3152@sentex.net>
In-Reply-To: <744bbe18-80c4-d057-c88d-fbe480ee9abb@sentex.net>
References:  <A6A50EB1-6944-40B4-9F33-002336F582E6@yahoo.com> <744bbe18-80c4-d057-c88d-fbe480ee9abb@sentex.net>

OK, it seems this is all mostly solved for me.

Points inline below.


On 1/24/2018 9:42 AM, Mike Tancsa wrote:
> I think this is perhaps a good time to summarize, as a few issues seem to be going on:
> 
> a) Fragile BIOS settings. There seem to be a number of issues around
> RAM speeds and disabled C-states that impact stability. Specifically,
> lowering the default memory frequency from 2400 to 2133 seems to help
> some users with crashes / lockups under heavy loads.

Also: disable core boost on non-X CPUs (i.e. the 1600 vs. the 1600X), and
make sure the CPU is not overheating. On my ASUS board, using a backported
version of amdtemp and amdsmn, I confirmed the temperature does not go
above 50C at full load. Setting the fan speed to turbo seems to help
reduce the maximum temperature the CPU reaches.

> b) CPUs manufactured prior to week 25 (some say week 33?) have a
> hardware defect that manifests itself as segfaults in heavy compiles. I
> was able to confirm this on one of the CPUs I had, using a Linux setup.
> It seems that to confirm this, you need to physically look at the CPU
> for the manufacturing date :( I'm not sure how to trigger it reliably
> on FreeBSD, but there is a GitHub project I used to verify it on Linux
> (https://github.com/suaefar/ryzen-test)

AMD sent me 3 new CPUs without issue. Turnaround was about one week,
from Canada to the US and back.

> 
> c) The idle lockup bug.  This *seems* to be confirmed on Linux as well
> http://blog.programster.org/ubuntu-16-04-compile-custom-kernel-for-ryzen
> and
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1690085

Perhaps it was the settings in a) together with the most recent BIOS
update, but this issue seems to be fixed for me. It sure seemed like a
hardware issue, but then again it could have been a side effect of d).
However, I was never able to break into the debugger using a debugging
kernel in HEAD, so I suspect it was more hardware-related than anything.

BIOS Information
	Vendor: American Megatrends Inc.
	Version: 3803
	Release Date: 01/22/2018
	Address: 0xF0000

This is on a

	Product Name: PRIME X370-PRO
	Version: Rev X.0x


> 
> d) Compile failures of some ports. For myself and one other user,
> compiling net/samba47 reliably hangs in roughly the same place. It's not
> clear if this is related to any of the above bugs or not.

This too seems to be fixed! The patch in
https://docs.freebsd.org/cgi/getmsg.cgi?fetch=417183+0+archive/2018/freebsd-hackers/20180211.freebsd-hackers
seems to stop the deadlock. I did 90 builds on RELENG_11 with this patch
overnight, with no deadlocks. For the first half of the builds, I had 2
guest VMs also building; for the second half, it was the only thing
running on the box. It's working as expected.

All this just in time for my Epyc-based system to arrive!


	---Mike


