From owner-freebsd-amd64@freebsd.org Thu Mar 23 19:14:09 2017
From: Stefan Esser <se@freebsd.org>
To: freebsd-amd64@freebsd.org
Subject: Re: FreeBSD on Ryzen
Date: Thu, 23 Mar 2017 20:14:03 +0100
Message-ID: <51b6c5d5-fc66-f371-ef54-c3d85a6f2c2d@freebsd.org>
In-Reply-To: <201703222030.v2MKUJJs026400@gw.catspoiler.org>
List-Id: Porting FreeBSD to the AMD64 platform

On 22.03.17 at 21:30, Don Lewis wrote:
> I put together a Ryzen 1700X machine over the weekend and installed the
> 12.0-CURRENT r315413 snapshot on it a couple of days ago. The RAM is
> DDR4 2400.
>
> First impression is that it's pretty zippy. Compared to my previous
> fastest machine:
> CPU: AMD FX-8320E Eight-Core Processor (3210.84-MHz K8-class CPU)
> make -j8 buildworld using tmpfs is a bit more than 2x faster. Since the
> Ryzen has SMT, its eight cores look like 16 CPUs to FreeBSD, and I get
> almost a 2.6x speedup with -j16 as compared to my old machine.
>
> I do see that the reported total CPU time increases quite a bit at -j16
> (~19900u) as compared to -j8 (~13600u), so it is running into some
> hardware bottlenecks that are slowing down instruction execution. It
> could be the resources shared by both SMT threads that share each core,

It is the resources shared by the two SMT threads within each core. Under
full CPU load, SMT makes a 3.3 GHz 8-core CPU "simulate" a ~2 GHz 16-core
CPU. The throughput is (to first order) proportional to cores * CPU clock
and comes out as 8 * 3.3 = 26.4 vs. 16 * ~2 = ~32 (estimated).

I'm positively surprised by the observed gain of +30% due to SMT. This
seems to match the reported user times:

  13,600 / 8  = 1,700 seconds user time per physical core (on average)
  19,900 / 16 = 1,244 seconds per virtual (SMT) core

vs. an estimate of the user time for a CPU with SMT but without any gain
in throughput:

  27,200 / 16 = 1,700 seconds per virtual core with ineffective SMT
  (i.e. SMT that does not increase the effective IPC, resulting in
  identical real time compared to the non-SMT case)

This result seems to match the increased performance when going from
-j 8 to -j 16:

  27,200 / 19,900 = 1.37  ~  2.6 / 2.0 = 1.30
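For reference, here is the arithmetic above as a tiny stand-alone C
program (a minimal sketch; the 3.3 GHz clock and the ~2 GHz "effective"
per-thread clock are the same rough estimates as above, not measured
values):

#include <stdio.h>

int
main(void)
{
	/*
	 * Rough assumptions: 8 physical cores at 3.3 GHz, and ~2 GHz
	 * effective clock per hardware thread when both SMT threads
	 * of a core are busy.
	 */
	double cores = 8.0, clk = 3.3, smt_clk = 2.0;

	/* First-order throughput estimate: cores * clock. */
	printf("throughput without SMT: %.1f\n", cores * clk);            /* 26.4 */
	printf("throughput with SMT:    %.1f\n", 2.0 * cores * smt_clk);  /* 32.0 */

	/* Reported user times of the two builds, in seconds. */
	double u_j8 = 13600.0, u_j16 = 19900.0;

	printf("per physical core (-j8):  %.0f s\n", u_j8 / 8.0);
	printf("per virtual core (-j16):  %.0f s\n", u_j16 / 16.0);
	/* Ineffective-SMT baseline: the -j8 work spread over 16 threads. */
	printf("ineffective-SMT baseline: %.0f s\n", 2.0 * u_j8 / 16.0);

	/* Speedup from -j8 to -j16, computed both ways. */
	printf("2*13600/19900 = %.2f   vs.   2.6/2.0 = %.2f\n",
	    2.0 * u_j8 / u_j16, 2.6 / 2.0);
	return (0);
}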
> or it could be cache or memory bandwidth related. The Ryzen topology is
> a bit complicated. There are two groups of four cores, where each group
> of four cores shares half of the L3 cache, with a slowish interconnect
> bus between the groups. This probably causes some NUMA-like issues. I
> wonder if the ULE scheduler could be tweaked to handle this better.

I've been wondering whether it is possible to teach the scheduler about
the effect mentioned above, i.e. by distinguishing an SMT core that
executes only 1 runnable thread from one that executes 2. The latter
should be assumed to run at an estimated 60% clock (which makes the two
threads combined proceed at 120% of the non-SMT speed).

OTOH, the lower "effective clock rate" should be irrelevant under high
load (when all cores are executing 2 threads) or under low load, when
some cores are idle (assuming that the scheduler prefers to assign only
1 thread per core until there are more runnable threads than cores).

If you assume that user time accounting should be a raw measure of the
instructions executed, then assuming a reduced clock rate would lead to
"fairer" results.
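To sketch what I have in mind (purely hypothetical C, not the existing
ULE or accounting code; the 60% factor, the function name and the
sibling_busy flag are invented for illustration): the time charged to a
thread per statclock tick could be scaled down whenever the sibling
hardware thread of the same core was busy during that tick.

#include <stdio.h>

#define SMT_SHARED_PCT	60	/* assumed per-thread speed when the sibling is busy */

/*
 * Hypothetical helper: return the CPU time (in microseconds) to charge
 * for one statclock tick, depending on whether the sibling hardware
 * thread of the same core was busy during that tick.
 */
static int
smt_adjusted_ticklen(int ticklen_us, int sibling_busy)
{
	if (!sibling_busy)
		return (ticklen_us);	/* the thread had the core to itself */
	/* Both hardware threads were busy: charge only ~60% of the tick. */
	return (ticklen_us * SMT_SHARED_PCT / 100);
}

int
main(void)
{
	int tick_us = 7812;	/* example statclock tick length in microseconds */

	printf("charged, sibling idle: %d us\n", smt_adjusted_ticklen(tick_us, 0));
	printf("charged, sibling busy: %d us\n", smt_adjusted_ticklen(tick_us, 1));
	return (0);
}

Something along these lines would make the user time reported for the
-j 16 build comparable to the -j 8 numbers, at the cost of no longer
reflecting the wall-clock occupancy of the core.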