Date:      Thu, 09 Apr 2015 23:08:03 +0200
From:      Tobias Oberstein <tobias.oberstein@gmail.com>
To:        Jim Harris <jim.harris@gmail.com>,  Konstantin Belousov <kostikbel@gmail.com>
Cc:        "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>, Michael Fuckner <michael@fuckner.net>, Alan Somers <asomers@freebsd.org>
Subject:   Re: NVMe performance 4x slower than expected
Message-ID:  <5526EA33.6090004@gmail.com>
In-Reply-To: <CAJP=Hc-WLKe3+DQ=2o21CY=aaQAjADrzEfnD7NVO1Cotu4vcGg@mail.gmail.com>
References:  <551BC57D.5070101@gmail.com>
             <CAOtMX2jVwMHSnQfphAF+a2+o7eLp62nHmUo4t+EahrXLWReaFQ@mail.gmail.com>
             <CAJP=Hc-RNVuhPePg7bnpmT4ByzyXs_CNvAs7Oy7ntXjqhZYhCQ@mail.gmail.com>
             <551C5A82.2090306@gmail.com>
             <20150401212303.GB2379@kib.kiev.ua>
             <CAJP=Hc87FMYCrQYGfAtefQ8PLT3WtnvPfPSppp3zRF-0noQR9Q@mail.gmail.com>
             <CAJP=Hc-WLKe3+DQ=2o21CY=aaQAjADrzEfnD7NVO1Cotu4vcGg@mail.gmail.com>

Hi Jim,

thanks for coming back to this and for your work and info - highly appreciated!

>  (Based on your ramdisk performance data, it does not
> appear that lack of per-CPU NVMe I/O queues is the cause of the performance
> issues on this system -

My unscientific gut feeling is that it might be related to NUMA in general.

The memory performance

https://github.com/oberstet/scratchbox/blob/master/freebsd/cruncher/results/freebsd_memperf.md#results-48-core-numa-machine

is slower than on a single-socket E3 Xeon

https://github.com/oberstet/scratchbox/blob/master/freebsd/cruncher/results/freebsd_memperf.md#results-small-xeon-machine

The E3 is a Haswell at 3.4 GHz, whereas the E7 is one generation older and 
runs at 3.0 GHz, but I don't think that explains the very large difference.

The 4-socket box should have an aggregate main memory bandwidth of

4 x 85 GB/s = 340 GB/s

The measured numbers are orders of magnitude smaller.
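
One quick way to make the NUMA theory testable - a minimal sketch of mine, 
not part of the benchmark repo linked above, and the CPU number 12 for the 
second socket is just an assumption about this box's topology: first-touch 
a buffer from socket 0, then sweep it once from each socket and compare.

/*
 * Sketch: pin the current thread with FreeBSD's cpuset_setaffinity(2),
 * then time a sequential sweep over a buffer first-touched from CPU 0.
 * Assumes a first-touch-style page placement; if the "remote" sweep is
 * much slower, cross-socket NUMA traffic is in play.
 */
#include <sys/param.h>
#include <sys/cpuset.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define BUF_BYTES (1UL << 30)           /* 1 GiB */

static void pin_to_cpu(int cpu)
{
    cpuset_t set;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    /* id of -1 == current thread */
    if (cpuset_setaffinity(CPU_LEVEL_WHICH, CPU_WHICH_TID, -1,
        sizeof(set), &set) != 0) {
        perror("cpuset_setaffinity");
        exit(1);
    }
}

static double sweep_gbps(volatile char *buf)
{
    struct timespec t0, t1;
    unsigned long sum = 0;
    double secs;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < BUF_BYTES; i += 64)  /* one read per cache line */
        sum += buf[i];
    clock_gettime(CLOCK_MONOTONIC, &t1);

    secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    return (BUF_BYTES / 1e9) / secs + 0 * sum;  /* keep 'sum' live */
}

int main(void)
{
    char *buf = malloc(BUF_BYTES);

    if (buf == NULL)
        return 1;
    pin_to_cpu(0);                  /* first touch: pages land near CPU 0 */
    memset(buf, 1, BUF_BYTES);
    printf("local : %.1f GB/s\n", sweep_gbps(buf));

    pin_to_cpu(12);                 /* assumed to sit in another socket */
    printf("remote: %.1f GB/s\n", sweep_gbps(buf));
    return 0;
}

Compiles with a plain "cc -O2", no extra libraries needed on FreeBSD. If 
pages are not being placed near the threads that use them, that alone 
would eat a large part of the aggregate bandwidth.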

> but I'm working to confirm on a system in my lab.)

FWIW, the box I am testing is

http://www.quantaqct.com/Product/Servers/Rackmount-Servers/4U/QuantaGrid-Q71L-4U-p18c77c70c83c79

The box is maxed out on RAM, CPU (mostly), internal SSDs, and PCIe cards 
(it has 10 slots). There are very few x86 systems that scale up further; 
the SGI UV 2000 tops the list, but that one is totally exotic, whereas 
the above is a pure Intel design.

How about Intel donating such a baby to the FreeBSD Foundation to get NUMA 
and everything sorted out? The street price is roughly 150k, but since most 
of the components are made by Intel, it should come cheaper for Intel ;)

==

Sadly, given the current state of affairs, I can't justify targeting 
FreeBSD on this system any longer. The customer wants to go to production 
soonish, so we'll be using Linux (SLES 12). Performance at the block 
device level there is as expected from the Intel datasheets. Which means: 
massive! We now "only" need to carry those millions of IOPS from the 
block device level up through the filesystem and into the database 
(PostgreSQL). Ha, that will be fun ;) And I will miss ZFS and all the 
other FreeBSD goodies =(
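
For the block-level sanity check, the kind of loop I mean is nothing 
fancier than the sketch below (mine, assuming a Linux box and a placeholder 
device node /dev/nvme0n1). Note that a single thread at queue depth 1 
mostly measures latency; the millions of IOPS only appear with many 
threads and deep queues, which is exactly where per-CPU queues matter.

/*
 * Sketch: raw random-read IOPS on a block device with O_DIRECT,
 * bypassing the page cache. Buffers must be sector-aligned.
 */
#define _GNU_SOURCE                     /* O_DIRECT on Linux */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#define BLOCK   4096
#define NREADS  100000

int main(void)
{
    int fd = open("/dev/nvme0n1", O_RDONLY | O_DIRECT);
    if (fd < 0) { perror("open"); return 1; }

    off_t dev_size = lseek(fd, 0, SEEK_END);
    void *buf;
    if (posix_memalign(&buf, BLOCK, BLOCK) != 0)  /* O_DIRECT alignment */
        return 1;

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < NREADS; i++) {
        off_t off = (random() % (dev_size / BLOCK)) * BLOCK;
        if (pread(fd, buf, BLOCK, off) != BLOCK) { perror("pread"); return 1; }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%.0f IOPS (single thread, QD1)\n", NREADS / secs);
    return 0;
}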

Cheers,
/Tobias



