Date: Thu, 09 Apr 2015 23:08:03 +0200
From: Tobias Oberstein <tobias.oberstein@gmail.com>
To: Jim Harris <jim.harris@gmail.com>, Konstantin Belousov <kostikbel@gmail.com>
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>, Michael Fuckner <michael@fuckner.net>, Alan Somers <asomers@freebsd.org>
Subject: Re: NVMe performance 4x slower than expected
Message-ID: <5526EA33.6090004@gmail.com>
In-Reply-To: <CAJP=Hc-WLKe3%2BDQ=2o21CY=aaQAjADrzEfnD7NVO1Cotu4vcGg@mail.gmail.com>
References: <551BC57D.5070101@gmail.com> <CAOtMX2jVwMHSnQfphAF%2Ba2%2Bo7eLp62nHmUo4t%2BEahrXLWReaFQ@mail.gmail.com> <CAJP=Hc-RNVuhPePg7bnpmT4ByzyXs_CNvAs7Oy7ntXjqhZYhCQ@mail.gmail.com> <551C5A82.2090306@gmail.com> <20150401212303.GB2379@kib.kiev.ua> <CAJP=Hc87FMYCrQYGfAtefQ8PLT3WtnvPfPSppp3zRF-0noQR9Q@mail.gmail.com> <CAJP=Hc-WLKe3%2BDQ=2o21CY=aaQAjADrzEfnD7NVO1Cotu4vcGg@mail.gmail.com>
Hi Jim,

thanks for coming back to this and for your work / info - highly appreciated!

> (Based on your ramdisk performance data, it does not
> appear that lack of per-CPU NVMe I/O queues is the cause of the performance
> issues on this system -

My unscientific gut feeling is: it might be related to NUMA in general.

The memory performance

https://github.com/oberstet/scratchbox/blob/master/freebsd/cruncher/results/freebsd_memperf.md#results-48-core-numa-machine

is slower than on a single-socket E3 Xeon:

https://github.com/oberstet/scratchbox/blob/master/freebsd/cruncher/results/freebsd_memperf.md#results-small-xeon-machine

The E3 is a Haswell at 3.4 GHz, whereas the E7 is one generation older and runs at 3.0 GHz, but I don't think this explains the very large difference.

The 4-socket box should have an aggregate main memory bandwidth of 4 x 85 GB/s = 340 GB/s. The measured numbers are orders of magnitude smaller.

> but I'm working to confirm on a system in my lab.)

FWIW, the box I am testing is

http://www.quantaqct.com/Product/Servers/Rackmount-Servers/4U/QuantaGrid-Q71L-4U-p18c77c70c83c79

The box is maxed out on RAM, CPU (mostly), internal SSDs, and PCIe cards (it has 10 slots). There are very few x86 systems that scale up further - this tops out with the SGI Ultraviolet UV2000. But that is totally exotic, whereas the above is a pure Intel design.

How about Intel donating such a baby to the FreeBSD Foundation to get NUMA and everything else sorted out? Street price is roughly 150k, but given that most of the components are made by Intel, it should be cheaper for Intel ;)

==

Sadly, given the current state of affairs, I can't support targeting FreeBSD on this system any longer. The customer wants to go to production soonish. We'll be using Linux / SLES12. Performance at the block-device level there is as expected from the Intel datasheets. Means: massive!

We now "only" need to translate those millions of IOPS from the block-device level to the filesystem level and then to the database (PostgreSQL).
Ha, will be fun ;) And I will miss ZFS and all the FreeBSD goodies =(

Cheers,
/Tobias
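[Editor's note: the aggregate-bandwidth reasoning above (4 sockets x 85 GB/s = 340 GB/s expected, measured numbers far lower) can be sanity-checked from userland with a crude timing sketch. The snippet below is an illustration only, not the benchmark Tobias linked; a single-threaded Python fill will be nowhere near hardware limits, and real measurements would use a tool like STREAM pinned per NUMA node. The function name and buffer size are this editor's own.]

```python
# Crude lower-bound sanity check of memory write bandwidth.
# Single-threaded and Python-level, so it will not approach the
# ~85 GB/s per E7 socket discussed above; it only shows the method:
# bytes written / elapsed seconds = GB/s.
import time

def measure_fill_bandwidth(size_bytes=256 * 1024 * 1024):
    buf = bytearray(size_bytes)       # allocate and touch pages up front
    start = time.perf_counter()
    # Bulk fill; the RHS allocation is included in the timed window,
    # which is acceptable for a rough estimate.
    buf[:] = b"\xff" * size_bytes
    elapsed = time.perf_counter() - start
    return size_bytes / elapsed / 1e9  # GB/s

if __name__ == "__main__":
    gbps = measure_fill_bandwidth()
    print(f"single-thread fill bandwidth: {gbps:.1f} GB/s")
```

On a NUMA box one would repeat such a measurement with the process pinned to each socket in turn (e.g. via numactl on Linux or cpuset on FreeBSD) and compare local vs. remote-node numbers.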