Date:      Mon, 27 Apr 2015 18:46:26 +0200
From:      Tobias Oberstein <tobias.oberstein@gmail.com>
To:        Jim Harris <jim.harris@gmail.com>
Cc:        Adrian Chadd <adrian@freebsd.org>,  Konstantin Belousov <kostikbel@gmail.com>, "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>,  Michael Fuckner <michael@fuckner.net>, Alan Somers <asomers@freebsd.org>
Subject:   Re: NVMe performance 4x slower than expected
Message-ID:  <553E67E2.2050404@gmail.com>
In-Reply-To: <CAJP=Hc8RTmFK5DL+ToiRbcSMQbBtTKVkGMtp2FJWbz=AY+QQBw@mail.gmail.com>
References:  <551BC57D.5070101@gmail.com> <CAOtMX2jVwMHSnQfphAF+a2+o7eLp62nHmUo4t+EahrXLWReaFQ@mail.gmail.com> <CAJP=Hc-RNVuhPePg7bnpmT4ByzyXs_CNvAs7Oy7ntXjqhZYhCQ@mail.gmail.com> <551C5A82.2090306@gmail.com> <20150401212303.GB2379@kib.kiev.ua> <CAJP=Hc87FMYCrQYGfAtefQ8PLT3WtnvPfPSppp3zRF-0noQR9Q@mail.gmail.com> <CAJP=Hc-WLKe3+DQ=2o21CY=aaQAjADrzEfnD7NVO1Cotu4vcGg@mail.gmail.com> <5526EA33.6090004@gmail.com> <CAJ-VmonecBDemkfS=3nV2jiuJfOFJg7bZOacxOKXvTWktxBd9A@mail.gmail.com> <5527F554.2030806@gmail.com> <CAJP=Hc8RTmFK5DL+ToiRbcSMQbBtTKVkGMtp2FJWbz=AY+QQBw@mail.gmail.com>

Hi Jim,

I have now done extensive tests under Linux (SLES12) at the block device
level.

8kB Random IO results:
http://tavendo.com.s3.amazonaws.com/scratch/fio_p3700_8kB_random.pdf

All results:
http://tavendo.com.s3.amazonaws.com/scratch/fio_p3700.pdf
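
In case someone wants to reproduce: the jobs were essentially of this
shape. This is a minimal fio sketch -- iodepth and numjobs here are
illustrative values, not the exact parameters of every run:

    [global]
    ; async IO against the raw block device, page cache bypassed
    ioengine=libaio
    direct=1
    bs=8k
    ; randwrite for the write runs
    rw=randread
    ; illustrative; depth and job count were varied per run
    iodepth=32
    numjobs=8
    runtime=60
    group_reporting

    [md-raid]
    ; the md array itself, no filesystem on top
    filename=/dev/md0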

What becomes apparent is:

1) IOPS scales nicely, almost linearly, for (software) RAID-0

It scales up to roughly 2.2 million 8kB random reads/s and 750k 8kB
random writes/s. Extrapolating from Intel's datasheet would give
2.36 million / 720k.

Awesome!

2) It does not scale for RAID-1.

In fact, write performance collapses completely with more than 4 devices.

Note: I don't know which NVMe is wired to which CPU socket, nor which
block device name maps to which drive. In other words, I did not
"hand-place" the devices into the RAID sets or anything like that.
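
If someone with a similar box wants to check that locality, the
socket each controller hangs off can be read from sysfs. Generic
Linux paths; the PCI address below is just a placeholder:

    # list the NVMe controllers and their PCI addresses
    lspci -D | grep -i 'Non-Volatile memory'

    # NUMA node a given controller is attached to (-1 = unknown)
    cat /sys/bus/pci/devices/0000:81:00.0/numa_node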

==

I am currently running the same set of tests against ten DC S3700s
attached via SAS.

This should reveal whether it's a general mdadm issue or something
NVMe-related.
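
If that run is inconclusive, another way to isolate it is to point the
same fio job at the raw member devices in parallel, bypassing md
entirely. A sketch, assuming the usual Linux NVMe device names:

    [global]
    ioengine=libaio
    direct=1
    bs=8k
    rw=randwrite
    iodepth=32
    runtime=60
    group_reporting

    ; one section per member device, no md in the path
    [nvme0]
    filename=/dev/nvme0n1

    [nvme1]
    filename=/dev/nvme1n1

    ; ... and so on for the remaining devices

If the summed raw IOPS scale but the md device doesn't, the bottleneck
is md, not the drives.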

==

For now, we will likely use the NVMes in a RAID-0 setup to get the
maximum performance.
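
For the archives, creating such an array looks roughly like this
(device names illustrative, chunk size left at the mdadm default, not
necessarily our final layout; /dev/nvme{0..7}n1 is bash brace expansion):

    mdadm --create /dev/md0 --level=0 --raid-devices=8 /dev/nvme{0..7}n1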

Cheers,
/Tobias


On 10.04.2015 at 18:58, Jim Harris wrote:
> On Fri, Apr 10, 2015 at 9:07 AM, Tobias Oberstein <
> tobias.oberstein@gmail.com> wrote:
> 
>> Hi Adrian,
>>
>>> Dell has graciously loaned me a bunch of hardware to continue doing
>>
>> FWIW, Dell has a roughly comparable system, the Dell R920, but they don't
>> have Intel NVMes on their menu, only Samsung (and FusionIO, but that's not
>> NVMe).
>>
>>> NUMA development on, but I have no NVMe hardware. I'm hoping people at
>>
>> The 8 NVMe PCIe SSDs in the box we're deploying are a key feature of this
>> system (it will be a data warehouse). A single NVMe probably wouldn't have
>> triggered (all of) the issues we experienced.
>>
>> We are using the largest model (2TB), and this amounts to 50k bucks for
>> all eight. The smallest model (400GB) is 1.5k, so 12k in total.
>>
>>> Intel can continue kicking along any desires for NUMA that they
>>> require. (Which they have, fwiw.)
>>>
>>
>> It's already awesome that Intel has senior engineers working on FreeBSD
>> driver code! And it would underline Intel's open-source commitment and tech
>> leadership if they donated a couple of these beefy NVMes.
>>
> 
> Intel has agreed to send DC P3700 samples to the FreeBSD Foundation to put
> in the cluster for this kind of work - we are working on getting these
> through the internal sample distribution process at the moment.
> 
> -Jim
> 



