Date:      Wed, 1 Apr 2015 16:24:52 -0700
From:      Jim Harris <jim.harris@gmail.com>
To:        Tobias Oberstein <tobias.oberstein@gmail.com>
Cc:        Konstantin Belousov <kostikbel@gmail.com>, "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>, Michael Fuckner <michael@fuckner.net>, Alan Somers <asomers@freebsd.org>
Subject:   Re: NVMe performance 4x slower than expected
Message-ID:  <CAJP=Hc_6BFpoWqkSRyZaxsN1Zn=-D14CXOQjMb4zjnZRKhMb-g@mail.gmail.com>
In-Reply-To: <551C6B62.7080205@gmail.com>
References:  <551BC57D.5070101@gmail.com> <CAOtMX2jVwMHSnQfphAF%2Ba2%2Bo7eLp62nHmUo4t%2BEahrXLWReaFQ@mail.gmail.com> <CAJP=Hc-RNVuhPePg7bnpmT4ByzyXs_CNvAs7Oy7ntXjqhZYhCQ@mail.gmail.com> <551C5A82.2090306@gmail.com> <20150401212303.GB2379@kib.kiev.ua> <CAJP=Hc87FMYCrQYGfAtefQ8PLT3WtnvPfPSppp3zRF-0noQR9Q@mail.gmail.com> <551C6B62.7080205@gmail.com>

On Wed, Apr 1, 2015 at 3:04 PM, Tobias Oberstein <tobias.oberstein@gmail.com> wrote:

>     Is this vmstat after the test ?
>>
>
> No, it wasn't (I ran vmstat hours after the test).
>
> Here is right after test (shortened test duration, otherwise exactly the
> same FIO config):
>
> https://github.com/oberstet/scratchbox/blob/master/freebsd/cruncher/results/freebsd_vmstat.md#nvd7
>
>      Somewhat funny is that nvme does not use MSI(X).
>>
>>
>> Yes - this is exactly the problem.
>>
>> nvme does use MSI-X if it can allocate the vectors (one per core).  With
>> 48 cores, I suspect we are quickly running out of vectors, so NVMe is
>> reverting to INTx.
>>
>> Could you actually send vmstat -ia (I left off the 'a' previously) -
>> just so we can see all allocated interrupt vectors.
>>
>> As an experiment, can you try disabling hyperthreading - this will
>> reduce the
>>
>
> The CPUs in this box
>
> root@s4l-zfs:~/src/sys/amd64/conf # sysctl hw.model
> hw.model: Intel(R) Xeon(R) CPU E7-8857 v2 @ 3.00GHz
>
> don't have hyperthreading (we deliberately selected CPU model for max.
> clock rather than HT)
>
> http://ark.intel.com/products/75254/Intel-Xeon-Processor-E7-8857-v2-30M-Cache-3_00-GHz
>
>> number of cores and should let you get MSI-X vectors allocated for at
>> least the first couple of NVMe controllers.  Then please re-run your
>> performance test on one of those controllers.
>>
> You mean I should run against nvdN where N is a controller that still got
> MSI-X while other controllers did not?
>
> How would I find out which controller N? I don't know which nvdN is
> mounted in a PCIe slot directly assigned to which CPU socket, and I don't
> know which one's still got MSI-X and which not.
>

vmstat -ia should show you which controllers were assigned per-core vectors
- you'll see all of them in the irq256+ range instead of the single vector
per controller you see now in the lower irq index range.
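
To script that check, here is a minimal Python sketch (not part of the nvme
driver or of vmstat) that groups the irq lines of vmstat -ia output by
controller.  It assumes each line looks like the "irq56: nvme0 ..." lines
quoted in this thread, and that MSI-X vectors sit at irq256 and above as
described here.  The names nvme_vectors, classify and msi_base are made up
for illustration.

```python
import re
from collections import defaultdict

# Matches lines such as "irq56: nvme0      6440      0".
LINE_RE = re.compile(r"^irq(\d+):\s+(nvme\d+)\b")


def nvme_vectors(vmstat_output):
    """Map each nvme controller name to the sorted list of its IRQ numbers."""
    vectors = defaultdict(list)
    for line in vmstat_output.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            vectors[match.group(2)].append(int(match.group(1)))
    return {name: sorted(irqs) for name, irqs in vectors.items()}


def classify(vectors, msi_base=256):
    """Label a controller MSI-X if all of its IRQs are >= msi_base (assumed
    to be 256), otherwise INTx."""
    return {
        name: "MSI-X" if all(irq >= msi_base for irq in irqs) else "INTx"
        for name, irqs in vectors.items()
    }
```

Fed the two lines quoted further down in this message (irq56: nvme0 and
irq106: nvme7), classify(nvme_vectors(text)) reports INTx for both
controllers.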


>
> I could arrange for disabling all but 1 CPU and retest. Would that help?
>

Yes - that would help.  Depending on how your system is configured, and
which CPU socket the NVMe controllers are attached to, you may need to keep
2 CPU sockets enabled.

You can also try a debug tunable that is in the nvme driver.

hw.nvme.per_cpu_io_queues=0

This would just try to allocate a single MSI-X vector per controller - so
all cores would still share a single I/O queue pair, but it would use MSI-X
instead of INTx.  (This actually should be the first fallback if we cannot
allocate per-core vectors.)  It would at least show whether we are able to
allocate some number of MSI-X vectors for NVMe.
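
The fallback order suggested above can be sketched in a few lines of Python.
This is only an illustration of the ordering (per-core MSI-X, then a single
MSI-X vector, then INTx), not the driver's actual allocation code.  The
function name and its arguments are hypothetical, and it assumes one vector
per core as described in this thread, ignoring any extra vectors the driver
may want.

```python
def choose_interrupt_mode(vectors_available, ncpus, per_cpu_io_queues=True):
    """Return (mode, io_queue_pairs) for one controller.

    vectors_available: MSI-X vectors the system can still hand out
                       (hypothetical input for this sketch).
    ncpus:             number of cores, one vector wanted per core.
    per_cpu_io_queues: mirrors the hw.nvme.per_cpu_io_queues tunable.
    """
    # Preferred: one MSI-X vector and one I/O queue pair per core.
    if per_cpu_io_queues and vectors_available >= ncpus:
        return ("MSI-X", ncpus)
    # First fallback: a single MSI-X vector, one shared I/O queue pair.
    if vectors_available >= 1:
        return ("MSI-X", 1)
    # Last resort: legacy INTx, still one shared I/O queue pair.
    return ("INTx", 1)
```

With 48 cores and only 10 vectors left, choose_interrupt_mode(10, 48) picks
a single MSI-X vector.  Setting per_cpu_io_queues to False gives the same
result even when 48 vectors are available.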


>
> ===
>
> Right after running against nvd7
>
> irq56: nvme0                        6440          0
> ...
> irq106: nvme7                     145056          3
>
>
> Then, immediately thereafter, running against nvd0
>
> https://github.com/oberstet/scratchbox/blob/master/freebsd/cruncher/results/freebsd_vmstat.md#nvd0
>
> irq56: nvme0                        9233          0
> ...
> irq106: nvme7                     145056          3
>
> ===
>
> Earlier this day, I ran multiple longer tests .. all against nvd7. So if
> these are cumulative numbers since last boot, that would make sense.
>
>


