From owner-freebsd-hackers@FreeBSD.ORG Wed Apr  1 23:24:53 2015
Date: Wed, 1 Apr 2015 16:24:52 -0700
Subject: Re: NVMe performance 4x slower than expected
From: Jim Harris
To: Tobias Oberstein
Cc: Konstantin Belousov, "freebsd-hackers@freebsd.org", Michael Fuckner,
    Alan Somers
In-Reply-To: <551C6B62.7080205@gmail.com>
References: <551BC57D.5070101@gmail.com> <551C5A82.2090306@gmail.com>
    <20150401212303.GB2379@kib.kiev.ua> <551C6B62.7080205@gmail.com>
List-Id: Technical Discussions relating to FreeBSD

On Wed, Apr 1, 2015 at 3:04 PM, Tobias Oberstein wrote:

>> Is this vmstat after the test?
>
> No, it wasn't (I ran vmstat hours after the test).
>
> Here it is right after the test (shortened test duration, otherwise
> exactly the same FIO config):
>
> https://github.com/oberstet/scratchbox/blob/master/freebsd/cruncher/results/freebsd_vmstat.md#nvd7
>
>>> Somewhat funny is that nvme does not use MSI(X).
>>
>> Yes - this is exactly the problem.
>>
>> nvme does use MSI-X if it can allocate the vectors (one per core). With
>> 48 cores, I suspect we are quickly running out of vectors, so NVMe is
>> reverting to INTx.
>>
>> Could you actually send vmstat -ia (I left off the 'a' previously) -
>> just so we can see all allocated interrupt vectors.
>>
>> As an experiment, can you try disabling hyperthreading - this will
>> reduce the
>
> The CPUs in this box
>
> root@s4l-zfs:~/src/sys/amd64/conf # sysctl hw.model
> hw.model: Intel(R) Xeon(R) CPU E7-8857 v2 @ 3.00GHz
>
> don't have hyperthreading (we deliberately selected this CPU model for
> max. clock rather than HT):
>
> http://ark.intel.com/products/75254/Intel-Xeon-Processor-E7-8857-v2-30M-Cache-3_00-GHz
>
>> number of cores and should let you get MSI-X vectors allocated for at
>> least the first couple of NVMe controllers. Then please re-run your
>> performance test on one of those controllers.
>
> You mean I should run against nvdN where N is a controller that still
> got MSI-X while other controllers did not?
>
> How would I find out which controller N? I don't know which nvdN is
> mounted in a PCIe slot directly assigned to which CPU socket, and I
> don't know which ones still got MSI-X and which not.

vmstat -ia should show you which controllers were assigned per-core
vectors - you'll see all of them in the irq256+ range instead of the
single vector per controller you see now in the lower irq index range.
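A minimal sketch of that check (the grep pattern simply matches the nvmeN
handler names that vmstat prints in this thread; the exact irq numbers
will differ from system to system):

  # show all interrupt vectors, then keep only the NVMe lines; a controller
  # stuck on INTx has a single low-numbered irq entry, while one that got
  # per-core MSI-X vectors has many entries in the irq256+ range
  vmstat -ia | grep nvme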
> I could arrange for disabling all but 1 CPU and retest. Would that help?

Yes - that would help. Depending on how your system is configured, and
which CPU socket the NVMe controllers are attached to, you may need to
keep 2 CPU sockets enabled.

You can also try a debug tunable that is in the nvme driver:

hw.nvme.per_cpu_io_queues=0

This would just try to allocate a single MSI-X vector per controller - so
all cores would still share a single I/O queue pair, but it would be MSI-X
instead of INTx. (This actually should be the first fallback if we cannot
allocate per-core vectors.) It would at least show that we are able to
allocate some number of MSI-X vectors for NVMe.

> ===
>
> Right after running against nvd7:
>
> irq56:  nvme0      6440  0
> ...
> irq106: nvme7    145056  3
>
> Then, immediately thereafter, running against nvd0:
>
> https://github.com/oberstet/scratchbox/blob/master/freebsd/cruncher/results/freebsd_vmstat.md#nvd0
>
> irq56:  nvme0      9233  0
> ...
> irq106: nvme7    145056  3
>
> ===
>
> Earlier today, I ran multiple longer tests, all against nvd7. So if
> these are cumulative numbers since last boot, that would make sense.
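Since hw.nvme.per_cpu_io_queues is a boot-time tunable, trying the
fallback described above would normally mean setting it in
/boot/loader.conf and rebooting - that placement and the recheck step are
standard FreeBSD practice rather than something spelled out in this
thread, so treat the sketch below as an assumption:

  # /boot/loader.conf - ask the nvme driver for one shared MSI-X vector
  # and one I/O queue pair per controller instead of per-core vectors
  hw.nvme.per_cpu_io_queues="0"

  # shell command after rebooting - the controllers should now show MSI-X
  # vectors in the irq256+ range instead of a single INTx line each
  vmstat -ia | grep nvme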