Date: Thu, 09 Apr 2015 23:08:03 +0200
From: Tobias Oberstein
To: Jim Harris, Konstantin Belousov
Cc: freebsd-hackers@freebsd.org, Michael Fuckner, Alan Somers
Subject: Re: NVMe performance 4x slower than expected

Hi Jim,

thanks for coming back to this and your work / info - highly appreciated!

> (Based on your ramdisk performance data, it does not
> appear that lack of per-CPU NVMe I/O queues is the cause of the performance
> issues on this system -

My unscientific gut feeling is: it might be related to NUMA in general.

The memory performance on this box

https://github.com/oberstet/scratchbox/blob/master/freebsd/cruncher/results/freebsd_memperf.md#results-48-core-numa-machine

is slower than on a single-socket E3 Xeon:

https://github.com/oberstet/scratchbox/blob/master/freebsd/cruncher/results/freebsd_memperf.md#results-small-xeon-machine

The E3 is a Haswell at 3.4 GHz, whereas the E7 is one generation older and runs at 3.0 GHz, but I don't think that explains the very large difference. The 4-socket box should have an aggregate main memory bandwidth of 4 x 85 GB/s = 340 GB/s; the measured numbers are orders of magnitude smaller.

> but I'm working to confirm on a system in my lab.)
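[For reference, a minimal STREAM-style triad sketch of the kind of measurement those bandwidth numbers refer to. This is not the memperf benchmark linked above; array size, repeat count and the build line are illustrative assumptions. The point it shows: on a multi-socket NUMA box, aggregate bandwidth only approaches "sockets x per-socket bandwidth" if the arrays are first-touched on the same nodes the worker threads later run on.]

/*
 * Minimal STREAM-style triad sketch (illustrative only, not the memperf
 * benchmark referenced in this thread).
 *
 * Build (assumed): cc -O2 -fopenmp triad.c -o triad
 */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N     (size_t)(64 * 1024 * 1024)   /* 64M doubles = 512 MiB per array (assumed size) */
#define REPS  10

int main(void) {
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    if (!a || !b || !c) return 1;

    /* Parallel first-touch init: with the default first-touch policy,
     * each page lands on the NUMA node of the thread that touches it. */
    #pragma omp parallel for schedule(static)
    for (size_t i = 0; i < N; i++) { a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; }

    double best = 0.0;
    for (int r = 0; r < REPS; r++) {
        double t0 = omp_get_wtime();
        #pragma omp parallel for schedule(static)
        for (size_t i = 0; i < N; i++)
            a[i] = b[i] + 3.0 * c[i];          /* triad: 2 loads + 1 store per element */
        /* STREAM convention: count 3 arrays' worth of bytes per pass */
        double gbs = 3.0 * N * sizeof(double) / (omp_get_wtime() - t0) / 1e9;
        if (gbs > best) best = gbs;
    }
    printf("best triad bandwidth: %.1f GB/s\n", best);
    return 0;
}

[Run with thread binding enabled (e.g. OMP_PROC_BIND=true) so threads stay on the socket whose memory they first-touched; without binding, migrated threads read remote memory and the measured number drops well below the aggregate figure.]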
FWIW, the box I am testing is

http://www.quantaqct.com/Product/Servers/Rackmount-Servers/4U/QuantaGrid-Q71L-4U-p18c77c70c83c79

The box is maxed out on RAM, CPU (mostly), internal SSDs, as well as PCIe cards (it has 10 slots). There are very few x86 systems with bigger scale-up; the top end is something like the SGI UltraViolet UV 2000, but that is totally exotic, whereas the above is a pure Intel design.

How about Intel donating such a baby to the FreeBSD Foundation to get NUMA and everything sorted out? Street price is roughly 150k, but given that most of the components are made by Intel, it should be cheaper for Intel ;)

==

Sadly, given the current state of affairs, I can't support targeting FreeBSD on this system any longer. The customer wants to go to production soonish, so we'll be using Linux / SLES12. Performance at the block device level there is as expected from the Intel datasheets. Which means: massive!

We now "only" need to translate those millions of IOPS from the block device to the filesystem level and then to the database (PostgreSQL). Ha, that will be fun ;) And I will miss ZFS and all the FreeBSD goodies =(

Cheers,
/Tobias
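[For context on the block-layer numbers mentioned above, a minimal hand-rolled sketch of a queue-depth-1 random 4 KiB read check against a raw NVMe namespace with O_DIRECT. The device path, the 1 GiB test span and the 10-second runtime are illustrative assumptions, not the actual test setup; real IOPS testing would use many parallel jobs at high queue depth with a dedicated tool.]

/*
 * Sketch only: single-threaded, queue-depth-1 random 4 KiB reads with
 * O_DIRECT against a raw block device (Linux; needs read access to the
 * device, typically root). Device path, span and runtime are assumptions.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#define BLK   4096                 /* 4 KiB per read, block-aligned for O_DIRECT */
#define SPAN  (1ULL << 30)         /* test the first 1 GiB of the device */
#define SECS  10

int main(int argc, char **argv) {
    const char *dev = argc > 1 ? argv[1] : "/dev/nvme0n1";   /* assumed path */
    int fd = open(dev, O_RDONLY | O_DIRECT);
    if (fd < 0) { perror("open"); return 1; }

    void *buf;
    if (posix_memalign(&buf, BLK, BLK)) return 1;   /* O_DIRECT needs an aligned buffer */

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    long long ios = 0;
    double elapsed = 0.0;
    do {
        /* pick a block-aligned offset within the test span */
        off_t off = (off_t)(rand() % (int)(SPAN / BLK)) * BLK;
        if (pread(fd, buf, BLK, off) != BLK) { perror("pread"); return 1; }
        ios++;
        clock_gettime(CLOCK_MONOTONIC, &t1);
        elapsed = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
    } while (elapsed < SECS);

    printf("%lld reads in %.1f s = %.0f IOPS (QD1, single thread)\n",
           ios, elapsed, ios / elapsed);
    close(fd);
    free(buf);
    return 0;
}

[At queue depth 1 this mostly measures per-command latency; datasheet-level IOPS on NVMe drives only show up with many outstanding commands spread across queues and CPUs.]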