From owner-freebsd-fs@freebsd.org Sun Sep 11 03:34:15 2016
From: Warner Losh
Date: Sat, 10 Sep 2016 21:34:13 -0600
Subject: Re: Server with 40 physical cores, 48 NVMe disks, feel free to test it
To: Christoph Pilka
Cc: freebsd-fs@freebsd.org

On Sat, Sep 10, 2016 at 2:58 AM, Christoph Pilka wrote:
> Hi,
>
> we've just been granted a short-term loan of a server from Supermicro with 40 physical cores (plus HTT) and 48 NVMe drives. After a bit of mucking about, we managed to get 11-RC running. A couple of things are preventing the system from being terribly useful:
>
> - We have to use hw.nvme.force_intx=1 for the server to boot
> If we don't, it panics around the 9th NVMe drive with "panic: couldn't find an APIC vector for IRQ...". Increasing hw.nvme.min_cpus_per_ioq brings it further, but it still panics later in the NVMe enumeration/init. hw.nvme.per_cpu_io_queues=0 causes it to panic later (I suspect during ixl init - the box has 4x10gb ethernet ports).

John Baldwin has patches that help fix this.

> - zfskern seems to be the limiting factor when doing ~40 parallel "dd if=/dev/zero of= bs=1m" on a zpool stripe of all 48 drives.
> Each drive shows ~30% utilization (gstat), I can do ~14GB/sec write and 16 read.
>
> - direct writing to the NVMe devices (dd from /dev/zero) gives about 550MB/sec and ~91% utilization per device

These are slow drives then if all they can do is 600MB/s. The drives
we're looking at do 3.2GB/s read and 1.6GB/s write. 48 drives though.
Woof. What's the interconnect? Are there enough PCIe lanes for that?
192 lanes? How's that possible?

> Obviously, the first item is the most troublesome. The rest is based on entirely synthetic testing and may have little or no actual impact on the server's usability or fitness for our purposes.
>
> There is nothing but sshd running on the server, and if anyone wants to play around you'll have IPMI access (remote kvm, virtual media, power) and root.

Don't think I have enough time to track this all down...

Warner
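[For anyone following along: the three nvme(4) tunables Christoph mentions are loader tunables, so they go in /boot/loader.conf. A sketch of the workarounds as described in the thread - these are the settings he tried, not recommendations, and the min_cpus_per_ioq value is an illustrative example since no number was given:]

```
# /boot/loader.conf -- boot workarounds discussed in the thread.
# Try one at a time; force_intx is the one that made the box boot.
hw.nvme.force_intx="1"         # fall back to INTx interrupts (avoids the APIC vector panic)
#hw.nvme.min_cpus_per_ioq="4"  # fewer MSI-X vectors per controller (example value, not from the thread)
#hw.nvme.per_cpu_io_queues="0" # one I/O queue pair per controller (panicked later, per the report)
```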
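[Christoph's parallel-streaming test can be approximated with a sketch like the one below. POOLDIR, NJOBS, and the sizes are placeholder defaults, not values from the thread - for a real run, point POOLDIR at the zpool's mountpoint and raise COUNT so each writer streams for long enough to saturate:]

```shell
#!/bin/sh
# Sketch of the ~40-parallel-dd write test described above.
# All knobs are hypothetical defaults, sized small so the sketch is safe to run.
POOLDIR=${POOLDIR:-$(mktemp -d)}   # placeholder; use the pool's mountpoint for a real test
NJOBS=${NJOBS:-48}                 # one writer per pool member
BS=${BS:-1048576}                  # 1 MiB blocks (bs=1m in the original test)
COUNT=${COUNT:-4}                  # blocks per writer; raise for a sustained run

i=1
while [ "$i" -le "$NJOBS" ]; do
    # Each writer streams zeros into its own file, all in parallel.
    dd if=/dev/zero of="$POOLDIR/ddtest.$i" bs="$BS" count="$COUNT" 2>/dev/null &
    i=$((i + 1))
done
wait
echo "wrote $NJOBS files of $((BS * COUNT)) bytes each in $POOLDIR"
```

[While it runs, `gstat` on the pool members gives the per-drive utilization numbers quoted above.]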
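[On the "192 lanes?" question: that figure assumes each drive gets the usual x4 link (typical for U.2 NVMe - an assumption, the thread doesn't say). Quick arithmetic, with a dual-socket ~40-lanes-per-CPU Xeon also assumed:]

```shell
# Back-of-the-envelope PCIe lane math behind Warner's question.
# x4 per drive and 2x40 CPU lanes are assumptions, not from the thread.
DRIVES=48
LANES_PER_DRIVE=4
NEEDED=$((DRIVES * LANES_PER_DRIVE))   # 192
CPU_LANES=$((2 * 40))                  # 80 on a typical dual-socket Xeon of that era
echo "lanes needed: $NEEDED, CPU lanes available: $CPU_LANES"
```

[The shortfall is normally covered by PCIe switch chips, i.e. the upstream links are oversubscribed - which would also be consistent with the drives sitting well below 100% utilization under aggregate load.]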