From: John Baldwin
To: Adrian Chadd
Cc: "Kevin P. Neal", Christoph Pilka, FreeBSD Questions
Subject: Re: 40 cores, 48 NVMe disks, feel free to take over
Date: Mon, 19 Sep 2016 13:10:40 -0700
Message-ID: <2828115.ibI7SUQqHX@ralph.baldwin.cx>

On Monday, September 19, 2016 11:56:58 AM Adrian Chadd wrote:
> Hi,
>
> I think the nvme allocation issue is known. John?

A kernel with 'options EARLY_AP_STARTUP' (which I plan to enable by default
in HEAD "soon") should boot fine without needing the force_intx hack. The
option is available in 11 but not enabled by default.

> -a
>
>
> On 11 September 2016 at 13:35, Kevin P. Neal wrote:
> > On Sat, Sep 10, 2016 at 10:57:07AM +0200, Christoph Pilka wrote:
> >> Hi,
> >>
> >> the server we got to experiment with is the SuperMicro 2028R-NR48N (https://www.supermicro.nl/products/system/2U/2028/SSG-2028R-NR48N.cfm ), the board itself is a X10DSC+
> >
> > The best thing to do is file a bug report. If you don't then your report
> > will probably fall through the cracks. Include all the info you've posted
> > so far.
> >
> >> //Chris
> >>
> >> > On 09 Sep 2016, at 23:14, Dennis Glatting wrote:
> >> >
> >> > On Fri, 2016-09-09 at 22:51 +0200, Christoph Pilka wrote:
> >> >> Hi,
> >> >>
> >> >> we've just been granted a short-term loan of a server from Supermicro
> >> >> with 40 physical cores (plus HTT) and 48 NVMe drives. After a bit of
> >> >> mucking about, we managed to get 11-RC running. A couple of things
> >> >> are preventing the system from being terribly useful:
> >> >>
> >> >> - We have to use hw.nvme.force_intx=1 for the server to boot
> >> >> If we don't, it panics around the 9th NVMe drive with "panic:
> >> >> couldn't find an APIC vector for IRQ...". Increasing
> >> >> hw.nvme.min_cpus_per_ioq brings it further, but it still panics later
> >> >> in the NVMe enumeration/init. hw.nvme.per_cpu_io_queues=0 causes it
> >> >> to panic later (I suspect during ixl init - the box has 4x10gb
> >> >> ethernet ports).
> >> >>
> >> >> - zfskern seems to be the limiting factor when doing ~40 parallel "dd
> >> >> if=/dev/zer of= bs=1m" on a zpool stripe of all 48 drives. Each
> >> >> drive shows ~30% utilization (gstat), I can do ~14GB/sec write and 16
> >> >> read.
> >> >>
> >> >> - direct writing to the NVMe devices (dd from /dev/zero) gives about
> >> >> 550MB/sec and ~91% utilization per device
> >> >>
> >> >> Obviously, the first item is the most troublesome. The rest is based
> >> >> on entirely synthetic testing and may have little or no actual impact
> >> >> on the server's usability or fitness for our purposes.
> >> >>
> >> >> There is nothing but sshd running on the server, and if anyone wants
> >> >> to play around you'll have IPMI access (remote kvm, virtual media,
> >> >> power) and root.
> >> >>
> >> >> Any takers?
> >> >>
> >> >
> >> >
> >> > I'm curious to know what board you have. I have had FreeBSD, including
> >> > release 11 candidates, running on SM boards without any trouble
> >> > although some of them are older boards. I haven't looked at ZFS
> >> > performance because mine are typically low disk use. That said, my
> >> > virtual server (also a SM) IOPs suck but so do its disks.
> >> >
> >> > I recently found the Intel RAID chip on one SM isn't real RAID, rather
> >> > it's pseudo RAID but for a few dollars more it could be real RAID. :(
> >> > It was killing IOPs so I popped in an old LSI board, routed the cables
> >> > from the Intel chip, and the server is now a happy camper. I then
> >> > replaced 11-RC with Ubuntu 16.10 due to a specific application but I am
> >> > also running RAIDz2 under Ubuntu on three trash 2.5T disks (I didn't do
> >> > this for any reason other than fun).
> >> >
> >> > root@Tuck3r:/opt/bin# zpool status
> >> >   pool: opt
> >> >  state: ONLINE
> >> >   scan: none requested
> >> > config:
> >> >
> >> >     NAME        STATE     READ WRITE CKSUM
> >> >     opt         ONLINE       0     0     0
> >> >       raidz2-0  ONLINE       0     0     0
> >> >         sda     ONLINE       0     0     0
> >> >         sdb     ONLINE       0     0     0
> >> >         sdc     ONLINE       0     0     0
> >> >
> >> >
> >> >
> >> >> Wbr
> >> >> Christoph Pilka
> >> >> Modirum MDpay
> >> >>
> >> >> Sent from my iPhone
> >> >> _______________________________________________
> >> >> freebsd-questions@freebsd.org mailing list
> >> >> https://lists.freebsd.org/mailman/listinfo/freebsd-questions
> >> >> To unsubscribe, send any mail to "freebsd-questions-unsubscribe@freebsd.org"
> > --
> > Kevin P. Neal http://www.pobox.com/~kpn/
> >
> > "Good grief, I've just noticed I've typed in a rant. Sorry chaps!"
> > Keir Finlow Bates, circa 1998

-- 
John Baldwin
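
A minimal sketch of the kernel configuration the reply above describes, for anyone wanting to try it on 11. Only the option name EARLY_AP_STARTUP and the hw.nvme.force_intx tunable come from the thread; the config file name EARLYAP and the choice to build on top of GENERIC are assumptions made for illustration.

```
# /usr/src/sys/amd64/conf/EARLYAP   (hypothetical file name)
# Start from the stock GENERIC kernel and add the one option.
include GENERIC
ident   EARLYAP
options EARLY_AP_STARTUP

# /boot/loader.conf
# Workaround from the original report. With the kernel above it
# should no longer be needed, so it is shown commented out.
# hw.nvme.force_intx=1
```

Under those assumptions the kernel would be built and installed from /usr/src with "make buildkernel KERNCONF=EARLYAP" followed by "make installkernel KERNCONF=EARLYAP". Keeping the force_intx line around, commented out, gives a quick fallback if the new kernel still panics during NVMe enumeration.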