Date:      Sat, 10 Sep 2016 21:34:13 -0600
From:      Warner Losh <imp@bsdimp.com>
To:        Christoph Pilka <c.pilka@asconix.com>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: Server with 40 physical cores, 48 NVMe disks, feel free to test it
Message-ID:  <CANCZdfoAgHjqgcSMQBbpDBXkA8D7zdceZgnDVvMCPeM0Psg98Q@mail.gmail.com>
In-Reply-To: <C6904B7F-D148-47C0-BD17-0A2AF63B5717@asconix.com>
References:  <C6904B7F-D148-47C0-BD17-0A2AF63B5717@asconix.com>

On Sat, Sep 10, 2016 at 2:58 AM, Christoph Pilka <c.pilka@asconix.com> wrote:
> Hi,
>
> we've just been granted a short-term loan of a server from Supermicro
> with 40 physical cores (plus HTT) and 48 NVMe drives. After a bit of
> mucking about, we managed to get 11-RC running. A couple of things are
> preventing the system from being terribly useful:
>
> - We have to use hw.nvme.force_intx=1 for the server to boot
> If we don't, it panics around the 9th NVMe drive with "panic: couldn't
> find an APIC vector for IRQ...". Increasing hw.nvme.min_cpus_per_ioq
> brings it further, but it still panics later in the NVMe
> enumeration/init. hw.nvme.per_cpu_io_queues=0 causes it to panic later
> (I suspect during ixl init - the box has 4x10gb ethernet ports).

John Baldwin has patches that help fix this.
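[For anyone wanting to reproduce the workaround: these are boot-time
tunables, so they would go in /boot/loader.conf. The min_cpus_per_ioq
value below is only illustrative, not a value taken from this thread.]

    # /boot/loader.conf -- workarounds discussed above
    hw.nvme.force_intx=1           # force legacy INTx interrupts instead of MSI-X
    # Alternatives that were tried and still panicked on this box:
    #hw.nvme.min_cpus_per_ioq=4    # illustrative value; more CPUs per queue means fewer vectors
    #hw.nvme.per_cpu_io_queues=0   # don't create a per-CPU I/O queue pair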

> - zfskern seems to be the limiting factor when doing ~40 parallel
> "dd if=/dev/zero of=<file> bs=1m" on a zpool stripe of all 48 drives.
> Each drive shows ~30% utilization (gstat), I can do ~14GB/sec write
> and 16 read.
>
> - direct writing to the NVMe devices (dd from /dev/zero) gives about
> 550MB/sec and ~91% utilization per device
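[A sketch of the two tests as described, assuming the drives show up as
nvd0..nvd47 and the pool mounts at /tank; names and sizes are
placeholders, not taken from the thread:]

    # stripe all 48 drives into one pool, no redundancy
    zpool create tank $(for i in $(seq 0 47); do echo /dev/nvd$i; done)

    # ~40 parallel sequential writers through ZFS
    for i in $(seq 0 39); do
        dd if=/dev/zero of=/tank/f$i bs=1m count=10240 &
    done
    wait

    # raw sequential write to one NVMe device (destroys its contents!)
    dd if=/dev/zero of=/dev/nvd0 bs=1m count=10240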

These are slow drives then, if 600MB/s is all they can do. The drives
we're looking at do 3.2GB/s read and 1.6GB/s write.

48 drives though. Woof. What's the interconnect? Are there enough PCIe
lanes for that? 192 lanes? How's that possible?
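[The 192 figure assumes an x4 link per drive: 48 x 4 = 192 lanes. A
dual-socket Xeon E5 v4 system, which a 40-core box from 2016 likely is,
only exposes 80 lanes from the CPUs (40 per socket), so a chassis like
this presumably puts the drives behind PCIe switches rather than giving
each one a dedicated x4 uplink.]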

> Obviously, the first item is the most troublesome. The rest is based
> on entirely synthetic testing and may have little or no actual impact
> on the server's usability or fitness for our purposes.
>
> There is nothing but sshd running on the server, and if anyone wants
> to play around you'll have IPMI access (remote kvm, virtual media,
> power) and root.

Don't think I have enough time to track this all down...

Warner
