Date:      Thu, 4 Apr 2019 09:11:27 -0600
From:      Warner Losh <imp@bsdimp.com>
To:        "Patrick M. Hausen" <hausen@punkt.de>
Cc:        FreeBSD-STABLE Mailing List <freebsd-stable@freebsd.org>
Subject:   Re: NVME aborting outstanding i/o
Message-ID:  <CANCZdfoPZ9ViQzZ2k8GT5pNw5hjso3rzmYxzU=s%2B3K=ze%2BLZwg@mail.gmail.com>
In-Reply-To: <CF2365AE-23EA-4F18-9520-C998216155D5@punkt.de>
References:  <818CF16A-D71C-47C0-8A1B-35C9D8F68F4E@punkt.de> <CF2365AE-23EA-4F18-9520-C998216155D5@punkt.de>

On Thu, Apr 4, 2019 at 2:39 AM Patrick M. Hausen <hausen@punkt.de> wrote:

> Hi all,
>
> I’m currently doing some load tests/burn in for two new servers.
> These feature all NVME SSDs and run FreeNAS, i.e. FreeBSD 11.2-STABLE.
>
>         pcib17: <ACPI PCI-PCI bridge> at device 3.2 numa-domain 1 on pci15
>         pcib17: [GIANT-LOCKED]
>         pci17: <ACPI PCI bus> numa-domain 1 on pcib17
>         nvme7: <Generic NVMe Device> mem 0xeca10000-0xeca13fff at device 0.0 numa-domain 1 on pci17
>
> When putting some moderate i/o load on the system, the log fills with these
> messages:
>
>         nvme7: aborting outstanding i/o
>         nvme7: DATASET MANAGEMENT sqid:41 cid:91 nsid:1
>         nvme7: ABORTED - BY REQUEST (00/07) sqid:41 cid:91 cdw0:0
>

OK. So unless you are suspending and resuming, or the drive is somehow
failing, here's what's going on:

There's a request that was sent down to the drive. It took longer than 30s
to respond. One of them, at least, was a trim request.

There's a number of reasons for this. NAND sucks. It's a horrible steaming
pile of... silicon. To make it useful, there's a layer of software called
the FTL (flash translation layer). NAND is an append-only medium at the
lowest level, so the FTL has to take requests and build a map of logical
blocks to physical blocks, as well as manage the 'log structured device' in
some way. The details of why are too long to get into here (see my BSDCan
talk from a few years ago). But what is relevant is that many drives have
really crappy FTLs, especially when it comes to TRIMs. They can't handle a
lot of them, and when you send a lot down, like FreeBSD will often do with
UFS or ZFS, you can trigger the drive doing a bunch of garbage
collection. This can cause the drive to delay > 30s before responding to
commands. So sometimes you can avoid this by disabling trims.
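
As a rough sketch of what disabling trims can look like on an 11.x system
(the tunable and command below are assumptions about what applies to your
setup; check them against your release before relying on them):

```shell
# /boot/loader.conf (takes effect at the next boot)
# Assumption: legacy ZFS on FreeBSD 11.x, where TRIM is controlled by this tunable.
vfs.zfs.trim.enabled=0

# For UFS, TRIM is a per-filesystem flag instead. It can be cleared with
# tunefs(8) on an unmounted or read-only filesystem, for example:
#   tunefs -t disable /dev/<your-ufs-partition>
```

If the aborts go away with trims off, that points at the drive's TRIM
handling rather than at the queue setup.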

Other times, you have a crappy FTL that crashes. This can cause a long
timeout because FreeBSD has done something that, while in spec, is
unexpected or not well tested. Here you can really only have FreeBSD do
less work at once to avoid this issue, or you can upgrade the firmware.

> There has been some discussion of this on the iX Systems forum as well as
> various FreeBSD media and one person suggested setting:
>
>         hw.nvme.per_cpu_io_queues=0
>
>
> This is where I need some help now. This is from the manpage for nvme(4):
>
> ----------
>     To force a single I/O queue pair shared by all CPUs, set the following
>     tunable value in loader.conf(5):
>
>           hw.nvme.per_cpu_io_queues=0
>
>     To assign more than one CPU per I/O queue pair, thereby reducing the
>     number of MSI-X vectors consumed by the device, set the following
>     tunable value in loader.conf(5):
>
>           hw.nvme.min_cpus_per_ioq=X
>
>     To force legacy interrupts for all nvme driver instances, set the
>     following tunable value in loader.conf(5):
>
>           hw.nvme.force_intx=1
>
>     Note that use of INTx implies disabling of per-CPU I/O queue pairs.
> ----------
>
> But:
>
>         root@freenas01[~]# sysctl hw.nvme.per_cpu_io_queues
>         sysctl: unknown oid 'hw.nvme.per_cpu_io_queues'
>         root@freenas01[~]# sysctl hw.nvme.min_cpus_per_ioq
>         sysctl: unknown oid 'hw.nvme.min_cpus_per_ioq'
>         root@freenas01[~]# sysctl hw.nvme.force_intx
>         sysctl: unknown oid 'hw.nvme.force_intx'
>
>
> Where do I go from here?
>

Did you add it to /boot/loader.conf? There's no sysctl for this.
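
A minimal sketch of how to check that, since sysctl(8) won't show it: the
nvme(4) text you quoted says these values go in loader.conf(5), so the
place to look is the file itself. The helper name loader_tunable is
hypothetical, made up for illustration, and it only understands plain
name=value lines.

```shell
# Hypothetical helper: print the last value assigned to a tunable in a
# loader.conf-style file (defaults to /boot/loader.conf).
# Only handles simple name=value or name="value" lines; commented-out
# lines do not match because their first field starts with '#'.
loader_tunable() {
    name=$1
    file=${2:-/boot/loader.conf}
    awk -F= -v n="$name" '
        $1 == n { v = $2; gsub(/"/, "", v) }   # remember the value, minus quotes
        END     { if (v != "") print v }       # print nothing if never set
    ' "$file"
}

# Example use on a real system:
#   loader_tunable hw.nvme.per_cpu_io_queues
```

After a reboot, kenv(1) shows what the loader actually handed to the
kernel, so `kenv hw.nvme.per_cpu_io_queues` should print the value you
set if the line was picked up.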

Warner


>
> Thanks!
> Patrick
> --
> punkt.de GmbH                   Internet - Dienstleistungen - Beratung
> Kaiserallee 13a                 Tel.: 0721 9109-0 Fax: -100
> 76133 Karlsruhe                 info@punkt.de   http://punkt.de
> AG Mannheim 108285              Gf: Juergen Egeling
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
>


