From: "Patrick M. Hausen" <hausen@punkt.de>
To: Warner Losh
Cc: FreeBSD-STABLE Mailing List <freebsd-stable@freebsd.org>
Subject: Re: NVME aborting outstanding i/o and controller resets
Date: Fri, 12 Apr 2019 21:22:23 +0200
Message-Id: <92DAD65A-9BFE-4294-9066-977F498300A3@punkt.de>

Hi Warner,

thanks for taking the time again …

> OK. This means that whatever I/O workload we've done has caused the NVME card to stop responding for 30s, so we reset it.

I figured as much ;-)

> So it's an intel card.

Yes - I already added this info several times.
6 of them, 2.5" NVME "disk drives".

> OK. That suggests Intel has a problem with their firmware.

I came across this one:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211713

Is it more probable that Intel has buggy firmware here than that "we" are missing interrupts?

The mainboard is the Supermicro H11SSW-NT. Two NVME drive bays share a connector on the mainboard:

	NVMe Ports (NVMe 0~7, 10, 11, 14, 15)
	The H11SSW-iN/NT has twelve (12) NVMe ports (2 ports per 1 Slim SAS
	connector) on the motherboard. These ports provide high-speed,
	low-latency PCI-E 3.0 x4 connections directly from the CPU to NVMe
	Solid State (SSD) drives. This greatly increases SSD data-throughput
	performance and significantly reduces PCI-E latency by simplifying
	driver/software requirements resulting from the direct PCI-E
	interface from the CPU to the NVMe SSD drives.

Is this purely mechanical, or do two drives share PCI-E resources? That would explain why the problems always come in pairs (nvme6 and nvme7, for example).

This afternoon I set up a system with 4 drives and I was not able to reproduce the problem. (We just got 3 more machines which happened to have 4 drives each and no M.2 directly on the mainboard.) I will change the config to 6 drives like with the two FreeNAS systems in our data center.

> [… nda(4) ...]
> I doubt that would have any effect. They both throw as much I/O onto the card as possible in the default config.

I found out - yes, just the same.

> There's been some minor improvements in -current here. Any chance you could experimentally try that with this test? You won't get as many I/O abort errors (since we don't print those), and we have a few more workarounds for the reset path (though honestly, it's still kinda stinky).

HEAD or RELENG_12, too?
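In case it helps anyone following along, here is a rough sketch of the checks I would run on one of the affected boxes. The unit numbers nvme6/nvme7 are from my setup and will differ elsewhere, and the loader tunable is my understanding of how the nvd(4)-to-nda(4) switch is done; please correct me if I have it wrong:

```shell
# Compare the negotiated PCIe link of the two controllers that fail in
# pairs -- if either reports a degraded link width (e.g. x2 instead of
# x4), the bays share more than just the mechanical connector.
pciconf -lc nvme6 | grep -i link
pciconf -lc nvme7 | grep -i link

# List the NVMe controllers and namespaces the driver currently sees.
nvmecontrol devlist

# /boot/loader.conf fragment for the nda(4) test (takes effect after a
# reboot; tunable name as I understand it from the driver):
# hw.nvme.use_nvd="0"
```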
Kind regards,
Patrick

-- 
punkt.de GmbH			Internet - Dienstleistungen - Beratung
Kaiserallee 13a			Tel.: 0721 9109-0 Fax: -100
76133 Karlsruhe			info@punkt.de	http://punkt.de
AG Mannheim 108285		Gf: Juergen Egeling