From owner-freebsd-stable@freebsd.org  Mon Oct 26 21:50:29 2020
From: Juraj Lutter <juraj@lutter.sk>
Subject: Interrupt problems(?) on Dell R740xd
Message-Id: <9FD07762-5744-480C-A289-DDB09730A74D@lutter.sk>
Date: Mon, 26 Oct 2020 22:50:18 +0100
To: freebsd-stable@freebsd.org

Hi,

on a Dell R740xd with:
- 22x nvm0: Dell Express Flash PM1725b 1.6TB SFF
- 2x ATA SSDSC2KG240G8R
- 2 package(s) x 8 core(s) x 2 hardware threads
- 256GB RAM

running 12.2-STABLE r367058, I've run into a problem where, after some time,
the machine locks up in certain operations (mkdir, for example, though not
always the same one). In top output, entries like these can be seen:

   12 root        -80    -     0B  7936K WAIT     0   0:05   0.00% intr{irq48: pcib12+++}
   12 root        -88    -     0B  7936K WAIT     6   0:05   0.00% intr{irq16: ahci0 xhci0*}
   12 root        -80    -     0B  7936K WAIT     8   0:05   0.00% intr{irq53: pcib16++}
   12 root        -80    -     0B  7936K WAIT    12   0:05   0.00% intr{irq54: pcib17++}
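
These thread names can be decoded to see which drivers share each interrupt line. Below is a minimal Python sketch that assumes the naming convention described in a comment in FreeBSD's sys/kern/kern_intr.c (my reading of it, so treat it as an assumption): handler names that do not fit into the thread name are replaced by one '+' each, and a '*' signals that even the '+' markers ran out of room. The names parse_intr_name and IntrThread are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Matches the interrupt-thread name shown by `top -S -H`,
# e.g. "intr{irq16: ahci0 xhci0*}".
INTR_RE = re.compile(r"intr\{irq(?P<irq>\d+): (?P<names>[^}]*)\}")


@dataclass
class IntrThread:
    irq: int
    visible: list[str]  # handler names that fit into the thread name
    hidden_min: int     # lower bound on handlers elided with '+' / '*'
    more: bool          # a '*' was present: the exact count is unknown

    @property
    def shared(self) -> bool:
        """True if more than one handler is attached to this interrupt."""
        return len(self.visible) + self.hidden_min > 1


def parse_intr_name(line: str) -> IntrThread | None:
    m = INTR_RE.search(line)
    if m is None:
        return None
    visible: list[str] = []
    hidden_min = 0
    more = False
    for token in m.group("names").split():
        name = token.rstrip("+*")
        markers = token[len(name):]
        hidden_min += markers.count("+")  # assumed: one elided handler per '+'
        if "*" in markers:
            hidden_min += 1  # at least one more handler, exact count unknown
            more = True
        if name:  # tolerate a stand-alone "+" token
            visible.append(name)
    return IntrThread(int(m.group("irq")), visible, hidden_min, more)


if __name__ == "__main__":
    # Two lines abbreviated from the top output quoted above.
    sample = [
        "   12 root  -80  -  0B  7936K WAIT  0  0:05  0.00% intr{irq48: pcib12+++}",
        "   12 root  -88  -  0B  7936K WAIT  6  0:05  0.00% intr{irq16: ahci0 xhci0*}",
    ]
    for line in sample:
        t = parse_intr_name(line)
        print(t.irq, t.visible, t.hidden_min, t.more, t.shared)
```

Under that assumption, irq16 is shared by ahci0, xhci0 and at least one more handler, and irq48 carries pcib12 plus three elided handlers.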

For example, running poudriere:
4124  1  I+      0:00.21 /usr/local/libexec/poudriere/sh -e /usr/local/share/poudriere/bulk.sh
4217  1  D+      0:00.00 cap_mkdb /poudriere/build/data/.m/12sgx64-default/ref/etc/login.conf

Then even the root pool starts getting checksum errors, and a scrub is
needed afterwards:
Oct 26 11:55:42 bnts-nvs-n1 ZFS[4117]: pool I/O failure, zpool=$zroot error=$97
Oct 26 11:55:42 bnts-nvs-n1 ZFS[4118]: checksum mismatch, zpool=$zroot path=$/dev/da0p3 offset=$30089228288 size=$53248
Oct 26 11:55:42 bnts-nvs-n1 ZFS[4119]: checksum mismatch, zpool=$zroot path=$/dev/da1p3 offset=$30089228288 size=$53248
Oct 26 11:55:49 bnts-nvs-n1 ZFS[4121]: pool I/O failure, zpool=$zroot error=$97
Oct 26 11:56:26 bnts-nvs-n1 ZFS[4239]: pool I/O failure, zpool=$zroot error=$97
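
Both checksum mismatches hit the same offset and size on da0p3 and da1p3. If zroot is a two-way mirror of those two partitions (an assumption; the pool layout is not shown here), identical bad blocks on both sides may point at something the two disks share, such as the mrsas path, rather than at one failing SSD. A small Python sketch for grouping such log lines; the function name checksum_mismatches is hypothetical, and the key=value parsing only assumes the format quoted above:

```python
import re
from collections import defaultdict

# key=value pairs as they appear in the ZFS syslog lines quoted above,
# e.g. "offset=$30089228288"; the literal '$' is treated as optional.
KV_RE = re.compile(r"(\w+)=\$?(\S+)")


def checksum_mismatches(lines):
    """Map (pool, offset, size) -> list of vdev paths reporting a mismatch."""
    groups = defaultdict(list)
    for line in lines:
        if "checksum mismatch" not in line:
            continue  # skip "pool I/O failure" and unrelated lines
        kv = dict(KV_RE.findall(line))
        key = (kv.get("zpool"), int(kv["offset"]), int(kv["size"]))
        groups[key].append(kv["path"])
    return dict(groups)


if __name__ == "__main__":
    log = [
        "Oct 26 11:55:42 bnts-nvs-n1 ZFS[4118]: checksum mismatch, zpool=$zroot "
        "path=$/dev/da0p3 offset=$30089228288 size=$53248",
        "Oct 26 11:55:42 bnts-nvs-n1 ZFS[4119]: checksum mismatch, zpool=$zroot "
        "path=$/dev/da1p3 offset=$30089228288 size=$53248",
    ]
    for (pool, offset, size), paths in checksum_mismatches(log).items():
        print(pool, offset, size, paths)
```

A group with more than one path is a block that was bad on several vdevs at once.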

This all happens when there is increased I/O going through the mrsas-attached disks:
AVAGO MegaRAID SAS FreeBSD mrsas driver version: 07.709.04.00-fbsd
mrsas0: <AVAGO Invader SAS Controller> port 0x4000-0x40ff mem 0x9db00000-0x9db0ffff,0x9da00000-0x9dafffff irq 32 at device 0.0 numa-domain 0 on pci4
mrsas0: FW now in Ready state
mrsas0: Using MSI-X with 32 number of vectors
mrsas0: FW supports <96> MSIX vector,Online CPU 32 Current MSIX <32>
mrsas0: max sge: 0x46, max chain frame size: 0x400, max fw cmd: 0x39f
mrsas0: Issuing IOC INIT command to FW.
mrsas0: IOC INIT response received from FW.
mrsas0: System PD created target ID: 0x0
mrsas0: System PD created target ID: 0x1
mrsas0: FW supports: UnevenSpanSupport=1
mrsas0: max_fw_cmds: 927  max_scsi_cmds: 911
mrsas0: MSI-x interrupts setup success
mrsas0: mrsas_ocr_thread

Internal disks are:
<ATA SSDSC2KG240G8R DL67>          at scbus17 target 0 lun 0 (pass2,da0)
<ATA SSDSC2KG240G8R DL67>          at scbus17 target 1 lun 0 (pass3,da1)

Example:
da0 at mrsas0 bus 1 scbus17 target 0 lun 0
da0: <ATA SSDSC2KG240G8R DL67> Fixed Direct Access SPC-4 SCSI device
da0: Serial Number BTYG01730DP5240AGN
da0: 150.000MB/s transfers
da0: 228936MB (468862128 512 byte sectors)

Internal AHCI is:
pci0: <ACPI PCI bus> numa-domain 0 on pcib0
pci0: <dasp, performance counters> at device 8.1 (no driver attached)
pci0: <unknown> at device 17.0 (no driver attached)
ahci0: <Intel Lewisburg AHCI SATA controller>
ahci0: AHCI v1.31 with 6 6Gbps ports, Port Multiplier not supported
ahci1: <Intel Lewisburg AHCI SATA controller>
ahci1: AHCI v1.31 with 8 6Gbps ports, Port Multiplier not supported

sesutil map excerpt:
ses0:
        Enclosure Name: AHCI SGPIO Enclosure 2.00
        Enclosure ID: 3061686369656d30
        Element 0, Type: Array Device Slot
                Status: Unsupported (0x00 0x00 0x00 0x00)
                Description: Drive Slots


NVME disks are:
nda0 at nvme0 bus 0 scbus19 target 0 lun 1
nda0: <Dell Express Flash PM1725b 1.6TB SFF 1.1.0 S5CUNA0N201038>
nda0: Serial Number S5CUNA0N201038
nda0: nvme version 1.2 x4 (max x4) lanes PCIe Gen3 (max Gen3) link
nda0: 1526185MB (3125627568 512 byte sectors)

The machine also has 4x bge and 4x bnxt.

With hw.pci.enable_msi="0" set it's slightly better; with hw.pci.enable_msi="1"
it happens more often and under even lower load than with enable_msi=0.
hw.pci.enable_msix is set to 1.

Once the machine locks up, one or more of the following messages also appear
(I suspect the bge2 one might be caused by a stuck interrupt):
bge2: Interface stopped DISTRIBUTING, possible flapping
nvme0: Missing interrupt
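
To see which of these messages shows up first relative to the ZFS errors, the log can be scanned for all of them at once. A minimal Python sketch, assuming the stock syslogd line prefix ("Mon DD HH:MM:SS host tag:"); the symptom labels and the scan() helper are hypothetical, and the patterns match only the message texts quoted in this report:

```python
import re

# Timestamp at the start of a stock syslogd line, e.g. "Oct 26 11:55:42".
TS_RE = re.compile(r"^(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2})")

# One pattern per symptom quoted in this report (hypothetical labels).
SYMPTOMS = {
    "missing_interrupt": re.compile(r"\b(?P<src>[a-z]+\d+): Missing interrupt"),
    "lacp_flap": re.compile(r"\b(?P<src>[a-z]+\d+): Interface stopped DISTRIBUTING"),
    "zfs_io_failure": re.compile(r"pool I/O failure, zpool=\$?(?P<src>\S+)"),
    "zfs_checksum": re.compile(r"checksum mismatch, .*path=\$?(?P<src>\S+)"),
}


def scan(lines):
    """Return (timestamp, symptom, source) tuples in log order."""
    hits = []
    for line in lines:
        ts = TS_RE.match(line)
        for kind, rx in SYMPTOMS.items():
            m = rx.search(line)
            if m:
                hits.append((ts.group(1) if ts else None, kind, m.group("src")))
    return hits
```

For example, `with open("/var/log/messages") as f: print(*scan(f), sep="\n")` lists the hits in log order, so it is easy to see whether the nvme0 and bge2 messages precede or follow the ZFS failures.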

The only way out is to reboot.
What steps could I take to narrow down the source of the problem?
The machine is not yet in production, so I can even try -CURRENT on it
as a last resort.

One thing I'm also considering is disabling USB so that it does not share
interrupt(s) with ahci.
The weird thing is that it can survive a full buildworld with 1 make job,
but not with 32 or even 16.
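
To check whether an interrupt source really stops firing (or storms) during a -j16 buildworld, FreeBSD's `vmstat -i` counters can be compared across two snapshots. A minimal Python sketch; parse_vmstat_i, snapshot and deltas are hypothetical helper names, and the parser assumes each line ends with the total and rate columns:

```python
from __future__ import annotations

import subprocess


def parse_vmstat_i(text: str) -> dict[str, int]:
    """Map interrupt source -> total count from FreeBSD `vmstat -i` output."""
    totals: dict[str, int] = {}
    for line in text.splitlines():
        # The source name may contain spaces; the last two fields are numbers.
        parts = line.rsplit(None, 2)
        if len(parts) != 3 or not parts[1].isdigit():
            continue  # header or blank line
        name = parts[0].strip()
        if name != "Total":
            totals[name] = int(parts[1])
    return totals


def snapshot() -> dict[str, int]:
    """Run `vmstat -i` (FreeBSD) and parse its output."""
    out = subprocess.run(["vmstat", "-i"], capture_output=True, text=True, check=True)
    return parse_vmstat_i(out.stdout)


def deltas(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Interrupts delivered per source between two snapshots."""
    return {name: after[name] - before.get(name, 0) for name in after}
```

Taking one snapshot before the load and another a few seconds into it, a source whose count does not move while its device should be busy (for example an NVMe I/O queue), or one whose count jumps far above the others, would be a candidate for a stuck or shared interrupt.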

Has anyone come across something like this?
Any hints are welcome.

Thanks.

--
Juraj Lutter
XMPP: juraj (at) lutter.sk
GSM: +421907986576