Date:      Sun, 17 May 2026 18:45:13 +0000
From:      bugzilla-noreply@freebsd.org
To:        bugs@FreeBSD.org
Subject:   [Bug 243225] mpr0: Out of chain frames leads to boot hang
Message-ID:  <bug-243225-227-I4sNaN4efr@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-243225-227@https.bugs.freebsd.org/bugzilla/>


https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=243225

Steve Shippa <steve@witopia.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |steve@witopia.net

--- Comment #19 from Steve Shippa <steve@witopia.net> ---
Adding detailed findings and a one-line fix for this bug.

HARDWARE: Dell PowerEdge R740xd, Dell HBA330 Mini (SAS3008/mpr), 2x Intel DC
P4500 1.6TB NVMe U.2, 2x Xeon Gold 6248 (80 threads), 128GB RAM, BIOS 2.27.0,
HBA330 firmware 16.17.01.00.

OS: FreeBSD 15.0-RELEASE-p8, UEFI boot, GENERIC kernel.

SYMPTOM: Intermittent (~50%) boot hang with "mpr0: Out of chain frames,
consider increasing hw.mpr.max_chains." Occurs only with two NVMe drives
installed. With one or zero NVMe drives, the system boots 100% reliably with
all 8 HBA drives.

ROOT CAUSE: All three controllers (nvme0 at pci0:62:0:0, nvme1 at pci0:63:0:0,
mpr0 at pci0:65:0:0) share the same PCIe root complex on NUMA domain 0. With
per_cpu_io_queues enabled (default), each NVMe controller creates 40 I/O queues
(one per core) with 40 MSI-X vectors. Two controllers = 80 queues + 80 MSI-X
vectors allocated simultaneously through the same PCIe switch the mpr driver is
initializing through. This overwhelms the PCIe resource allocation and causes
mpr init to fail. The "out of chain frames" message is misleading -- it is not
a chain frame shortage.

FIX: Add to /boot/loader.conf:

  hw.nvme.per_cpu_io_queues="0"

This reduces each NVMe controller from 40 queues to 1 queue, eliminating the
PCIe resource conflict. With this setting, 7+ consecutive boots (cold and warm)
completed with zero failures. The performance impact should be minimal for
typical workloads.
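
To check that the setting took effect after a reboot, something like the
following sketch may help. The helper name count_nvme_io_vectors is made up for
illustration, and it assumes that "vmstat -i" labels NVMe interrupt sources as
"nvme0:admin", "nvme0:io0", and so on; that format is an assumption, not
something taken from the output on this machine.

```shell
#!/bin/sh
# Hypothetical helper: count the I/O interrupt vectors of one NVMe controller.
# Reads `vmstat -i`-style text on stdin; $1 is the controller name (e.g. nvme0).
# Assumes interrupt lines look like "irq121: nvme0:io0 ..." (assumed format).
count_nvme_io_vectors() {
    # grep -c prints the number of matching lines (0 if there are none).
    grep -c "$1:io[0-9]"
}

# Intended use on the affected host (commented out so the file only defines
# the helper):
#   kenv hw.nvme.per_cpu_io_queues          # value the loader passed in
#   vmstat -i | count_nvme_io_vectors nvme0
#   vmstat -i | count_nvme_io_vectors nvme1
```

With per_cpu_io_queues disabled, each controller would be expected to report a
single I/O vector if the one-queue description above holds.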

THINGS THAT DID NOT WORK:
- hw.mpr.max_chains="200000" (tunable confirmed set in loader, kernel still
failed)
- hw.mpr.disable_msix="1"
- kern.maxphys="524288"
- loader_delay="3"
- kern.cam.boot_delay="10000"
- hint.nvme.1.disabled="1" in loader.conf (not passed to kernel on UEFI/Lua
boot -- kenv shows nothing)
- hint.nvme.1.disabled="1" in device.hints (works as a workaround, but requires
rc.local to run devctl enable post-boot)
- Lua loader delay in /boot/lua/loader.lua (delay is pre-kernel, kernel
re-enumerates PCIe independently)
- Forth loader.rc modifications (completely ignored on FreeBSD 15 UEFI -- uses
Lua loader)
- BIOS MMIO Base changes (56TB/12TB/512GB)
- BIOS Slot Disablement / Boot Driver Disabled (NVMe bays are
backplane-connected, not in slot list)
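
For anyone who wants to try the device.hints workaround from the list above
instead of the per_cpu_io_queues fix, a minimal sketch of the two pieces
follows. It assumes the second controller attaches as nvme1, as on this
system; adjust the unit number otherwise.

```shell
# /boot/device.hints: keep the second NVMe controller from attaching at boot
#   hint.nvme.1.disabled="1"

# /etc/rc.local: re-enable the controller once mpr0 has finished initializing
devctl enable nvme1
```

Any filesystems or pools on nvme1 only become available after rc.local has
run, so they cannot be needed earlier in the boot sequence.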

ADDITIONAL NOTES:
1. FreeBSD 15 UEFI boot uses the Lua loader. /boot/loader.rc (Forth) is ignored
entirely. This may not be widely known and caught us off guard during
debugging.
2. hint.* set in loader.conf does not appear in kenv on UEFI/Lua boot. Hints
must go in /boot/device.hints.
3. Terry Kennedy's earlier analysis of this being a timing issue with a
misleading error message was correct.

-- 
You are receiving this mail because:
You are the assignee for the bug.
