Date: Sun, 17 May 2026 18:45:13 +0000
From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 243225] mpr0: Out of chain frames leads to boot hang
Message-ID: <bug-243225-227-I4sNaN4efr@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-243225-227@https.bugs.freebsd.org/bugzilla/>
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=243225

Steve Shippa <steve@witopia.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |steve@witopia.net

--- Comment #19 from Steve Shippa <steve@witopia.net> ---
Adding detailed findings and a one-line fix for this bug.

HARDWARE: Dell PowerEdge R740xd, Dell HBA330 Mini (SAS3008/mpr), 2x Intel DC P4500 1.6TB NVMe U.2, 2x Xeon Gold 6248 (80 threads), 128GB RAM, BIOS 2.27.0, HBA330 firmware 16.17.01.00.

OS: FreeBSD 15.0-RELEASE-p8, UEFI boot, GENERIC kernel.

SYMPTOM: Intermittent (~50%) boot hang with "mpr0: Out of chain frames, consider increasing hw.mpr.max_chains." It occurs only with two NVMe drives installed. With one or zero NVMe drives, the system boots 100% reliably with all 8 HBA drives attached.

ROOT CAUSE: All three controllers (nvme0 at pci0:62:0:0, nvme1 at pci0:63:0:0, mpr0 at pci0:65:0:0) share the same PCIe root complex on NUMA domain 0. With per_cpu_io_queues enabled (the default), each NVMe controller creates 40 I/O queues (one per core) with 40 MSI-X vectors. Two controllers therefore allocate 80 queues and 80 MSI-X vectors at the same time, through the same PCIe switch the mpr driver is initializing through. This overwhelms PCIe resource allocation and causes mpr initialization to fail. The "Out of chain frames" message is misleading: it is not a chain frame shortage.

FIX: Add to /boot/loader.conf:

  hw.nvme.per_cpu_io_queues="0"

This reduces each NVMe controller from 40 I/O queues to 1, eliminating the PCIe resource conflict. Result: 7+ consecutive boots (cold and warm) with zero failures. Performance impact is minimal for typical workloads.
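Applying the FIX step by hand is a single line, but on machines that are managed by scripts it helps if re-running the change does not append duplicate lines. Below is a minimal POSIX sh sketch of an idempotent helper that writes NAME="VALUE" to a loader.conf-style file, replacing any existing assignment of the same name. The function name set_tunable is hypothetical and is not a FreeBSD tool. The sketch assumes assignments start at column 1, which is how they appear in this report.

```shell
#!/bin/sh
# set_tunable FILE NAME VALUE
# Hypothetical helper (not part of FreeBSD): writes NAME="VALUE" to FILE,
# replacing any existing assignment of NAME so the file holds exactly one.
set_tunable() {
    file=$1
    name=$2
    value=$3
    tmp=$(mktemp) || return 1

    # Copy every line except an existing "NAME=" assignment at column 1.
    # awk's index() is a literal match, so the dots in names such as
    # hw.nvme.per_cpu_io_queues are not treated as regex wildcards.
    # Commented-out lines ("#NAME=...") are kept unchanged.
    if [ -f "$file" ]; then
        awk -v n="$name" 'index($0, n "=") != 1' "$file" > "$tmp" || {
            rm -f "$tmp"
            return 1
        }
    fi

    # Append the new assignment in loader.conf(5) syntax.
    printf '%s="%s"\n' "$name" "$value" >> "$tmp"

    # Rewrite in place (cat >) so an existing file keeps its owner and mode.
    cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}
```

Run it as root with `set_tunable /boot/loader.conf hw.nvme.per_cpu_io_queues 0` and reboot. Afterwards, `kenv hw.nvme.per_cpu_io_queues` should print 0, which shows that the loader passed the tunable to the kernel.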
THINGS THAT DID NOT WORK:
- hw.mpr.max_chains="200000" (tunable confirmed set in the loader; the kernel still failed)
- hw.mpr.disable_msix="1"
- kern.maxphys="524288"
- loader_delay="3"
- kern.cam.boot_delay="10000"
- hint.nvme.1.disabled="1" in loader.conf (not passed to the kernel on UEFI/Lua boot; kenv shows nothing)
- hint.nvme.1.disabled="1" in device.hints (works as a workaround, but requires rc.local to run devctl enable after boot)
- Lua loader delay in /boot/lua/loader.lua (the delay is pre-kernel; the kernel re-enumerates PCIe independently)
- Forth loader.rc modifications (completely ignored on FreeBSD 15 UEFI, which uses the Lua loader)
- BIOS MMIO Base changes (56TB/12TB/512GB)
- BIOS Slot Disablement / Boot Driver Disabled (the NVMe bays are backplane-connected and are not in the slot list)

ADDITIONAL NOTES:
1. FreeBSD 15 UEFI boot uses the Lua loader; /boot/loader.rc (Forth) is ignored entirely. This may not be widely known and caught us off guard during debugging.
2. hint.* variables set in loader.conf do not appear in kenv on UEFI/Lua boot. Hints must go in /boot/device.hints.
3. Terry Kennedy's earlier analysis, that this is a timing issue with a misleading error message, was correct.

-- 
You are receiving this mail because:
You are the assignee for the bug.
