Date: Thu, 09 Jan 2020 19:00:53 +0000
From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 243225] "mpr0: Out of chain frames" boot hang after clang 9.0.1 import (probably timing, not compiler related)
Message-ID: <bug-243225-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=243225

Bug ID: 243225
Summary: "mpr0: Out of chain frames" boot hang after clang 9.0.1 import (probably timing, not compiler related)
Product: Base System
Version: 12.0-STABLE
Hardware: Any
OS: Any
Status: New
Severity: Affects Some People
Priority: ---
Component: kern
Assignee: bugs@FreeBSD.org
Reporter: terry-freebsd@glaver.org

I updated my test system from r356239 to r356557 (which crosses the clang 9.0.1 import) and started receiving "mpr0: Out of chain frames" at boot time, which causes a boot hang with the mpr0 controller being reset and reinitialized, and the error happening again. This happens before the device (tape drive) is detected, and happens regardless of whether anything is connected to the mpr controller.

I had this before (many months ago) on this system and worked with Dell service, replacing boards / cables / tape drive, etc. The solution at that point was to put the controller into a different slot, which apparently hid whatever timing problem is causing the boot hang. That's why I say in the PR title that I don't think it is a clang 9.0.1 problem (incorrect code generation). Presumably clang 9 generates faster (hopefully) or slower code that is triggering the problem.

Escaping to the boot loader and killing time, then saying "boot" without changing anything will sometimes let the system boot normally. Again pointing to a possible timing problem.

The boot messages from r356239 are:

mpr0: <Avago Technologies (LSI) SAS3008> port 0x8000-0x80ff mem 0xc9100000-0xc910ffff,0xc8000000-0xc80fffff irq 64 at device 0.0 on pci17
mpr0: Firmware: 16.00.08.00, Driver: 23.00.00.00-fbsd
mpr0: IOCCapabilities: 7a85c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,MSIXIndex,HostDisc,FastPath,RDPQArray>
mpr0: Found device <c01<SspTarg,Direct>,End Device> <6.0Gbps> handle<0x0009> enclosureHandle<0x0001> slot 7
mpr0: At enclosure level 0 and connector name (1 )
sa0 at mpr0 bus 0 scbus14 target 7 lun 0

In r356557, only the first of those 3 lines appear, followed by:

mpr0: Out of chain frames, consider increasing hw.mpr.max_chains

And then, eventually by:

mpr0: Calling Reinit from mpr_wait_command, timeout=60, elapsed=60
mpr0: Reinitializing controller

At that point we're in a perpetual loop of reinit / timeout.

I can make the problem system available via remote console access (Dell iDRAC 8) or can try any suggestions for debugging this further myself.

-- 
You are receiving this mail because:
You are the assignee for the bug.