Date:      Thu, 09 Jan 2020 19:00:53 +0000
From:      bugzilla-noreply@freebsd.org
To:        bugs@FreeBSD.org
Subject:   [Bug 243225] "mpr0: Out of chain frames" boot hang after clang 9.0.1 import (probably timing, not compiler related)
Message-ID:  <bug-243225-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=243225

            Bug ID: 243225
           Summary: "mpr0: Out of chain frames" boot hang after clang
                    9.0.1 import (probably timing, not compiler related)
           Product: Base System
           Version: 12.0-STABLE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: terry-freebsd@glaver.org

I updated my test system from r356239 to r356557 (which crosses the clang 9.0.1
import) and started receiving "mpr0: Out of chain frames" at boot time, which
causes a boot hang with the mpr0 controller being reset and reinitialized, and
the error happening again. This happens before the device (tape drive) is
detected, and happens regardless of whether anything is connected to the mpr
controller.

I had this before (many months ago) on this system and worked with Dell
service, replacing boards / cables / tape drive, etc. The solution at that
point was to put the controller into a different slot, which apparently hid
whatever timing problem is causing the boot hang. That's why I say in the PR
title that I don't think it is a clang 9.0.1 problem (incorrect code
generation). Presumably clang 9 generates faster (hopefully) or slower code
that is triggering the problem.

Escaping to the boot loader and killing time, then saying "boot" without
changing anything will sometimes let the system boot normally, which again
points to a possible timing problem.

The boot messages from r356239 are:

mpr0: <Avago Technologies (LSI) SAS3008> port 0x8000-0x80ff mem
0xc9100000-0xc910ffff,0xc8000000-0xc80fffff irq 64 at device 0.0 on pci17
mpr0: Firmware: 16.00.08.00, Driver: 23.00.00.00-fbsd
mpr0: IOCCapabilities:
7a85c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,MSIXIndex,HostDisc,FastPath,RDPQArray>
mpr0: Found device <c01<SspTarg,Direct>,End Device> <6.0Gbps> handle<0x0009>
enclosureHandle<0x0001> slot 7
mpr0: At enclosure level 0 and connector name (1   )
sa0 at mpr0 bus 0 scbus14 target 7 lun 0

In r356557, only the first of those 3 lines appear, followed by:

mpr0: Out of chain frames, consider increasing hw.mpr.max_chains

And then, eventually by:

mpr0: Calling Reinit from mpr_wait_command, timeout=60, elapsed=60
mpr0: Reinitializing controller

At that point we're in a perpetual loop of reinit / timeout.
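
For reference, the tunable named in the "Out of chain frames" message can be
set from the loader. A minimal sketch, assuming /boot/loader.conf is used; the
value below is purely illustrative and not a tested recommendation, and if the
root cause is timing, raising it may not help:

```
# /boot/loader.conf
# Illustrative value only (an assumption); not taken from this report.
hw.mpr.max_chains="32768"
```

The same setting can be tried for a single boot from the loader prompt with
"set hw.mpr.max_chains=32768" before typing "boot".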

I can make the problem system available via remote console access (Dell iDRAC
8) or can try any suggestions for debugging this further myself.

-- 
You are receiving this mail because:
You are the assignee for the bug.


