Date:      Sat, 07 Aug 2021 07:21:34 +0000
From:      bugzilla-noreply@freebsd.org
To:        freebsd-arm@FreeBSD.org
Subject:   [Bug 257670] RAS CONTROLLER: Fatal unrecoverable error detected with SAS3008
Message-ID:  <bug-257670-7@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=257670

            Bug ID: 257670
           Summary: RAS CONTROLLER: Fatal unrecoverable error detected
                    with SAS3008
           Product: Base System
           Version: CURRENT
          Hardware: arm64
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: arm
          Assignee: freebsd-arm@FreeBSD.org
          Reporter: daniel@morante.net
 Attachment #227004 mime type: text/plain

Created attachment 227004
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=227004&action=edit
capture of boot via serial

I am testing
FreeBSD-14.0-CURRENT-arm64-aarch64-20210805-f3a3b061216-248478 on a
Cavium ThunderX2 (Gigabyte R281-T91).  This system has an onboard SAS3008
PCI-Express Fusion-MPT SAS-3 controller.

```
mpr0@pci0:14:0:0:       class=0x010700 rev=0x02 hdr=0x00 vendor=0x1000 device=0x0097 subvendor=0x1458 subdevice=0x3008
    vendor     = 'Broadcom / LSI'
    device     = 'SAS3008 PCI-Express Fusion-MPT SAS-3'
    class      = mass storage
    subclass   = SAS
```

I load the `mpr` driver by setting `mpr_load="YES"` in `/boot/loader.conf`.
So far so good, except for some odd messages in dmesg (see attachment).

There are currently 8 HDDs attached to it, and I set up 3 ZFS pools.  This
goes well until I finally start to put some load on them.  The kernel then
panics and halts with the following in dmesg:

```
mpr0: IOC Fault 0x4000265d, Resetting
mpr0: Reinitializing controller
...
RAS CONTROLLER: Fatal unrecoverable error detected
```
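
For reference, the fault value in the log above can be split into an IOC
state and a fault code. This is only a sketch: it assumes the logged value
is the raw MPI doorbell register, with the state in the top four bits
(`MPI2_IOC_STATE_MASK`, 0xF0000000) and the fault code in the low sixteen
bits (`MPI2_DOORBELL_FAULT_CODE_MASK`, 0x0000FFFF), as defined in the MPI 2.x
headers. The helper name `decode_ioc_fault` is hypothetical, and the meaning
of the resulting code is firmware-specific and not interpreted here.

```sh
#!/bin/sh
# Split an MPI doorbell value into IOC state and fault code.
# Assumed layout: bits 31-28 = IOC state, bits 15-0 = fault code.
# decode_ioc_fault is a hypothetical helper, not part of any driver or tool.
decode_ioc_fault() {
    value=$(( $1 ))                     # accepts hex input such as 0x4000265d
    state=$(( (value >> 28) & 0xF ))    # 0x4 corresponds to the FAULT state
    code=$(( value & 0xFFFF ))          # firmware-defined fault code
    printf 'state=0x%x code=0x%04x\n' "$state" "$code"
}

decode_ioc_fault 0x4000265d
```

Under these assumptions the value decodes to state 0x4 (FAULT) with fault
code 0x265d, which matches the driver's decision to reset the controller.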

This is not to say the problem is with ZFS.  I suspect the mpr driver is
just unstable.

The system can no longer boot into multi-user mode.  The kernel panics with
the same error as soon as it tries to start ZFS.

```
mountroot: waiting for device /dev/nda0p2...
WARNING: / was not properly dismounted
Dual Console: Video Primary, Serial Secondary
witness_lock_list_get: witness exhausted
ZFS filesystem version: 5
ZFS storage pool version: features support (5000)
RAS CONTROLLER: Fatal unrecoverable error detected

        *** NBU Error ***
...
```

To get a functional system, I disable ZFS in `/etc/rc.conf` while in
single-user mode.
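
A minimal sketch of that step, assuming a UFS root that single-user mode has
mounted read-only and that ZFS was enabled through `zfs_enable` in
`/etc/rc.conf`. The exact commands used are not recorded in this report, so
treat these as an illustration rather than a transcript:

```sh
# Remount the root filesystem read-write (single-user mode mounts it read-only).
mount -u -o rw /
# Keep rc(8) from starting ZFS on the next boot (assumes zfs_enable was "YES").
sysrc zfs_enable="NO"
```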

Now, back in multi-user mode, I can do a `service zfs onestart` and try to
import one of the pools.  The kernel then panics again.

I detail the full specs of this system in bug #254651 (where I have a
problem with the onboard SATA controllers) and in my forum post at
https://forums.freebsd.org/threads/aarch64-trouble-with-cn99xx-ahci-and-fastlinq-ql41000-controllers.79556/
(where I explain the lack of a driver for the onboard Ethernet).

Also, for some reason I can no longer boot 13.0-RELEASE on this system.
It panics with "panic: NVME polled command failed to complete within 10s". I
think it doesn't like the add-on PCIe NVMe drive.  However, when it was
working (prior to adding the NVMe drive), the SAS controller was just as
unstable.

Seeing how most of the hardware is still very new, I don't expect FreeBSD
(especially arm64) to support it.  I'd like to help any way that I can,
should someone be interested.  The system has an IPMI, and I'd be willing to
offer remote access to it via VPN for as long as it's required (if that's a
thing that's normally done), on a dedicated network with any other required
resources.

-- 
You are receiving this mail because:
You are the assignee for the bug.
