Date:      Thu, 20 Apr 2023 02:08:36 +0000
From:      bugzilla-noreply@freebsd.org
To:        bugs@FreeBSD.org
Subject:   [Bug 270943] Complete system freeze on Asus dual socket AMD 7742 system
Message-ID:  <bug-270943-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=270943

            Bug ID: 270943
           Summary: Complete system freeze on Asus dual socket AMD 7742
                    system
           Product: Base System
           Version: 13.2-RELEASE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: misc
          Assignee: bugs@FreeBSD.org
          Reporter: nb@synthcom.com

Created attachment 241605
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=241605&action=edit
dmesg.boot for this system

I have a dual socket 7742 system (128 total real cores, 128 threads) that will
lock up completely in under an hour if left idle. By "lock up", I mean:

* Console unresponsive (no keyboard/USB/numlock)
* Networking unresponsive (no pings, no arps, nothing)

It behaves as if it's "jumping to self" with all interrupts disabled. The
system needs to be reset or power cycled. I have tried the following versions
over the last few months with the same results:

FreeBSD 13.2-RELEASE releng/13.2-n254617-525ecfdad597 GENERIC amd64
FreeBSD 13.1
FreeBSD 13.0
FreeBSD 12.3
Several memstick images of 14.0 since December 2022

Other notes:

* The lockup is guaranteed. I've never had it not lock up when left idle.
Always locks up in <1 hour (usually in 10-20 minutes).

* If I run a "stress" program, the system runs for days at a time without any
observed lockups. If there's any significant system activity, it appears to not
lock up.

* At one point (on a 14.0 build) I was able to get the kernel debugger compiled
in. When the system locked up, hitting the local USB keyboard sequence to get
into the kernel debugger worked. This also seemed to unlock the system, as
after I exited the kernel debugger, the system was alive again.

* I've installed the OSes on either 2GB M.2 Samsung SSDs *OR* on a Western
Digital SN200 NVME disk. No changes in behavior. Storage does not appear to be
a factor.

* I've halved the memory and swapped DIMMs entirely. No change.
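
For anyone trying to reproduce the contrast between the idle and loaded cases,
here is a minimal sketch of a CPU load generator. It is not the "stress"
program mentioned above. The names burn and run_load, the worker count, and the
duration are all illustrative assumptions, and it only shows the kind of
sustained activity under which the lockup has not been observed.

```python
import multiprocessing
import time


def burn(seconds):
    """Busy-loop until the deadline passes; return the iteration count."""
    deadline = time.monotonic() + seconds
    iterations = 0
    while time.monotonic() < deadline:
        iterations += 1
    return iterations


def run_load(workers, seconds):
    """Run burn() in `workers` processes at once and collect their counts.

    Hypothetical helper: call it from under an
    `if __name__ == "__main__":` guard when running this as a script.
    """
    with multiprocessing.Pool(processes=workers) as pool:
        return pool.map(burn, [seconds] * workers)
```

Calling run_load(multiprocessing.cpu_count(), 3600.0) would keep one worker
busy per logical CPU for an hour, which is longer than the system has ever
survived when left idle.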

System specs:

Motherboard         : Asus rs700a-e11-rs12u-wocpu009z
CPUs                : Dual AMD 7742 CPUs
BIOS Version        : 0901
BMC Firmware model  : RS700A-E11-RS12U
BMC Firmware version: 1.2.15
Installed ECC memory: 512GB
Storage             : Two Samsung EVO 980 TB M.2 SSDs, and a WD SN200 7.68TB
NVME U.2 disk

Video is provided by the ASpeed AST2500.

I'd be happy to put this system on the internet and allow any and all
interested parties access to it for troubleshooting/debugging. Thank you!

-- 
You are receiving this mail because:
You are the assignee for the bug.


