Date:      Fri, 07 Jul 2023 01:38:12 +0000
From:      bugzilla-noreply@freebsd.org
To:        bugs@FreeBSD.org
Subject:   [Bug 268393] system always reboots once from a powered off state
Message-ID:  <bug-268393-227-PhDHLe2UZs@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-268393-227@https.bugs.freebsd.org/bugzilla/>
References:  <bug-268393-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=268393

--- Comment #48 from Jonathan Vasquez <jon@xyinn.org> ---
Hey all,

So I spent a few hours today debugging this issue on 13.2-RELEASE and I have
interesting stuff to report.

TLDR:

1. There definitely seems to be a race condition somewhere, either in how the
AMD Raven HDA Controller is being enumerated or in how it's being accessed.

2. I was able to build on John's idea regarding the delays and come up with
something that seems to stop my system from crashing. I don't think it is an
acceptable solution as-is, though, since it would introduce a delay to all
"hdac_intr_handler()" calls for any device that uses that function. I'll keep
testing it locally to see if I notice any new types of weirdness (outside of
any known ones that I've experienced before this patch), and also because I
don't want my system to keep crashing. As a side note, I ordered 2 PCIe sound
cards that I want to test for FreeBSD compatibility, which would help mitigate
this issue if nothing else. Best case scenario, we fix this issue, and I also
end up with a better sounding sound card than the on-board sound :).

3. We can experience different severity levels depending on the length of the
delay.

-----

So this is what the patch looks like that keeps my system from crashing on
first boot:

diff --git a/sys/dev/sound/pci/hda/hdac.c b/sys/dev/sound/pci/hda/hdac.c
index 9aa0e4bffdc8..e9d581a422cb 100644
--- a/sys/dev/sound/pci/hda/hdac.c
+++ b/sys/dev/sound/pci/hda/hdac.c
@@ -378,6 +378,11 @@ hdac_one_intr(struct hdac_softc *sc, uint32_t intsts)
 static void
 hdac_intr_handler(void *context)
 {
+       /*
+        * Add slight delay to avoid crashes with AMD Raven HDA Controllers
+        */
+       DELAY(5000);
+
        struct hdac_softc *sc;
        uint32_t intsts;


-----

- If there is no DELAY (the default), the system will crash.
- If there is a DELAY of 1000, the system won't crash, but we will see access
errors! Which is revealing.

Example:

hdac2: <AMD Raven HDA Controller> mem 0xfc980000-0xfc987fff at device 0.6 on pci19
hdac2: Unexpected unsolicited response from address 0: 00000000
hdac2: Unexpected unsolicited response from address 0: 00000000
hdac2: Unexpected unsolicited response from address 0: 00000000
hdac2: Unexpected unsolicited response from address 0: 00000000


- If there is a DELAY of 5000, the system won't crash, and we no longer see any
errors.

In the situations where I don't use delays (and leading up to this reduced
solution), I was able to have the machine stop crashing if I added at least 4
printf statements lol. If I used 3 printf, it would crash. I suppose 4 printf
is roughly equal to a DELAY of 5000 for me.

As stated before, with the above patch, the machine no longer crashes for me on
a cold boot. I was also able to access and use my pcm8 device immediately and
sound worked. This is progress.

I've attached the following files:

- bad.0.txt - Shows the access errors with a delay of 1000 with my previous
expanded debug messages.
- good.0.txt - Shows a good cold boot with a delay of 5000 with my previous
expanded debug messages.
- bad.1.txt - Shows the access errors with a delay of 1000 (minimal logging).
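
To compare logs like the ones attached, the "Unexpected unsolicited response"
lines can be counted per file. This is a minimal sketch, assuming the logs are
plain text with one message per line; count_unsolicited() is a hypothetical
helper name, not part of any FreeBSD tool.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: count lines in a log stream that contain the
 * hdac "Unexpected unsolicited response" message. Lines longer than
 * the buffer are read in chunks, so a message split across two chunks
 * would be missed; dmesg lines are normally far shorter than that.
 */
static int
count_unsolicited(FILE *fp)
{
	char line[512];
	int count = 0;

	while (fgets(line, sizeof(line), fp) != NULL) {
		if (strstr(line, "Unexpected unsolicited response") != NULL)
			count++;
	}
	return (count);
}
```

A caller would fopen() each log, pass the stream to count_unsolicited(), and
compare the counts across delay values.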

root@weshly:/usr/src # uname -a
FreeBSD weshly 13.2-RELEASE-p1 FreeBSD 13.2-RELEASE-p1 #23
releng/13.2-n254621-08b87f63a046-dirty: Thu Jul  6 21:22:10 EDT 2023
root@weshly:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64

debugging on:

commit 08b87f63a046bd966bd0ed548211ae98ff50e638 (HEAD -> releng/13.2,
origin/releng/13.2)
Author: Gordon Tetlow <gordon@FreeBSD.org>
Date:   Tue Jun 20 22:40:02 2023 -0700

    Add UPDATING entries and bump version.

    Approved by:    so

-- 
You are receiving this mail because:
You are the assignee for the bug.


