Date:      Thu, 30 Dec 2021 20:55:13 +0000
From:      bugzilla-noreply@freebsd.org
To:        x11@FreeBSD.org
Subject:   [Bug 237544] graphics/drm-fbsd12.0-kmod: panic on 12-STABLE with Radeon HD 7450 (but not with drm-fbsd11.2-kmod)
Message-ID:  <bug-237544-7141-EXRlwpzS00@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-237544-7141@https.bugs.freebsd.org/bugzilla/>
References:  <bug-237544-7141@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=237544

--- Comment #11 from Bill Paul <noisetube@gmail.com> ---
So, since I'm off work this week and have not much else to do, I decided to
try isolating the actual problem here. Now that I have a known working set
of code (drm-fbsd11.2-kmod) I thought I could compare it to the non-working
code (drm-fbsd12.0-kmod) and gradually bisect things to narrow down the
fault.

After much hair-pulling and gnashing of teeth, I finally isolated things
down to the dma-fence module in the linuxkpi code.

Here's what I tried:

- Replaced the contents of the drivers/gpu/drm/radeon directory in
drm-fbsd12.0-kmod with the contents from the radeon directory in
drm-fbsd11.2-kmod
- Result: no change, panic still occurred

- Replaced the contents of the drivers/gpu/drm/ttm directory in
drm-fbsd12.0-kmod with the contents of the drm directory in
drm-fbsd11.2-kmod (as well as the associated header files)
- Result: no change, panic still occurred

- Replaced the contents of the linuxkpi and drivers/gpu/drm/ttm directories
in drm-fbsd12.0-kmod with the contents of linuxkpi and ttm directories from
drm-fbsd11.2-kmod (as well as the associated header files)
- Result: No panic

- Replaced _just_ the contents of the linuxkpi directory in
drm-fbsd12.0-kmod with the contents of the linuxkpi directory in
drm-fbsd11.2-kmod (this time taking care to preserve the ttm module; they
are somewhat tightly coupled so this took a bit more effort)
- Result: No panic

- Replaced _just_ the dma-fence.h and linux_dmafence.c modules in the
linuxkpi directory in drm-fbsd12.0-kmod with the ones from
drm-fbsd11.2-kmod, and also tweaked linux_sync_file.c a little (it uses an
API from the 12.0 code which isn't in the 11.2 code)
- Result: No panic

I'm still not exactly sure what's wrong here, but there seems to be a
problem in the dma-fence module with locking and/or reference counting that
causes fence structures to be deleted unexpectedly. This is what leads to
the traps on bad pointers.

I created a custom tarball of the drm-fbsd12.0-kmod port which includes
patches to the FreeBSDDesktop 4.16 code to revert the dma-fence code as
described above. You can download it from here:

http://people.freebsd.org/~wpaul/radeon/drm-fbsd12.0-kmod.tar.gz

The specific things I did are:

1) Replaced dma-fence.h and linux_dmafence.c in the drm-fbsd12.0-kmod port
with the versions from drm-fbsd11.2-kmod.

2) Added a compat wrapper function in dma-fence.h for
dma_fence_get_rcu_safe() which just calls dma_fence_get_rcu().

3) Added a compat macro in dma-fence.h for dma_fence_is_signaled_locked()
which just calls dma_fence_is_signaled().

4) In linux_sync_file.c, changed the sync_fill_fence_info() function back to
how it looked in the 11.2 codebase, because it uses dma_fence_get_status()
and DMA_FENCE_FLAG_TIMESTAMP_BIT, which were not available in the older 11.2
dma-fence code.
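
For illustration only, the shims in steps 2 and 3 could look roughly like
the sketch below. This is a userspace mock, not the code from the tarball:
the struct layout, the plain integer refcount, and the exact parameter type
of dma_fence_get_rcu_safe() are assumptions made so the example compiles on
its own, and compat_demo() is a hypothetical helper that only exercises the
shims.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct dma_fence (assumption). */
struct dma_fence {
	unsigned int refcount;	/* stands in for the kernel's struct kref */
	bool signaled;		/* stands in for the "signaled" flag bit */
};

/* Mock of the older API: take a reference unless the count is already 0. */
static struct dma_fence *
dma_fence_get_rcu(struct dma_fence *fence)
{
	if (fence->refcount == 0)
		return (NULL);
	fence->refcount++;
	return (fence);
}

/* Mock of the older API: report whether the fence has signaled. */
static bool
dma_fence_is_signaled(struct dma_fence *fence)
{
	assert(fence != NULL);
	return (fence->signaled);
}

/* Step 2: compat wrapper that simply forwards to dma_fence_get_rcu(). */
static struct dma_fence *
dma_fence_get_rcu_safe(struct dma_fence **fencep)
{
	struct dma_fence *fence = *fencep;

	return (fence != NULL ? dma_fence_get_rcu(fence) : NULL);
}

/* Step 3: compat macro that simply forwards to dma_fence_is_signaled(). */
#define	dma_fence_is_signaled_locked(fence)	dma_fence_is_signaled(fence)

/* Hypothetical helper: returns 0 if the shims behave as described. */
int
compat_demo(void)
{
	struct dma_fence f = { .refcount = 1, .signaled = false };
	struct dma_fence *slot = &f;

	if (dma_fence_get_rcu_safe(&slot) != &f || f.refcount != 2)
		return (1);	/* a live fence should gain one reference */
	if (dma_fence_is_signaled_locked(&f))
		return (2);	/* fence has not signaled yet */
	f.refcount = 0;		/* simulate a fence that is being torn down */
	if (dma_fence_get_rcu_safe(&slot) != NULL)
		return (3);	/* a dead fence must not be revived */
	return (0);
}
```

Note that a wrapper this thin presumably gives up whatever extra checking
the newer dma_fence_get_rcu_safe() does, so it is a compatibility shim for
testing rather than a proper fix.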

Just unpack the tarball under /usr/ports/graphics in place of the old one
and then run make, followed by "make deinstall" and "make reinstall".

It occurred to me that instead of taking the older 11.2 dma-fence module and
porting it forward, it might make more sense to take the 13.0 module and
port it back. But this assumes that the drm-fbsd13.0-kmod code doesn't have
the same stability problem as drm-fbsd12.0-kmod, and I don't know if that's
true. (So far nobody has said whether or not they're using a Radeon card
with 13.0 and whether or not they've encountered the same problems.) I may
still try this anyway if I'm still sufficiently bored.

So far I've tested this on two devices:

vgapci0@pci0:1:0:0:     class=0x030000 card=0x21261028 chip=0x68f91002 rev=0x00 hdr=0x00
    vendor     = 'Advanced Micro Devices, Inc. [AMD/ATI]'
    device     = 'Cedar [Radeon HD 5000/6000/7350/8350 Series]'
    class      = display
    subclass   = VGA

vgapci0@pci0:0:1:0:     class=0x030000 card=0x168b103c chip=0x96481002 rev=0x00 hdr=0x00
    vendor     = 'Advanced Micro Devices, Inc. [AMD/ATI]'
    device     = 'Sumo [Radeon HD 6480G]'
    class      = display
    subclass   = VGA

I'm using the machine with the CEDAR device right now. The laptop with the
SUMO device is much more prone to crashing. Usually what I do to provoke it
is:

- Boot and load the driver
- Plug in my phone and set up tethering over USB
- Start KDE5
- Start Firefox
- Browse Facebook or Reddit for a while

It usually panics within a few minutes.

Lastly, I have a question: I followed up to this particular PR because it
seemed to most closely match the problems I was having, but it's been
closed. Should I open a new PR? This bug is still present with 12.3 and I'm
clearly not the only one affected by it. (I also still can't explain why it
doesn't seem to affect the i915kms driver.)
