Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 09 Jan 2022 23:30:52 +0000
From:      bugzilla-noreply@freebsd.org
To:        bugs@FreeBSD.org
Subject:   [Bug 253461] LinuxKPI: [AMD/ATI] RV730 PRO [Radeon HD 4650] crashes kernel
Message-ID:  <bug-253461-227-2PP9erpxTC@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-253461-227@https.bugs.freebsd.org/bugzilla/>
References:  <bug-253461-227@https.bugs.freebsd.org/bugzilla/>

next in thread | previous in thread | raw e-mail | index | archive | help
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D253461

--- Comment #9 from Bill Paul <noisetube@gmail.com> ---
(In reply to Vladimir Kondratyev from comment #7)

It looked to me that originally Linux had the dma_fence_signal() API and la=
ter
a new API dma_fence_signal_locked() was added. According to what I've read,=
 the
idea is that dma_fence_signal_locked() can be used if the caller is already
holding the DMA fence object spinlock, while the older dma_fence_signal()
function takes the lock for you.

The question here is: when you signal a dma fence object, and you invoke its
attached callout routines, do you hold the spinlock or do you drop it?

The older linuxkpi code in drm-fbsd11.2-kmod was based on Linux 4.11 and on=
ly
had the dma_fence_signal() API, and that code always held the fence spinlock
when invoking the callouts.

In drm-fbsd12.0-kmod, based on Linux 4.16, both dma_fence_signal() and
dma_fence_signal_locked() are present. HOWEVER, the logic is now such that =
both
functions drop the dma fence spinlock when calling the callouts.

This changes the behavior of dma_fence_signal(), and I think the change was
wrong (though likely unintentional). Now, dma_fence_signal() drops the spin=
lock
when invoking the callouts. This does not seem to harm the Intel i915kms.ko
driver, but it seems to cause the radeonkms.ko driver driver to panic when =
the
system is under load. I must assume that dropping the lock leads to a race
condition when two different threads try to access the same dma fence objec=
t.

If you browse the most recent Linux kernel code, you can also see that this
behavior is inconsistent with the native Linux implementations of
dma_fence_signal() and dma_fence_signal_locked():

https://elixir.bootlin.com/linux/latest/source/drivers/dma-buf/dma-fence.c#=
L376

The dma_fence_signal_timestamp_locked() function shown here is used by both
dma_fence_signal() and dma_fence_signal_locked(). dma_fence_signal() takes =
the
fence spinlock before calling it. Note that the fence spinlock is _not_
released when invoking the callbacks.

>From this I am forced to conclude:

- When calling dma_fence_signal(), the fence spinlock is supposed to be held
until the function returns, including when the callbacks are called.
- When calling dma_fence_signal_locked(), the same is true, except it is the
caller that's expected to take the fence spinlock.
- The current behavior in drm-fbsd12.0-kmod where the lock is dropped when
invoking the callouts is therefore wrong on two counts: it deviates from the
Linux behavior, which breaks synchronization in the Radeon driver.

I think my fix preserves the expected behavior of both routines, because
dma_fence_signal_unlocked() does not call dma_fence_signal_unlocked_sub() w=
ith
the spinlock held, while dma_fence_signal() does.

My office machine with the CAICOS chipset has been running with this fix fo=
r a
week now and has been stable. I've also been using the same fix on my laptop
with the SUMO chipset with the same fix for a bit longer and it also hasn't
crashed. Before the laptop would not last more than 5 minutes before it wou=
ld
panic.

-Bill

--=20
You are receiving this mail because:
You are the assignee for the bug.=



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?bug-253461-227-2PP9erpxTC>