Date: Sun, 09 Jan 2022 23:30:52 +0000 From: bugzilla-noreply@freebsd.org To: bugs@FreeBSD.org Subject: [Bug 253461] LinuxKPI: [AMD/ATI] RV730 PRO [Radeon HD 4650] crashes kernel Message-ID: <bug-253461-227-2PP9erpxTC@https.bugs.freebsd.org/bugzilla/> In-Reply-To: <bug-253461-227@https.bugs.freebsd.org/bugzilla/> References: <bug-253461-227@https.bugs.freebsd.org/bugzilla/>
next in thread | previous in thread | raw e-mail | index | archive | help
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D253461 --- Comment #9 from Bill Paul <noisetube@gmail.com> --- (In reply to Vladimir Kondratyev from comment #7) It looked to me that originally Linux had the dma_fence_signal() API and la= ter a new API dma_fence_signal_locked() was added. According to what I've read,= the idea is that dma_fence_signal_locked() can be used if the caller is already holding the DMA fence object spinlock, while the older dma_fence_signal() function takes the lock for you. The question here is: when you signal a dma fence object, and you invoke its attached callout routines, do you hold the spinlock or do you drop it? The older linuxkpi code in drm-fbsd11.2-kmod was based on Linux 4.11 and on= ly had the dma_fence_signal() API, and that code always held the fence spinlock when invoking the callouts. In drm-fbsd12.0-kmod, based on Linux 4.16, both dma_fence_signal() and dma_fence_signal_locked() are present. HOWEVER, the logic is now such that = both functions drop the dma fence spinlock when calling the callouts. This changes the behavior of dma_fence_signal(), and I think the change was wrong (though likely unintentional). Now, dma_fence_signal() drops the spin= lock when invoking the callouts. This does not seem to harm the Intel i915kms.ko driver, but it seems to cause the radeonkms.ko driver driver to panic when = the system is under load. I must assume that dropping the lock leads to a race condition when two different threads try to access the same dma fence objec= t. If you browse the most recent Linux kernel code, you can also see that this behavior is inconsistent with the native Linux implementations of dma_fence_signal() and dma_fence_signal_locked(): https://elixir.bootlin.com/linux/latest/source/drivers/dma-buf/dma-fence.c#= L376 The dma_fence_signal_timestamp_locked() function shown here is used by both dma_fence_signal() and dma_fence_signal_locked(). dma_fence_signal() takes = the fence spinlock before calling it. Note that the fence spinlock is _not_ released when invoking the callbacks. >From this I am forced to conclude: - When calling dma_fence_signal(), the fence spinlock is supposed to be held until the function returns, including when the callbacks are called. - When calling dma_fence_signal_locked(), the same is true, except it is the caller that's expected to take the fence spinlock. - The current behavior in drm-fbsd12.0-kmod where the lock is dropped when invoking the callouts is therefore wrong on two counts: it deviates from the Linux behavior, which breaks synchronization in the Radeon driver. I think my fix preserves the expected behavior of both routines, because dma_fence_signal_unlocked() does not call dma_fence_signal_unlocked_sub() w= ith the spinlock held, while dma_fence_signal() does. My office machine with the CAICOS chipset has been running with this fix fo= r a week now and has been stable. I've also been using the same fix on my laptop with the SUMO chipset with the same fix for a bit longer and it also hasn't crashed. Before the laptop would not last more than 5 minutes before it wou= ld panic. -Bill --=20 You are receiving this mail because: You are the assignee for the bug.=
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?bug-253461-227-2PP9erpxTC>