From: bugzilla-noreply@freebsd.org
To: x11@FreeBSD.org
Subject: [Bug 253461] LinuxKPI: [AMD/ATI] RV730 PRO [Radeon HD 4650] crashes kernel
Date: Sun, 09 Jan 2022 23:30:52 +0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=253461

--- Comment #9 from Bill Paul ---
(In reply to Vladimir Kondratyev from comment #7)

As far as I can tell, Linux originally had only the dma_fence_signal() API, and dma_fence_signal_locked() was added later. According to what I've read, the idea is that dma_fence_signal_locked() can be used if the caller is already holding the DMA fence object spinlock, while the older dma_fence_signal() function takes the lock for you.

The question here is: when you signal a DMA fence object and invoke its attached callbacks, do you hold the spinlock or do you drop it?

The older linuxkpi code in drm-fbsd11.2-kmod was based on Linux 4.11 and only had the dma_fence_signal() API, and that code always held the fence spinlock when invoking the callbacks.

In drm-fbsd12.0-kmod, based on Linux 4.16, both dma_fence_signal() and dma_fence_signal_locked() are present. HOWEVER, the logic is now such that both functions drop the DMA fence spinlock when calling the callbacks.

This changes the behavior of dma_fence_signal(), which previously held the spinlock across the callbacks, and I think the change was wrong (though likely unintentional). It does not seem to harm the Intel i915kms.ko driver, but it seems to cause the radeonkms.ko driver to panic when the system is under load. I must assume that dropping the lock leads to a race condition when two different threads try to access the same DMA fence object.
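To make the locking question concrete, here is a small user-space sketch of the "hold the lock across the callbacks" behavior. It is not the linuxkpi or Linux code: struct toy_fence, the toy_* functions, the single-callback layout, the -1 return value and the use of a pthread mutex in place of the fence spinlock are all assumptions made up for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a DMA fence; NOT the real struct dma_fence. */
struct toy_fence;
typedef void (*toy_fence_cb)(struct toy_fence *f, void *arg);

struct toy_fence {
    pthread_mutex_t lock;   /* stands in for the fence spinlock */
    bool signaled;
    toy_fence_cb cb;        /* a single callback, for brevity */
    void *cb_arg;
};

void toy_fence_init(struct toy_fence *f, toy_fence_cb cb, void *arg)
{
    assert(f != NULL);
    pthread_mutex_init(&f->lock, NULL);
    f->signaled = false;
    f->cb = cb;
    f->cb_arg = arg;
}

/* Caller must already hold f->lock; the callback runs with it held. */
int toy_fence_signal_locked(struct toy_fence *f)
{
    assert(f != NULL);
    if (f->signaled)
        return -1;          /* already signaled (simplified error value) */
    f->signaled = true;
    if (f->cb != NULL)
        f->cb(f, f->cb_arg);
    return 0;
}

/* Takes the lock itself and keeps it until the callback has returned. */
int toy_fence_signal(struct toy_fence *f)
{
    int ret;

    pthread_mutex_lock(&f->lock);
    ret = toy_fence_signal_locked(f);
    pthread_mutex_unlock(&f->lock);
    return ret;
}

/*
 * Example callback: records whether the fence lock was held when it ran.
 * Assumes a non-recursive mutex, so trylock fails while the lock is held.
 */
void toy_record_lock_state(struct toy_fence *f, void *arg)
{
    int *lock_was_held = arg;

    if (pthread_mutex_trylock(&f->lock) == 0) {
        pthread_mutex_unlock(&f->lock);
        *lock_was_held = 0;
    } else {
        *lock_was_held = 1;
    }
}
```

Initializing a fence with toy_record_lock_state as its callback and calling toy_fence_signal() lets the callback observe that the lock is still held. A variant that unlocked around the f->cb() call would be the lock-dropping behavior described above, and it would let a second thread touch the fence while the callback is still running.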
If you browse the most recent Linux kernel code, you can also see that this behavior is inconsistent with the native Linux implementations of dma_fence_signal() and dma_fence_signal_locked():

https://elixir.bootlin.com/linux/latest/source/drivers/dma-buf/dma-fence.c#L376

The dma_fence_signal_timestamp_locked() function shown here is used by both dma_fence_signal() and dma_fence_signal_locked(). dma_fence_signal() takes the fence spinlock before calling it. Note that the fence spinlock is _not_ released when invoking the callbacks.

From this I am forced to conclude:

- When calling dma_fence_signal(), the fence spinlock is supposed to be held until the function returns, including while the callbacks are called.

- When calling dma_fence_signal_locked(), the same is true, except it is the caller that's expected to take the fence spinlock.

- The current behavior in drm-fbsd12.0-kmod, where the lock is dropped when invoking the callbacks, is therefore wrong on two counts: it deviates from the Linux behavior, and it breaks synchronization in the Radeon driver.

I think my fix preserves the expected behavior of both routines, because dma_fence_signal_unlocked() does not call dma_fence_signal_unlocked_sub() with the spinlock held, while dma_fence_signal() does.

My office machine with the CAICOS chipset has been running with this fix for a week now and has been stable. I've also been running the same fix on my laptop with the SUMO chipset for a bit longer, and it hasn't crashed either. Before the fix, the laptop would not last more than 5 minutes before it would panic.

-Bill