Date: Thu, 24 Mar 2022 20:48:18 +0000
From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 262765] Random lockups, data loss, and poor I/O and sound quality after 95edb10b47fc1a919cd1687aaf16be9e14456c89
Message-ID: <bug-262765-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=262765

            Bug ID: 262765
           Summary: Random lockups, data loss, and poor I/O and sound
                    quality after 95edb10b47fc1a919cd1687aaf16be9e14456c89
           Product: Base System
           Version: CURRENT
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: tod.jackson@gmail.com

This is way beyond my level, and one of the reasons I didn't want to move
beyond 13.0.

Reverting "LinuxKPI: implement dma_sync_single_for_*, apply to (un)map
single/sg" fixes all sorts of problems for me, but it's a big hammer that
probably breaks things for everyone else. I have no idea who the culprit is.

I first started having troubles in Linux a few years ago, and finally
narrowed it down here. It's entirely possible my firmware is broken, but is
there anything we can do?

My drm-kmod is also devoid of panic(), but unluckily this doesn't manifest as
panics. I had to make some stuff up or return (-ENOMEM) to accommodate these
changes, but it's nothing of interest.

This is really complicated because multiple drivers are trying to manage
memory owned by the firmware, and they don't cooperate.

I found my workaround, and it solves a sort of several-year mystery, but
maybe we can do better. I don't even know what kind of quirk this could be.

If I had to guess, the relevant parts are dma_sync_single_for_cpu and cache
flushing.

This is from some i915 documentation:

Now the pagetables are a bit tricky. In the end, they're all in system
memory, but there are a few hoops to jump through to get at them. The GTT
pagetables has just one level, so with a 4 byte entry size we need 2MB of
contiguous pagetable space. The firmware allocates that for us from stolen
memory (that is, a part of the system memory that is not listed in the e820
map, so it's not managed by the Linux kernel). But we write these PTEs
through an alias in the register mmio bar!
The reason for that is to allow the SA to invalidate TLBs. Note, though,
that this only invalidates TLBs for cpu access. Any other access to the GTT
(such as from the GT or the display block) has its own rules for TLB
invalidation. Also, on recent generations we need to (depending upon
circumstances) manually invalidate the SA TLB by writing to a magic register.
To speed up map/unmap operations, we map that GTT PTE aliasing region in the
mmio with wc (if this is possible, which means the cpu needs to support PAT).

A lot of this is just stubbed or nonexistent right now, notably runtime PM
and the more complicated GT/engine bits. And we really have no idea what the
Nvidia driver is doing, aside from trying and failing to write in
write-protected regions. I took this upstream, but nobody really cares
because they don't want to deal with a proprietary blob.

scbus0 on ahcich1 bus 0:
<TOSHIBA MQ02ABD100H HEF01D>       at scbus0 target 0 lun 0 (pass0,ada0)
<>                                 at scbus0 target -1 lun ffffffff ()
scbus1 on ahciem0 bus 0:
<AHCI SGPIO Enclosure 2.00 0001>   at scbus1 target 0 lun 0 (pass1,ses0)
<>                                 at scbus1 target -1 lun ffffffff ()
scbus-1 on xpt0 bus 0:
<>                                 at scbus-1 target -1 lun ffffffff (xpt0)

I can provide any relevant information, but I don't fully understand the
problem. I'm on a few day old CURRENT with evadot's drm-subtree on top of it,
but I don't think my drm-kmod grabs anything of interest from there.

-- 
You are receiving this mail because:
You are the assignee for the bug.