Date:      Thu, 24 Mar 2022 20:48:18 +0000
From:      bugzilla-noreply@freebsd.org
To:        bugs@FreeBSD.org
Subject:   [Bug 262765] Random lockups, data loss, and poor I/O and sound quality after 95edb10b47fc1a919cd1687aaf16be9e14456c89
Message-ID:  <bug-262765-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=262765

            Bug ID: 262765
           Summary: Random lockups, data loss, and poor I/O and sound
                    quality after 95edb10b47fc1a919cd1687aaf16be9e14456c89
           Product: Base System
           Version: CURRENT
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: tod.jackson@gmail.com

This is way beyond my level, and one of the reasons I didn't want to move
beyond 13.0.

Reverting "LinuxKPI: implement dma_sync_single_for_*, apply to (un)map
single/sg" fixes all sorts of problems for me, but it's a big hammer that
probably breaks things for everyone else. I have no idea who the culprit is.

I first started having troubles in Linux a few years ago, and finally narrowed
it down here. It's entirely possible my firmware is broken, but is there
anything we can do?

My drm-kmod is also devoid of panic(), but unluckily this doesn't manifest as
panics. I had to make some stuff up or return (-ENOMEM) to accommodate these
changes, but it's nothing of interest.

This is really complicated because multiple drivers are trying to manage
memory owned by the firmware, and they don't cooperate.

I found my workaround, and it solves a sort of several year mystery, but maybe
we can do better.

I don't even know what kind of quirk this could be. If I had to guess, the
relevant parts are dma_sync_single_for_cpu and cache flushing.

This is from some i915 documentation:

Now the pagetables are a bit tricky. In the end, they're all in system memory,
but there are a few hoops to jump through to get at them. The GTT pagetables
has just one level, so with a 4 byte entry size we need 2MB of contiguous
pagetable space. The firmware allocates that for us from stolen memory (that
is, a part of the system memory that is not listed in the e820 map, so it's
not managed by the Linux kernel). But we write these PTEs through an alias in
the register mmio bar! The reason for that is to allow the SA to invalidate
TLBs. Note, though, that this only invalidates TLBs for cpu access. Any other
access to the GTT (such as from the GT or the display block) has its own rules
for TLB invalidation. Also, on recent generations we need to (depending upon
circumstances) manually invalidate the SA TLB by writing to a magic register.
To speed up map/unmap operations, we map that GTT PTE aliasing region in the
mmio with wc (if this is possible, which means the cpu needs to support PAT).
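
As a rough sanity check of the figures in the quoted text, here is a small C
sketch of the arithmetic behind "4 byte entry size" and "2MB of contiguous
pagetable space". It assumes 4 KiB GTT pages, which the quoted documentation
does not state, and the function names are hypothetical helpers written for
this illustration, not anything from i915 or drm-kmod.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of page table entries that fit in a single-level table.
 * table_bytes and pte_size come from the quoted documentation
 * (2 MB of table space, 4 byte entries).
 */
static uint64_t
gtt_entry_count(uint64_t table_bytes, uint64_t pte_size)
{
	assert(pte_size != 0);
	return (table_bytes / pte_size);
}

/*
 * Address space covered by that many entries.
 * ASSUMPTION: page_size is 4096 bytes; the quoted text does not say so.
 */
static uint64_t
gtt_address_span(uint64_t entries, uint64_t page_size)
{
	return (entries * page_size);
}
```

With a 2 MB table and 4 byte entries, gtt_entry_count() gives 524288 PTEs, and
under the 4 KiB page assumption gtt_address_span() gives 2 GiB of GTT address
space. That whole table lives in firmware-owned stolen memory.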

A lot of this is just stubbed or nonexistent right now, notably runtime PM and
the more complicated GT/engine bits. And we really have no idea what the
Nvidia driver is doing, aside from trying and failing to write in
write-protected regions. I took this upstream, but nobody really cares because
they don't want to deal with a proprietary blob.

scbus0 on ahcich1 bus 0:
<TOSHIBA MQ02ABD100H HEF01D>       at scbus0 target 0 lun 0 (pass0,ada0)
<>                                 at scbus0 target -1 lun ffffffff ()
scbus1 on ahciem0 bus 0:
<AHCI SGPIO Enclosure 2.00 0001>   at scbus1 target 0 lun 0 (pass1,ses0)
<>                                 at scbus1 target -1 lun ffffffff ()
scbus-1 on xpt0 bus 0:
<>                                 at scbus-1 target -1 lun ffffffff (xpt0)

I can provide any relevant information, but I don't fully understand the
problem. I'm on a few day old CURRENT with evadot's drm-subtree on top of it,
but I don't think my drm-kmod grabs anything of interest from there.

-- 
You are receiving this mail because:
You are the assignee for the bug.


