Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 30 Apr 2019 18:54:40 +0200
From:      Niclas Zeising <zeising@freebsd.org>
To:        Andrey Fesenko <f0andrey@gmail.com>, Tycho Nightingale <tychon@freebsd.org>
Cc:        "freebsd-x11@freebsd.org" <freebsd-x11@freebsd.org>, Johannes Lundberg <johalun@freebsd.org>
Subject:   Re: dmar, dma_pool, etc
Message-ID:  <a38cb18a-4cc0-7fb9-f5fb-03e20aad2b94@freebsd.org>
In-Reply-To: <4e89ea00-7439-b0ea-3614-ee344d3fe074@freebsd.org>
References:  <e0415524-3126-5ea2-c2e2-3d3dccc6832e@FreeBSD.org> <9E2356CF-6483-4C06-B4A8-0120088063FE@freebsd.org> <60b447bb-81da-4c01-e164-bdf10e5560b0@freebsd.org> <594E1E71-6834-431E-B122-005E64EDB1C2@freebsd.org> <3a07ffef-a978-2fdd-8d54-85fc0b6f3a63@freebsd.org> <23fe1183-d12c-b4b8-958f-34cee6e33977@freebsd.org> <9E61210C-4939-4D3A-8110-72023B67BBE6@freebsd.org> <C5E9EE94-00F7-4F9D-AFAC-48609F635C8D@freebsd.org> <CA%2BK5SrOcqKSvyuQaC5zfT-Wum7%2B%2B_XsFQwWSmwDTVH1qu7E2kg@mail.gmail.com> <4e89ea00-7439-b0ea-3614-ee344d3fe074@freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On 2019-04-30 14:54, Niclas Zeising wrote:
> On 2019-04-30 14:25, Andrey Fesenko wrote:
>> .On Tue, Apr 30, 2019 at 2:16 AM Tycho Nightingale=20
>> <tychon@freebsd.org> wrote:
>>>
>>>
>>> Hi,
>>>
>>>> On Apr 29, 2019, at 4:24 PM, Tycho Nightingale <tychon@freebsd.org>=20
>>>> wrote:
>>>>
>>>>
>>>>> On Apr 29, 2019, at 2:34 PM, Niclas Zeising <zeising@freebsd.org>=20
>>>>> wrote:
>>>>>
>>>>> On 2019-04-29 20:27, Niclas Zeising wrote:
>>>>>> On 2019-04-29 18:00, Tycho Nightingale wrote:
>>>>>>>> On Apr 29, 2019, at 11:06 AM, Niclas Zeising=20
>>>>>>>> <zeising@FreeBSD.org> wrote:
>>>>>>>>
>>>>>>>> As a side note, I can readily reproduce the hang on a spare=20
>>>>>>>> laptop, please let me know if I can help in testing or=20
>>>>>>>> diagnosing in any way.
>>>>>>>
>>>>>>>
>>>>>>> If you can readily reproduce the hang, since there are 2 halves=20
>>>>>>> that comprised the fix (the DMA pool and non-pool mappings) it=20
>>>>>>> would be instructive to try reverting either dmapool.h or=20
>>>>>>> dma-mapping.h independently to see if that helps.
>>>>>>>
>>>>>> Hi!
>>>>>> I will test this and report back.=C2=A0 Thank you!
>>>>>> Regards
>>>>>
>>>>> Hi!
>>>>> Just reverting dmapool.h or dma-mapping.h doesn't work, it won't=20
>>>>> build. I need to revert more than that.=C2=A0 Can you help me figur=
e out=20
>>>>> what to revert in either case, or provide a patch?
>>>>
>>>> Thanks for trying.=C2=A0 I managed to setup my HW and reproduce this=
 and=20
>>>> I see your point that it doesn=E2=80=99t cleave in half perfectly.
>>>>
>>>> Since I=E2=80=99ve got a (non)-working test case, let me investigate=
 a bit=20
>>>> further.
>>>
>>> I believe I=E2=80=99ve figured this out.=C2=A0 At least in my case I=E2=
=80=99ve been able=20
>>> to eliminate the hangs with the patch included at the bottom of this=20
>>> email.
>>>
>>> The issue stems from the implementation of dma_map_sg().=C2=A0 Accord=
ing=20
>>> to the linux DMA documentation[1], entries of the scatter/gather list=
=20
>>> may be coalesced:
>>>
>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 int
>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 dma_map_sg(struct de=
vice *dev, struct scatterlist *sg, int=20
>>> nents, enum dma_data_direction direction)
>>>
>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 Returns: the number =
of DMA address segments mapped (this may=20
>>> be shorter
>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 than <nents> passed =
in if some elements of the=20
>>> scatter/gather list are
>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 physically or virtua=
lly adjacent and an IOMMU maps them with=20
>>> a single
>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 entry).
>>>
>>> My implementation of dma_map_sg() does just that.=C2=A0 As it turns o=
ut=20
>>> there are several consumers of dma_map_sg(), e.g.=20
>>> i915_gem_map_dma_buf() and i915_gem_gtt_prepare_pages(), and=20
>>> mock_map_dma_buf() among others that aren=E2=80=99t compliant with th=
is=20
>>> documented API.=C2=A0 Going back to the non-coalesced version is like=
ly=20
>>> the only path forward as bugs in the callee redefine this API de fact=
o.
>>>
>>> If this addresses the hangs you are seeing, I will post this on=20
>>> Phabricator.
>>>
>>> Tycho
>>>
>>> [1] https://www.kernel.org/doc/Documentation/DMA-API.txt
>>>
>>> Index: sys/compat/linuxkpi/common/src/linux_pci.c
>>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
>>> --- sys/compat/linuxkpi/common/src/linux_pci.c=C2=A0 (revision 346687=
)
>>> +++ sys/compat/linuxkpi/common/src/linux_pci.c=C2=A0 (working copy)
>>> @@ -565,10 +565,8 @@
>>> =C2=A0 {
>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 struct linux_dma_pri=
v *priv;
>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 struct linux_dma_obj=
 *obj;
>>> -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 struct scatterlist *dma_sg, *sg=
;
>>> -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 int dma_nents, error, nseg;
>>> -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 size_t seg_len;
>>> -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 vm_paddr_t seg_phys, prev_phys_=
end;
>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 struct scatterlist *sg;
>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 int error, i, nseg;
>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 bus_dma_segment_t se=
g;
>>>
>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 priv =3D dev->dma_pr=
iv;
>>> @@ -580,25 +578,11 @@
>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0 return (0);
>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 }
>>>
>>> -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 sg =3D sgl;
>>> -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 dma_sg =3D sg;
>>> -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 dma_nents =3D 0;
>>> -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 while (nents > 0) {
>>> -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0 seg_phys =3D sg_phys(sg);
>>> -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0 seg_len =3D sg->length;
>>> -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0 while (--nents > 0) {
>>> -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 prev_phys=
_end =3D sg_phys(sg) + sg->length;
>>> -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 sg =3D sg=
_next(sg);
>>> -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if (prev_=
phys_end !=3D sg_phys(sg))
>>> -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 break;
>>> -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 seg_len +=
=3D sg->length;
>>> -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0 }
>>> -
>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 for_each_sg(sgl, sg, nents, i) =
{
>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0 nseg =3D -1;
>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0 mtx_lock(&priv->dma_lock);
>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0 if (_bus_dmamap_load_phys(priv->dmat, obj->dm=
amap,
>>> -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 seg_phys, seg_len, BUS_DMA_NOWAIT=
,
>>> -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 &seg, &nseg) !=3D 0) {
>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 sg_phys(sg), sg->length, 0, &seg,=
 &nseg) !=3D 0) {
>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
 bus_dmamap_unload(priv->dmat, obj->dmamap);
>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
 bus_dmamap_destroy(priv->dmat, obj->dmamap);
>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
 mtx_unlock(&priv->dma_lock);
>>> @@ -607,14 +591,9 @@
>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0 }
>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0 mtx_unlock(&priv->dma_lock);
>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0 KASSERT(++nseg =3D=3D 1, ("More than one segm=
ent=20
>>> (nseg=3D%d)", nseg));
>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0 sg_dma_address(sg) =3D seg.ds_addr;
>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 }
>>>
>>> -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0 sg_dma_address(dma_sg) =3D seg.ds_addr;
>>> -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0 sg_dma_len(dma_sg) =3D seg.ds_len;
>>> -
>>> -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0 dma_sg =3D sg_next(dma_sg);
>>> -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0 dma_nents++;
>>> -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 }
>>> -
>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 obj->dma_addr =3D sg=
_dma_address(sgl);
>>>
>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 mtx_lock(&priv->ptre=
e_lock);
>>> @@ -629,7 +608,7 @@
>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0 return (0);
>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 }
>>>
>>> -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 return (dma_nents);
>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 return (nents);
>>> =C2=A0 }
>>>
>>> =C2=A0 void
>>
>> Thanks, this patch make system more stable Farefox apparently work=20
>> infinitely.
>>
>> However, a heavy application (3D game in wine) still hangs after a
>> while, once hung after about a minute, about another (after reboot)
>> ten minutes later, video artifacts appear before the crash
>>
>=20
> Hi!
> This used to work before the regression?
> This means that this patch doesn't solve all the issues with the dmar=20
> changes.=C2=A0 Which GPU are you using?
> Regards

Hi!
The patch has been updated and now lives as part of this review:=20
https://reviews.freebsd.org/D20097
Can you please test that version?  It requires an updated source tree to=20
current from today.
Thanks!
Regards
--=20
Niclas Zeising



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?a38cb18a-4cc0-7fb9-f5fb-03e20aad2b94>