Date:      Sat, 25 Apr 2015 09:02:12 -0500
From:      Jason Harmening <jason.harmening@gmail.com>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        Svatopluk Kraus <onwahe@gmail.com>,  FreeBSD Arch <freebsd-arch@freebsd.org>
Subject:   Re: bus_dmamap_sync() for bounced client buffers from user address space
Message-ID:  <553B9E64.8030907@gmail.com>
In-Reply-To: <20150425094152.GE2390@kib.kiev.ua>
References:  <CAFHCsPXMjge84AR2cR8KXMXWP4kH2YvuV_uqtPKUvn5C3ygknw@mail.gmail.com> <CAM=8qan-4SbKJaddrfkv=HG3n+HaOPDL5MEPS9DoaTvnhrJPZQ@mail.gmail.com> <20150425094152.GE2390@kib.kiev.ua>



On 04/25/15 04:41, Konstantin Belousov wrote:
> On Fri, Apr 24, 2015 at 05:50:15PM -0500, Jason Harmening wrote:
>> A couple of comments:
>>
>> --POSTWRITE and POSTREAD are only asynchronous if you call them from an
>> asynchronous context.
>> For a driver that's already performing DMA operations on usermode memory,
>> it seems likely that there's going to be *some* place where you can call
>> bus_dmamap_sync() and be guaranteed to be executing in the context of the
>> process that owns the memory.  Then a regular bcopy will be safe and
>> inexpensive, assuming the pages have been properly vslock-ed/vm_map_wire-d.
>> That's usually whatever read/write/ioctl operation spawned the DMA transfer
>> in the first place.  So, in those cases can you not just move the
>> POSTREAD/POSTWRITE sync from the "DMA-finished" interrupt to the
>> d_read/d_write/d_ioctl that waits on the "DMA-finished" interrupt?
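
(To illustrate that last point, here is a rough, untested sketch of the
pattern; the "foo" driver, its softc fields, and the wakeup protocol are
all made up, and locking and most error handling are omitted:)

-----------------------------
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/conf.h>
#include <sys/proc.h>
#include <sys/uio.h>
#include <machine/bus.h>

struct foo_softc {
    bus_dma_tag_t   dmat;
    bus_dmamap_t    map;
    volatile int    dma_done;   /* set by the "DMA-finished" interrupt */
};

static void
foo_dma_cb(void *arg, bus_dma_segment_t *segs, int nsegs, bus_size_t size,
    int error)
{
    /* program the hardware with segs[]/nsegs here */
}

static int
foo_read(struct cdev *dev, struct uio *uio, int ioflag)
{
    struct foo_softc *sc = dev->si_drv1;
    int error;

    /* uio's user buffer must already be wired (vslock) per busdma.9. */
    error = bus_dmamap_load_uio(sc->dmat, sc->map, uio, foo_dma_cb, sc,
        BUS_DMA_NOWAIT);
    if (error != 0)
        return (error);
    while (sc->dma_done == 0)
        tsleep(sc, PRIBIO, "foodma", hz);   /* ISR does wakeup(sc) */
    /*
     * Still in the context of the process that owns the buffer, so a
     * bounced POSTREAD can bcopy into the user mapping safely.
     */
    bus_dmamap_sync(sc->dmat, sc->map, BUS_DMASYNC_POSTREAD);
    bus_dmamap_unload(sc->dmat, sc->map);
    return (0);
}
-----------------------------
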
>>
>> --physcopyin/physcopyout aren't trivial.  They go through uiomove_fromphys,
>> which often uses sfbufs to create temporary KVA mappings for the physical
>> pages.  sf_buf_alloc() can sleep (unless SFB_NOWAIT is specified, which
>> means it can fail and which uiomove_fromphys does not specify for good
>> reason); that makes it unsafe for use in either a threaded interrupt or a
>> filter.  Perhaps the physcopyout path could be changed to use pmap_qenter
>> directly in this case, but that can still be expensive in terms of TLB
>> shootdowns.
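
(For what it's worth, the pmap_qenter() variant I have in mind is roughly
the following; "kva" would have to be a page of kernel VA reserved per tag
or per CPU ahead of time, len must not cross a page boundary, and the
function name is made up -- sketch only:)

-----------------------------
#include <sys/param.h>
#include <sys/systm.h>
#include <vm/vm.h>
#include <vm/pmap.h>
#include <vm/vm_page.h>

/* Copy len bytes from a kernel buffer out to a physical destination page. */
static void
foo_copy_to_phys(vm_offset_t kva, vm_paddr_t dst_pa, const void *src,
    size_t len)
{
    vm_page_t m;

    m = PHYS_TO_VM_PAGE(dst_pa);
    pmap_qenter(kva, &m, 1);        /* temporary kernel mapping, no sleep */
    bcopy(src, (void *)(kva + (dst_pa & PAGE_MASK)), len);
    pmap_qremove(kva, 1);           /* still costs a TLB invalidation */
}
-----------------------------
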
>>
>> Checking against VM_MIN_KERNEL_ADDRESS seems sketchy; it eliminates the
>> chance to use a much-less-expensive bcopy in cases where the sync is
>> happening in the correct process context.
>>
>> Context-switching during bus_dmamap_sync() shouldn't be an issue.  In a
>> filter interrupt, curproc will be completely arbitrary but none of this
>> stuff should be called in a filter anyway.  Otherwise, if you're in a
>> kernel thread (including an ithread), curproc should be whatever proc was
>> supplied when the thread was created.  That's usually proc0, which only has
>> kernel address space.  IOW, even if a context-switch happens sometime
>> during bus_dmamap_sync, any pmap check or copy should have a consistent and
>> non-arbitrary process context.
>>
>> I think something like your second solution would be workable to make
>> UIO_USERSPACE syncs work in non-interrupt kernel threads, but given all the
>> restrictions and extra cost of physcopy, I'm not sure how useful that would
>> be.
>>
>> I do think busdma.9 could at least use a note that bus_dmamap_sync() is
>> only safe to call in the context of the owning process for user buffers.
> UIO_USERSPACE for busdma is absolutely unsafe and cannot be used without
> making the kernel panic.  Even if you wire the userspace buffer, there is
> nothing which could prevent another thread in the user process, or another
> process sharing the same address space, from calling munmap(2) on the range.
>
> The only safe method to work with the userspace regions is to
> vm_fault_quick_hold() them to get hold on the pages, and then either
> pass pages array down, or remap them in the KVA with pmap_qenter().
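
(For concreteness, I take that to mean something along these lines --
untested sketch, with the actual DMA omitted and FOO_MAXPAGES made up;
prot would be VM_PROT_READ and/or VM_PROT_WRITE depending on the DMA
direction:)

-----------------------------
#include <sys/param.h>
#include <sys/proc.h>
#include <vm/vm.h>
#include <vm/pmap.h>
#include <vm/vm_extern.h>
#include <vm/vm_map.h>
#include <vm/vm_page.h>

#define FOO_MAXPAGES    16

/*
 * Hold the user pages backing [uva, uva + len) and map them into KVA.
 * Returns the number of pages held, or -1 on failure.
 */
static int
foo_hold_and_map(vm_offset_t uva, vm_size_t len, vm_prot_t prot,
    vm_page_t *ma, vm_offset_t *kvap)
{
    int count;

    count = vm_fault_quick_hold_pages(&curproc->p_vmspace->vm_map,
        uva, len, prot, ma, FOO_MAXPAGES);
    if (count < 0)
        return (-1);
    *kvap = kva_alloc(ptoa(count));
    if (*kvap == 0) {
        vm_page_unhold_pages(ma, count);
        return (-1);
    }
    pmap_qenter(*kvap, ma, count);
    /* DMA to/from *kvap + (uva & PAGE_MASK), or pass ma[] down instead. */
    return (count);
}

static void
foo_unmap_and_release(vm_offset_t kva, vm_page_t *ma, int count)
{
    pmap_qremove(kva, count);
    kva_free(kva, ptoa(count));
    vm_page_unhold_pages(ma, count);    /* pages can be paged out again */
}
-----------------------------
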
I was under the impression that any attempt to free or munmap the
virtual range would block waiting for the underlying pages to be unwired.
That's certainly the behavior I've seen with free; is it not the case
with munmap?  Either way, I haven't walked the code to see where the
blocking is implemented.

If UIO_USERSPACE truly is unsafe to use with busdma, then we need to
either make it safe or stop documenting it in the manpage.
Perhaps the bounce buffer logic could use copyin/copyout for userspace,
if in the right process?  Or just always use physcopy for non-KVA as in
the first suggestion?
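
What I have in mind for the PREWRITE half of the bounce loop is roughly
the following (sketch only, mirroring the x86 fragment quoted below;
pmap_is_current() stands in for whatever check ends up being public, and
the copyin() failure case is hand-waved):

-----------------------------
if ((op & BUS_DMASYNC_PREWRITE) != 0) {
    while (bpage != NULL) {
        if (bpage->datavaddr >= VM_MIN_KERNEL_ADDRESS) {
            /* kernel VA: a plain bcopy is always fine */
            bcopy((void *)bpage->datavaddr,
                (void *)bpage->vaddr, bpage->datacount);
        } else if (bpage->datavaddr != 0 &&
            pmap_is_current(map->pmap)) {
            /* user VA, but we are in the owning process */
            /* XXX must handle copyin() failure somehow */
            copyin((const void *)bpage->datavaddr,
                (void *)bpage->vaddr, bpage->datacount);
        } else {
            /* wrong context: go through the physical address */
            physcopyout(bpage->dataaddr,
                (void *)bpage->vaddr, bpage->datacount);
        }
        bpage = STAILQ_NEXT(bpage, links);
    }
    dmat->bounce_zone->total_bounced++;
}
-----------------------------
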

It seems like in general it is too hard for drivers using busdma to deal
with usermode memory in a way that's both safe and efficient:
--bus_dmamap_load_uio + UIO_USERSPACE is apparently really unsafe
--if they do things the other way and allocate in the kernel, then they
had better either be willing to do extra copying, or create and
refcount their own vm_objects and use d_mmap_single (I still haven't
seen a good example of that), or leak a bunch of memory (if they use
d_mmap), because the old device pager is also really unsafe.
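
(The closest I can sketch for the d_mmap_single() route is an OBJT_SG
object backed by an sglist describing the driver's buffer; untested, with
made-up "foo" softc fields, and assuming the SG pager frees the sglist
when the object is finally destroyed:)

-----------------------------
#include <sys/param.h>
#include <sys/conf.h>
#include <sys/malloc.h>
#include <sys/proc.h>
#include <sys/sglist.h>
#include <vm/vm.h>
#include <vm/vm_object.h>
#include <vm/vm_pager.h>

struct foo_softc {
    vm_paddr_t  buf_paddr;  /* physical base of the DMA buffer */
    size_t      buflen;
};

static int
foo_mmap_single(struct cdev *cdev, vm_offset_t *offset, vm_size_t size,
    struct vm_object **object, int nprot)
{
    struct foo_softc *sc = cdev->si_drv1;
    struct sglist *sg;
    vm_object_t obj;

    if (*offset + size > sc->buflen)
        return (EINVAL);
    sg = sglist_alloc(1, M_WAITOK);
    sglist_append_phys(sg, sc->buf_paddr, sc->buflen);
    obj = vm_pager_allocate(OBJT_SG, sg, sc->buflen, nprot, 0,
        curthread->td_ucred);
    if (obj == NULL) {
        sglist_free(sg);
        return (EINVAL);
    }
    /* *offset is left as the caller's offset into the object. */
    *object = obj;
    return (0);
}
-----------------------------
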

>
>>
>> On Fri, Apr 24, 2015 at 8:13 AM, Svatopluk Kraus <onwahe@gmail.com> wrote:
>>
>>> DMA can be done on a client buffer from user address space, for
>>> example via bus_dmamap_load_uio() when uio->uio_segflg is
>>> UIO_USERSPACE.  Such a client buffer can bounce, and then it must be
>>> copied to and from the bounce buffer in bus_dmamap_sync().
>>>
>>> Current implementations (on all archs) do not take into account that
>>> bus_dmamap_sync() is asynchronous for POSTWRITE and POSTREAD in
>>> general.  It can be asynchronous for PREWRITE and PREREAD too, for
>>> example in driver implementations where DMA client buffer operations
>>> are buffered.  In those cases, a simple bcopy() is not correct.
>>>
>>> A demonstration of the current implementation (x86) is the following:
>>>
>>> -----------------------------
>>> struct bounce_page {
>>>     vm_offset_t vaddr;      /* kva of bounce buffer */
>>>     bus_addr_t  busaddr;    /* Physical address */
>>>     vm_offset_t datavaddr;  /* kva of client data */
>>>     bus_addr_t  dataaddr;   /* client physical address */
>>>     bus_size_t  datacount;  /* client data count */
>>>     STAILQ_ENTRY(bounce_page) links;
>>> };
>>>
>>>
>>> if ((op & BUS_DMASYNC_PREWRITE) != 0) {
>>>     while (bpage != NULL) {
>>>         if (bpage->datavaddr != 0) {
>>>             bcopy((void *)bpage->datavaddr,
>>>                 (void *)bpage->vaddr,
>>>                 bpage->datacount);
>>>         } else {
>>>             physcopyout(bpage->dataaddr,
>>>                 (void *)bpage->vaddr,
>>>                 bpage->datacount);
>>>         }
>>>         bpage = STAILQ_NEXT(bpage, links);
>>>     }
>>>     dmat->bounce_zone->total_bounced++;
>>> }
>>> -----------------------------
>>>
>>> There are two things:
>>>
>>> (1) datavaddr is not always a kva of client data; sometimes it can
>>> be a uva of client data.
>>> (2) bcopy() can be used only if datavaddr is a kva or when map->pmap
>>> is the current pmap.
>>>
>>> Note that bus_dmamap_load_uio() with uio->uio_segflg set to
>>> UIO_USERSPACE implies that the physical pages used are in-core and
>>> wired.  See "man bus_dma".
>>>
>>> There is no public interface to check that map->pmap is the current
>>> pmap.  So one solution is the following:
>>>
>>> if (bpage->datavaddr >= VM_MIN_KERNEL_ADDRESS) {
>>>         bcopy();
>>> } else {
>>>         physcopy();
>>> }
>>>
>>> If there were a public pmap_is_current(), then another solution would
>>> be the following:
>>>
>>> if ((bpage->datavaddr != 0) && pmap_is_current(map->pmap)) {
>>>         bcopy();
>>> } else {
>>>         physcopy();
>>> }
>>>
>>> The second solution implies that a context switch must not happen
>>> during bus_dmamap_sync() called from an interrupt routine.  However,
>>> IMO, that can be taken for granted.
>>>
>>> Note that map->pmap should always be kernel_pmap for datavaddr >=
>>> VM_MIN_KERNEL_ADDRESS.
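
(A machine-independent pmap_is_current() could presumably be as thin as
the following, modulo per-arch details and where it should live -- just
to make the proposal concrete:)

-----------------------------
#include <sys/param.h>
#include <sys/proc.h>
#include <vm/vm.h>
#include <vm/pmap.h>
#include <vm/vm_map.h>

static __inline bool
pmap_is_current(pmap_t pmap)
{

    return (pmap == kernel_pmap ||
        pmap == vmspace_pmap(curproc->p_vmspace));
}
-----------------------------
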
>>>
>>> Comments, different solutions, or objections?
>>>
>>> Svatopluk Kraus





