Date:      Fri, 24 Apr 2015 17:50:15 -0500
From:      Jason Harmening <jason.harmening@gmail.com>
To:        Svatopluk Kraus <onwahe@gmail.com>
Cc:        FreeBSD Arch <freebsd-arch@freebsd.org>
Subject:   Re: bus_dmamap_sync() for bounced client buffers from user address space
Message-ID:  <CAM=8qan-4SbKJaddrfkv=HG3n+HaOPDL5MEPS9DoaTvnhrJPZQ@mail.gmail.com>
In-Reply-To: <CAFHCsPXMjge84AR2cR8KXMXWP4kH2YvuV_uqtPKUvn5C3ygknw@mail.gmail.com>
References:  <CAFHCsPXMjge84AR2cR8KXMXWP4kH2YvuV_uqtPKUvn5C3ygknw@mail.gmail.com>

A couple of comments:

--POSTWRITE and POSTREAD are only asynchronous if you call them from an
asynchronous context.
For a driver that's already performing DMA operations on usermode memory,
it seems likely that there's going to be *some* place where you can call
bus_dmamap_sync() and be guaranteed to be executing in the context of the
process that owns the memory.  Then a regular bcopy will be safe and
inexpensive, assuming the pages have been properly vslock-ed/vm_map_wire-d.
That's usually whatever read/write/ioctl operation spawned the DMA transfer
in the first place.  So, in those cases can you not just move the
POSTREAD/POSTWRITE sync from the "DMA-finished" interrupt to the
d_read/d_write/d_ioctl that waits on the "DMA-finished" interrupt?
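
For illustration, a minimal sketch of that pattern (all of the names --
foo_softc, foo_start_dma, foo_dma_callback, the dma_done sleep channel --
are hypothetical, and error handling is elided):

-----------------------------
static int
foo_read(struct cdev *dev, struct uio *uio, int ioflag)
{
    struct foo_softc *sc = dev->si_drv1;
    int error;

    /* We are in the context of the process that owns uio's buffers. */
    error = bus_dmamap_load_uio(sc->dma_tag, sc->dma_map, uio,
        foo_dma_callback, sc, BUS_DMA_NOWAIT);
    if (error != 0)
        return (error);

    bus_dmamap_sync(sc->dma_tag, sc->dma_map, BUS_DMASYNC_PREREAD);
    foo_start_dma(sc);

    /* Sleep until the "DMA-finished" interrupt wakes us. */
    error = tsleep(&sc->dma_done, PRIBIO | PCATCH, "foodma", hz);

    /* Still in the owning process context: a bcopy-based bounce is safe. */
    bus_dmamap_sync(sc->dma_tag, sc->dma_map, BUS_DMASYNC_POSTREAD);
    bus_dmamap_unload(sc->dma_tag, sc->dma_map);
    return (error);
}
-----------------------------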

--physcopyin/physcopyout aren't trivial.  They go through uiomove_fromphys,
which often uses sfbufs to create temporary KVA mappings for the physical
pages.  sf_buf_alloc() can sleep (unless SFB_NOWAIT is specified, which
means it can fail and which uiomove_fromphys does not specify for good
reason); that makes it unsafe for use in either a threaded interrupt or a
filter.  Perhaps the physcopyout path could be changed to use pmap_qenter
directly in this case, but that can still be expensive in terms of TLB
shootdowns.
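
Something like this, perhaps -- a sketch only, assuming 'kva' is a page of
KVA reserved up front with kva_alloc() and relying on the fact that a
bounce region never crosses a page boundary:

-----------------------------
static void
physcopyout_qenter(struct bounce_page *bpage, vm_offset_t kva)
{
    vm_page_t m;

    /* Temporarily map the client's physical page into KVA. */
    m = PHYS_TO_VM_PAGE(bpage->dataaddr);
    pmap_qenter(kva, &m, 1);
    bcopy((void *)(kva + (bpage->dataaddr & PAGE_MASK)),
        (void *)bpage->vaddr, bpage->datacount);
    pmap_qremove(kva, 1);   /* this unmap is where the shootdown cost hides */
}
-----------------------------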

Checking against VM_MIN_KERNEL_ADDRESS seems sketchy; it eliminates the
chance to use a much-less-expensive bcopy in cases where the sync is
happening in the correct process context.

Context-switching during bus_dmamap_sync() shouldn't be an issue.  In a
filter interrupt, curproc will be completely arbitrary but none of this
stuff should be called in a filter anyway.  Otherwise, if you're in a
kernel thread (including an ithread), curproc should be whatever proc was
supplied when the thread was created.  That's usually proc0, which only has
kernel address space.  IOW, even if a context-switch happens sometime
during bus_dmamap_sync, any pmap check or copy should have a consistent and
non-arbitrary process context.

I think something like your second solution would be workable to make
UIO_USERSPACE syncs work in non-interrupt kernel threads, but given all the
restrictions and extra cost of physcopy, I'm not sure how useful that would
be.

I do think busdma.9 could at least use a note that bus_dmamap_sync() is
only safe to call in the context of the owning process for user buffers.
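
Something along the lines of: "If the buffer was loaded from a uio with
uio_segflg set to UIO_USERSPACE, bus_dmamap_sync() may only be called from
the context of the process that owns the buffer."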


On Fri, Apr 24, 2015 at 8:13 AM, Svatopluk Kraus <onwahe@gmail.com> wrote:

> DMA can be done on a client buffer from user address space, for
> example via bus_dmamap_load_uio() when uio->uio_segflg is
> UIO_USERSPACE. Such a client buffer can bounce, and then it must be
> copied to and from the bounce buffer in bus_dmamap_sync().
>
> Current implementations (on all archs) do not take into account that
> bus_dmamap_sync() is asynchronous for POSTWRITE and POSTREAD in
> general. It can be asynchronous for PREWRITE and PREREAD too, for
> example in driver implementations where operations on DMA client
> buffers are buffered. In those cases, a simple bcopy() is not
> correct.
>
> A demonstration of the current implementation (x86) follows:
>
> -----------------------------
> struct bounce_page {
>     vm_offset_t vaddr;      /* kva of bounce buffer */
>     bus_addr_t  busaddr;    /* Physical address */
>     vm_offset_t datavaddr;  /* kva of client data */
>     bus_addr_t  dataaddr;   /* client physical address */
>     bus_size_t  datacount;  /* client data count */
>     STAILQ_ENTRY(bounce_page) links;
> };
>
>
> if ((op & BUS_DMASYNC_PREWRITE) != 0) {
>     while (bpage != NULL) {
>         if (bpage->datavaddr != 0) {
>             bcopy((void *)bpage->datavaddr,
>                 (void *)bpage->vaddr,
>                 bpage->datacount);
>         } else {
>             physcopyout(bpage->dataaddr,
>                 (void *)bpage->vaddr,
>                 bpage->datacount);
>         }
>         bpage = STAILQ_NEXT(bpage, links);
>     }
>     dmat->bounce_zone->total_bounced++;
> }
> -----------------------------
>
> There are two things:
>
> (1) datavaddr is not always the kva of the client data; sometimes it
> can be a uva of the client data.
> (2) bcopy() can be used only if datavaddr is a kva or if map->pmap is
> the current pmap.
>
> Note that bus_dmamap_load_uio() with uio->uio_segflg set to
> UIO_USERSPACE implies that the physical pages used are in-core and
> wired. See "man bus_dma".
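>
> A minimal sketch of the wiring a caller does before the load (assuming
> a single iovec and omitting error handling and the matching vsunlock()):
>
> -----------------------------
> error = vslock(uio->uio_iov[0].iov_base, uio->uio_iov[0].iov_len);
> if (error == 0)
>     error = bus_dmamap_load_uio(dmat, map, uio, callback, arg,
>         BUS_DMA_NOWAIT);
> -----------------------------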
>
> There is no public interface to check whether map->pmap is the current
> pmap, so one solution is the following:
>
> if (bpage->datavaddr >= VM_MIN_KERNEL_ADDRESS) {
>         bcopy();
> } else {
>         physcopy();
> }
>
> If a public pmap_is_current() existed, then another solution would be
> the following:
>
> if ((bpage->datavaddr != 0) && pmap_is_current(map->pmap)) {
>         bcopy();
> } else {
>         physcopy();
> }
>
> The second solution implies that a context switch must not happen
> during a bus_dmamap_sync() call made from an interrupt routine.
> However, IMO, that is guaranteed.
>
> Note that map->pmap should always be kernel_pmap when datavaddr >=
> VM_MIN_KERNEL_ADDRESS.
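>
> For completeness, a public pmap_is_current() might look something like
> this (a sketch only; no such MI interface exists today):
>
> -----------------------------
> static __inline bool
> pmap_is_current(pmap_t pmap)
> {
>     /* kernel_pmap is current in every context. */
>     return (pmap == kernel_pmap ||
>         pmap == vmspace_pmap(curproc->p_vmspace));
> }
> -----------------------------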
>
> Comments, different solutions, or objections?
>
> Svatopluk Kraus
> _______________________________________________
> freebsd-arch@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-arch
> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"
>
