From owner-freebsd-arch@FreeBSD.ORG Sat Apr 25 14:01:04 2015 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 7C83CA8D for ; Sat, 25 Apr 2015 14:01:04 +0000 (UTC) Received: from mail-oi0-x232.google.com (mail-oi0-x232.google.com [IPv6:2607:f8b0:4003:c06::232]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 3C7D011EF for ; Sat, 25 Apr 2015 14:01:04 +0000 (UTC) Received: by oiko83 with SMTP id o83so60560485oik.1 for ; Sat, 25 Apr 2015 07:01:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:content-type; bh=rYOl//E1cbPFP0QYFLbhatfqNckX0MMueJe4B5ZFGVU=; b=KMIw2gM1LUPI072xwd81W9vAL5/xAHTegdU0sAN65lo1CFK85XKqsoCb7WoO/Byw0R AH4WOBCSxZxHx0wBWht2Ayq/Jjdr82c9hlWTgbzGrQkITKJmbwk+cW8uqEvbmfghEQcN KWhaGatrm2S/Clq957NDU9TY01RR9iEX/09jN31aMJNgLhNs7Q/97p0IZvZGqWMIPn1d D0fvUXV4Hv2YEL9KRT6sC9/4JK83Olm9XLG4eU56WTdGmGJIdUMiyW9YPDN6Q0uWfJ6L Re0hUN5KelXzu8IOrjXEZyW41VQKzwDBCd68rgzn74lb+ml/Obin8Vxj2k32Dtdw4O59 3mdQ== X-Received: by 10.60.148.225 with SMTP id tv1mr2893115oeb.14.1429970463468; Sat, 25 Apr 2015 07:01:03 -0700 (PDT) Received: from corona.austin.rr.com (cpe-72-177-6-10.austin.res.rr.com. [72.177.6.10]) by mx.google.com with ESMTPSA id z133sm8230852oif.14.2015.04.25.07.01.01 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sat, 25 Apr 2015 07:01:02 -0700 (PDT) Message-ID: <553B9E64.8030907@gmail.com> Date: Sat, 25 Apr 2015 09:02:12 -0500 From: Jason Harmening User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0 MIME-Version: 1.0 To: Konstantin Belousov CC: Svatopluk Kraus , FreeBSD Arch Subject: Re: bus_dmamap_sync() for bounced client buffers from user address space References: <20150425094152.GE2390@kib.kiev.ua> In-Reply-To: <20150425094152.GE2390@kib.kiev.ua> Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="eiGxCA8j3t9W15b4RvfXexTIWMNHp0Eiw" X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Apr 2015 14:01:04 -0000 This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --eiGxCA8j3t9W15b4RvfXexTIWMNHp0Eiw Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable On 04/25/15 04:41, Konstantin Belousov wrote: > On Fri, Apr 24, 2015 at 05:50:15PM -0500, Jason Harmening wrote: >> A couple of comments: >> >> --POSTWRITE and POSTREAD are only asynchronous if you call them from a= n >> asynchronous context. >> For a driver that's already performing DMA operations on usermode memo= ry, >> it seems likely that there's going to be *some* place where you can ca= ll >> bus_dmamap_sync() and be guaranteed to be executing in the context of = the >> process that owns the memory. Then a regular bcopy will be safe and >> inexpensive, assuming the pages have been properly vslock-ed/vm_map_wi= re-d. >> That's usually whatever read/write/ioctl operation spawned the DMA tra= nsfer >> in the first place. So, in those cases can you not just move the >> POSTREAD/POSTWRITE sync from the "DMA-finished" interrupt to the >> d_read/d_write/d_ioctl that waits on the "DMA-finished" interrupt? >> >> --physcopyin/physcopyout aren't trivial. They go through uiomove_from= phys, >> which often uses sfbufs to create temporary KVA mappings for the physi= cal >> pages. sf_buf_alloc() can sleep (unless SFB_NOWAIT is specified, whic= h >> means it can fail and which uiomove_fromphys does not specify for good= >> reason); that makes it unsafe for use in either a threaded interrupt o= r a >> filter. Perhaps the physcopyout path could be changed to use pmap_qen= ter >> directly in this case, but that can still be expensive in terms of TLB= >> shootdowns. >> >> Checking against VM_MIN_KERNEL_ADDRESS seems sketchy; it eliminates th= e >> chance to use a much-less-expensive bcopy in cases where the sync is >> happening in correct process context. >> >> Context-switching during bus_dmamap_sync() shouldn't be an issue. In = a >> filter interrupt, curproc will be completely arbitrary but none of thi= s >> stuff should be called in a filter anyway. Otherwise, if you're in a >> kernel thread (including an ithread), curproc should be whatever proc = was >> supplied when the thread was created. That's usually proc0, which onl= y has >> kernel address space. IOW, even if a context-switch happens sometime >> during bus_dmamap_sync, any pmap check or copy should have a consisten= t and >> non-arbitrary process context. >> >> I think something like your second solution would be workable to make >> UIO_USERSPACE syncs work in non-interrupt kernel threads, but given al= l the >> restrictions and extra cost of physcopy, I'm not sure how useful that = would >> be. >> >> I do think busdma.9 could at least use a note that bus_dmamap_sync() i= s >> only safe to call in the context of the owning process for user buffer= s. > UIO_USERSPACE for busdma is absolutely unsafe and cannot be used withou= t > making kernel panicing. Even if you wire the userspace bufer, there is= > nothing which could prevent other thread in the user process, or other > process sharing the same address space, to call munmap(2) on the range.= > > The only safe method to work with the userspace regions is to > vm_fault_quick_hold() them to get hold on the pages, and then either > pass pages array down, or remap them in the KVA with pmap_qenter(). I was under the impression that any attempt to free or munmap the virtual range would block waiting for the underlying pages to be unwired.= That's certainly the behavior I've seen with free; is it not the case with munmap? Either way, I haven't walked the code to see where the blocking is implemented. If UIO_USERSPACE truly is unsafe to use with busdma, then we need to either make it safe or stop documenting it in the manpage. Perhaps the bounce buffer logic could use copyin/copyout for userspace, if in the right process? Or just always use physcopy for non-KVA as in the first suggestion? It seems like in general it is too hard for drivers using busdma to deal with usermode memory in a way that's both safe and efficient: --bus_dmamap_load_uio + UIO_USERSPACE is apparently really unsafe --if they do things the other way and allocate in the kernel, then then they better either be willing to do extra copying, or create and refcount their own vm_objects and use d_mmap_single (I still haven't seen a good example of that), or leak a bunch of memory (if they use d_mmap), because the old device pager is also really unsafe. > >> >> On Fri, Apr 24, 2015 at 8:13 AM, Svatopluk Kraus wr= ote: >> >>> DMA can be done on client buffer from user address space. For example= , >>> thru bus_dmamap_load_uio() when uio->uio_segflg is UIO_USERSPACE. Suc= h >>> client buffer can bounce and then, it must be copied to and from >>> bounce buffer in bus_dmamap_sync(). >>> >>> Current implementations (in all archs) do not take into account that >>> bus_dmamap_sync() is asynchronous for POSTWRITE and POSTREAD in >>> general. It can be asynchronous for PREWRITE and PREREAD too. For >>> example, in some driver implementations where DMA client buffers >>> operations are buffered. In those cases, simple bcopy() is not >>> correct. >>> >>> Demonstration of current implementation (x86) is the following: >>> >>> ----------------------------- >>> struct bounce_page { >>> vm_offset_t vaddr; /* kva of bounce buffer */ >>> bus_addr_t busaddr; /* Physical address */ >>> vm_offset_t datavaddr; /* kva of client data */ >>> bus_addr_t dataaddr; /* client physical address */ >>> bus_size_t datacount; /* client data count */ >>> STAILQ_ENTRY(bounce_page) links; >>> }; >>> >>> >>> if ((op & BUS_DMASYNC_PREWRITE) !=3D 0) { >>> while (bpage !=3D NULL) { >>> if (bpage->datavaddr !=3D 0) { >>> bcopy((void *)bpage->datavaddr, >>> (void *)bpage->vaddr, >>> bpage->datacount); >>> } else { >>> physcopyout(bpage->dataaddr, >>> (void *)bpage->vaddr, >>> bpage->datacount); >>> } >>> bpage =3D STAILQ_NEXT(bpage, links); >>> } >>> dmat->bounce_zone->total_bounced++; >>> } >>> ----------------------------- >>> >>> There are two things: >>> >>> (1) datavaddr is not always kva of client data, but sometimes it can >>> be uva of client data. >>> (2) bcopy() can be used only if datavaddr is kva or when map->pmap is= >>> current pmap. >>> >>> Note that there is an implication for bus_dmamap_load_uio() with >>> uio->uio_segflg set to UIO_USERSPACE that used physical pages are >>> in-core and wired. See "man bus_dma". >>> >>> There is not public interface to check that map->pmap is current pmap= =2E >>> So one solution is the following: >>> >>> if (bpage->datavaddr >=3D VM_MIN_KERNEL_ADDRESS) { >>> bcopy(); >>> } else { >>> physcopy(); >>> } >>> >>> If there will be public pmap_is_current() then another solution is th= e >>> following: >>> >>> if (bpage->datavaddr !=3D 0) && pmap_is_current(map->pmap)) { >>> bcopy(); >>> } else { >>> physcopy(); >>> } >>> >>> The second solution implies that context switch must not happen durin= g >>> bus_dmamap_sync() called from an interrupt routine. However, IMO, it'= s >>> granted. >>> >>> Note that map->pmap should be always kernel_pmap for datavaddr >=3D >>> VM_MIN_KERNEL_ADDRESS. >>> >>> Comments, different solutions, or objections? >>> >>> Svatopluk Kraus >>> _______________________________________________ >>> freebsd-arch@freebsd.org mailing list >>> http://lists.freebsd.org/mailman/listinfo/freebsd-arch >>> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.or= g" >>> >> _______________________________________________ >> freebsd-arch@freebsd.org mailing list >> http://lists.freebsd.org/mailman/listinfo/freebsd-arch >> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org= " --eiGxCA8j3t9W15b4RvfXexTIWMNHp0Eiw Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQF8BAEBCgBmBQJVO55kXxSAAAAAAC4AKGlzc3Vlci1mcHJAbm90YXRpb25zLm9w ZW5wZ3AuZmlmdGhob3JzZW1hbi5uZXQwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAw MDAwMDAwMDAwMDAwMDAwAAoJELufi/mShB0bH3AH/iynIZttCSfpAjLURApDguQB rss1/oNNIfDpigv4BC2AYLs144GyIfn8k8mAMzpiNQK1p+6cI3e5YC5Luo4KEyeB aenxiu6TuvEEPRVVsJaDjAt6J0ZTM4KQpiwokE4hHlib/hT2YQ9gQKVjYMPKHDQI D0jc0lEQbbDBZmLRMeFzk9gVy8/YmJTt6dGxkS0mttZrjuni05lELdPgIb250KZW rvs1irjEcztCbMUj6pGoaenhD/8arlSehxjKOmEP3HhYx7L6z/S+vOz5/rPXxYd/ 6/qpJwmuX14as0qUabcOHBl0Yd1nww0iUSwHlQhD2/0cTLN5NrbhIbIKkQIBbq0= =4s0y -----END PGP SIGNATURE----- --eiGxCA8j3t9W15b4RvfXexTIWMNHp0Eiw--