Date:      Mon, 27 May 2013 13:58:44 +0300
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        arch@freebsd.org, amd64@freebsd.org
Subject:   x86 IOMMU support (DMAR)
Message-ID:  <20130527105844.GC3047@kib.kiev.ua>

For the past several months, I have been working (and continue to work
now) on a driver for Intel VT-d for FreeBSD.  VT-d is marketed as I/O
virtualization technology, but in essence it is a DMA address
remapping engine, i.e. an advanced and improved I/O MMU of the kind
also found on other big-iron machines, e.g. PowerPC or SPARC.  See the
Intel document titled 'Intel Virtualization Technology for Directed
I/O Architecture Specification' and the chipset datasheets for a
description of the facility.

The development was greatly facilitated by Jim Harris from Intel, who
provided me with access to the Sandy Bridge and Ivy Bridge north
bridge documentation.  John Baldwin patiently educated me about newbus
and helped develop the hooks required for integration with the
existing code.

The core hardware element of VT-d is the DMA remap unit, referred to
as DMAR both in the documentation and in the source code.  Besides DMA
remap, VT-d also allows remapping of MSI/MSI-X interrupt messages.
FreeBSD could utilize that functionality for interrupt rebalancing,
instead of reprogramming the MSI registers of the PCI devices, but
this part is not (yet) implemented.

For the FreeBSD architecture, DMAR naturally fits in as a busdma
engine, making it possible to eliminate bounce page copying.  Another
great benefit of using DMAR is the reliability and security
improvement, since DMA transfers are only allowed to the memory areas
explicitly designated by the device driver as buffers.  As noted by
Jim Harris, this security angle could find a use in the NTB driver.
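
To illustrate the driver-side view (a sketch only; the mydev_* names
are hypothetical and not from the patch), the usual busdma sequence is
unchanged, and with busdma_dmar underneath only the loaded region
becomes visible to the device:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/bus.h>
#include <machine/bus.h>

static void
mydev_dma_cb(void *arg, bus_dma_segment_t *segs, int nseg, int error)
{

	if (error != 0)
		return;
	/* segs[0].ds_addr is the device-visible (remapped) address. */
	*(bus_addr_t *)arg = segs[0].ds_addr;
}

static int
mydev_setup_dma(device_t dev, void *buf, bus_size_t len,
    bus_dma_tag_t *tagp, bus_dmamap_t *mapp, bus_addr_t *baddrp)
{
	int error;

	error = bus_dma_tag_create(bus_get_dma_tag(dev),
	    1, 0,			/* alignment, boundary */
	    BUS_SPACE_MAXADDR,		/* lowaddr */
	    BUS_SPACE_MAXADDR,		/* highaddr */
	    NULL, NULL,			/* filter, filterarg */
	    len, 1, len,		/* maxsize, nsegments, maxsegsz */
	    0, NULL, NULL, tagp);
	if (error != 0)
		return (error);
	error = bus_dmamap_create(*tagp, 0, mapp);
	if (error != 0)
		return (error);
	/* Only [buf, buf + len) is mapped for the device; the mapping
	   is revoked by bus_dmamap_unload(). */
	return (bus_dmamap_load(*tagp, *mapp, buf, len,
	    mydev_dma_cb, baddrp, BUS_DMA_NOWAIT));
}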

The existing busdma code for x86 was split into a generic interface,
kept in busdma_machdep.c, and the bouncing implementation in
busdma_bounce.c.  The DMAR-based implementation, which calls into the
DMAR core, is located in busdma_dmar.c.  There is no KPI provided to
manage the DMARs yet, but I plan to implement a proper interface after
discussing the needs of bhyve.
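
Schematically, the generic-to-implementation dispatch looks like this
(an abridged sketch; consult the patch for the real structures and the
full method list):

#include <sys/param.h>
#include <sys/bus.h>
#include <machine/bus.h>

/* Method table; one instance in busdma_bounce.c, one in busdma_dmar.c. */
struct bus_dma_impl {
	int	(*map_create)(bus_dma_tag_t tag, int flags,
		    bus_dmamap_t *mapp);
	int	(*map_destroy)(bus_dma_tag_t tag, bus_dmamap_t map);
	void	(*map_unload)(bus_dma_tag_t tag, bus_dmamap_t map);
	/* ... tag creation, load and sync methods elided ... */
};

/* Every tag begins with the common header, so the generic code in
   busdma_machdep.c can dispatch without knowing the implementation. */
struct bus_dma_tag_common {
	struct bus_dma_impl *impl;
	/* ... alignment, boundary and other common fields ... */
};

int
bus_dmamap_create(bus_dma_tag_t dmat, int flags, bus_dmamap_t *mapp)
{
	struct bus_dma_tag_common *tc;

	tc = (struct bus_dma_tag_common *)dmat;
	return (tc->impl->map_create(dmat, flags, mapp));
}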

I tried to support both i386 and amd64, but for i386 the limited KVA,
together with the busdma interface requirement of never sleeping in
driver calls, makes some of the IOMMU promises less strict.  For
instance, to unload a map, the code needs to transiently map the DMAR
page table pages, which requires sleepable allocation of sf buffers.
As a result, map unload on i386 is done asynchronously in taskqueue
context, which makes it possible for a buggy device driver or hardware
to perform transfers to freed pages for some time after the unload.
This problem is not present in the amd64 port.  For the same busdma
KPI reason, I cannot use queued invalidation on either i386 or amd64.
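
The i386 deferral is, in outline, the following (hypothetical names;
the patch's actual structures differ):

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/malloc.h>
#include <sys/taskqueue.h>

struct dmar_unload_req {
	struct task	task;
	/* ... map entries whose mappings must be destroyed ... */
};

static void
dmar_unload_task(void *arg, int pending __unused)
{
	struct dmar_unload_req *req;

	req = arg;
	/* Sleepable context: sf_buf-map the page table pages, clear
	   the PTEs, free the pages.  Until this runs, the stale
	   mappings stay live, which is the window described above. */
	free(req, M_DEVBUF);
}

static void
dmar_unload_defer(struct dmar_unload_req *req)
{

	/* Called from the non-sleepable bus_dmamap_unload() path. */
	TASK_INIT(&req->task, 0, dmar_unload_task, req);
	taskqueue_enqueue(taskqueue_thread, &req->task);
}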

At the moment the code maintains a 1:1 relation between device
contexts and domains, which is fine for busdma.  To support PCI
pass-through into virtual machines, the relation should be changed to
N:1 contexts to domains; this is planned but not yet done.
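
In data structure terms (hypothetical declarations for illustration
only):

#include <sys/types.h>

struct dmar_domain;			/* a DMAR address space */

struct dmar_ctx {
	uint16_t	rid;		/* requester: bus/slot/function */
	struct dmar_domain *domain;	/* now always private, 1:1 */
};

/* For pass-through, many contexts would reference one shared domain
   owned by the virtual machine (N:1). */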

The overall state of the code is that I can boot multiuser over the
network from if_igb(4) or if_bce(4), and can use ahci(4)- and
ata(4)-attached disks without corrupting UFS volumes.  There are known
issues with uhci(4) due to the RMRR mappings being established too
late.  Extensive testing of the already written code has not been done
yet.  Plans include:
- providing an external KPI for VMM consumers
- supporting ATS
- making it possible to select busdma_dmar or busdma_bounce for
  individual PCI functions
- stabilization work.
Also, by converting the ISA DMA implementation to use the busdma KPI,
it is possible to make floppies work reliably again!

It is known that an IOMMU adds overhead due to the mapping and
unmapping done for each I/O.  DMAR implementations usually have some
errata, and PCIe devices sometimes do not completely follow the
specification, causing misbehaviour when remapping is enabled.  For
this reason I do not plan to enable the IOMMU by default, and intend
to provide the possibility of routing individual PCI devices to the
bounce busdma implementation.
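
The selection could look like the following sketch (the tunable name
is illustrative only and not part of the patch):

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/kernel.h>

/* Returns non-zero if the administrator requested that the device at
   the given PCI location stay on the bounce implementation. */
static int
dmar_dev_use_bounce(int domain, int bus, int slot, int func)
{
	char name[64];
	int bounce;

	snprintf(name, sizeof(name), "hw.dmar.pci%d.%d.%d.%d.bounce",
	    domain, bus, slot, func);
	bounce = 0;
	TUNABLE_INT_FETCH(name, &bounce);
	return (bounce);
}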

http://people.freebsd.org/~kib/misc/DMAR.1.patch

