Date:      Fri, 24 Apr 2009 10:59:30 -0400
From:      John Baldwin <jhb@freebsd.org>
To:        freebsd-hackers@freebsd.org
Subject:   Re: Using bus_dma(9)
Message-ID:  <200904241059.30788.jhb@freebsd.org>
In-Reply-To: <20090423195928.GB8531@server.vk2pj.dyndns.org>
References:  <20090423195928.GB8531@server.vk2pj.dyndns.org>

On Thursday 23 April 2009 3:59:28 pm Peter Jeremy wrote:
> I'm currently trying to port some code that uses bus_dma(9) from
> OpenBSD to FreeBSD and am having some difficulties in following the
> way bus_dma is intended to be used on FreeBSD (and how it differs from
> Net/OpenBSD).  Other than the man page and existing FreeBSD drivers, I
> am unable to locate any information on bus_dma care and feeding.  Has
> anyone written any tutorial guide to using bus_dma?
> 
> The OpenBSD man page provides pseudo-code showing the basic cycle.
> Unfortunately, FreeBSD doesn't provide any similar pseudo-code and
> the functionality is distributed somewhat differently amongst the
> functions (and the drivers I've looked at tend to use a different
> order of calls).
> 
> So far, I've hit a number of issues that I'd like some advice on:
> 
> Firstly, the OpenBSD model only provides a single DMA tag for the
> device at attach() time, whereas FreeBSD provides the parent's DMA tag
> at attach time and allows the driver to create multiple tags.  Rather
> than just creating a single tag for a device, many drivers create a
> device tag which is only used as the parent for additional tags to
> handle receive, transmit etc.  Whilst the need for multiple tags is
> probably a consequence of moving much of the dmamap information from
> OpenBSD bus_dmamap_create() into FreeBSD bus_dma_tag_create(), the
> rationale behind multiple levels of tags is unclear.  Is this solely
> to provide a single point where overall device DMA characteristics &
> limitations can be specified or is there another reason?

Many drivers provide a parent "driver" tag specifically to have a single
point where the device's overall DMA characteristics and limitations are
specified, yes.
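
To make that concrete, here is a rough sketch (all names are invented, and a
device that can only address the low 4GB is assumed) of a parent tag that
states the device-wide constraints in one place; the later sketches in this
mail reuse this hypothetical softc:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/bus.h>
#include <machine/bus.h>

#define	MYDEV_NUM_RXD	256			/* invented ring size */

/* Hypothetical hardware RX descriptor layout. */
struct mydev_rx_desc {
	uint64_t	addr;
	uint32_t	len;
	uint32_t	flags;
};

struct mydev_rxdata {
	struct mbuf	*mbuf;
	bus_dmamap_t	map;
};

struct mydev_softc {
	device_t		dev;
	bus_dma_tag_t		parent_tag;	/* device-wide constraints */
	bus_dma_tag_t		rx_ring_tag;	/* child of parent_tag */
	bus_dma_tag_t		rx_data_tag;	/* child of parent_tag */
	struct mydev_rx_desc	*rx_ring;
	bus_dmamap_t		rx_ring_map;
	bus_addr_t		rx_ring_paddr;
	struct mydev_rxdata	rxdata[MYDEV_NUM_RXD];
};

static int
mydev_create_parent_tag(struct mydev_softc *sc)
{

	/* The single point where the device's DMA limits are spelled out. */
	return (bus_dma_tag_create(
	    bus_get_dma_tag(sc->dev),	/* inherit constraints from the bus */
	    1, 0,			/* alignment, boundary */
	    BUS_SPACE_MAXADDR_32BIT,	/* lowaddr: 32-bit-only device */
	    BUS_SPACE_MAXADDR,		/* highaddr */
	    NULL, NULL,			/* filter, filterarg */
	    BUS_SPACE_MAXSIZE_32BIT,	/* maxsize */
	    BUS_SPACE_UNRESTRICTED,	/* nsegments */
	    BUS_SPACE_MAXSIZE_32BIT,	/* maxsegsize */
	    0,				/* flags */
	    NULL, NULL,			/* lockfunc, lockarg */
	    &sc->parent_tag));
}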

> Secondly, bus_dma_tag_create() supports a BUS_DMA_ALLOCNOW flag that
> "pre-allocates enough resources to handle at least one map load
> operation on this tag".  However it also states "[t]his should not be
> used for tags that only describe buffers that will be allocated with
> bus_dmamem_alloc()" - does this mean that only one of bus_dmamap_load()
> or bus_dmamem_alloc() should be used on a tag/mapping?  Or is the
> sense backwards (ie "don't specify BUS_DMA_ALLOCNOW for tags that are
> only used as the parent for other tags and never mapped themselves")?
> Or is there some other explanation?

Usually, each thing you want to pre-allocate memory for with
bus_dmamem_alloc() (such as a descriptor ring) uses its own tag.
This is somewhat mandated by the fact that bus_dmamem_alloc() doesn't take
a size but gets the size to allocate from the tag.  So usually a NIC driver
will have 3 tags: 1 for the RX ring, 1 for packet data, and 1 for the TX
ring.  Some drivers have 2 tags for packet data, 1 for TX buffers and 1
for RX buffers.
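
For instance, continuing the hypothetical softc from the sketch above (the
alignment and ring size are invented): since bus_dmamem_alloc() has no length
argument, the RX ring's size has to be baked into that tag's maxsize:

#define	MYDEV_RX_RING_SIZE	(MYDEV_NUM_RXD * sizeof(struct mydev_rx_desc))

static int
mydev_alloc_rx_ring(struct mydev_softc *sc)
{
	int error;

	error = bus_dma_tag_create(sc->parent_tag,
	    4096, 0,			/* ring alignment, boundary */
	    BUS_SPACE_MAXADDR,		/* parent_tag already limits lowaddr */
	    BUS_SPACE_MAXADDR,		/* highaddr */
	    NULL, NULL,			/* filter, filterarg */
	    MYDEV_RX_RING_SIZE,		/* maxsize: what bus_dmamem_alloc uses */
	    1,				/* the ring must be a single segment */
	    MYDEV_RX_RING_SIZE,		/* maxsegsize */
	    0, NULL, NULL,		/* flags, lockfunc, lockarg */
	    &sc->rx_ring_tag);
	if (error != 0)
		return (error);

	/* Note: no size argument; it comes from rx_ring_tag's maxsize. */
	return (bus_dmamem_alloc(sc->rx_ring_tag, (void **)&sc->rx_ring,
	    BUS_DMA_WAITOK | BUS_DMA_ZERO, &sc->rx_ring_map));
}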
 
> Thirdly, bus_dmamap_load() uses a callback function to return
> the actual mapping details.  According to the man page, there is no
> way to ensure that the callback occurs synchronously - a caller can
> only request that bus_dmamap_load() fail if resources are not
> immediately available.  Despite this, many drivers pass 0 for flags
> (allowing an asynchronous invocation of the callback) and then fail
> (and cleanup) if bus_dmamap_load() returns EINPROGRESS.  This appears
> to open a race condition where the callback and cleanup could occur
> simultaneously.  Mitigating the race condition seems to rely on one of
> the following two behaviours:
> 
> a) The system is implicitly single-threaded when bus_dmamap_load() is
> called (generally as part of the device attach() function).  Whilst
> this is true at boot time, it would not be true for a dynamically
> loaded module.
> 
> b) Passing BUS_DMA_ALLOCNOW to bus_dma_tag_create() guarantees that
> the first bus_dmamap_load() on that tag will be synchronous.  Is this
> true?  Whilst it appears to be implied, it's not explicitly stated.

That doesn't really guarantee that either, as the pool of bounce pages can be
shared across multiple tags.  I think what you might be missing is this:

c) bus_dmamap_load() of a map returned from bus_dmamem_alloc() will always 
succeed synchronously.

That is the only case other than BUS_DMA_NOWAIT where one can assume 
synchronous calls to the callback.  Also, some bus_dma calls basically 
assume BUS_DMA_NOWAIT, such as bus_dmamap_load_mbuf() and 
bus_dmamap_load_mbuf_sg().
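
To sketch case (c) with the invented names from above: since the load of
bus_dmamem_alloc()'d memory completes synchronously, the usual pattern is a
callback that just stashes the bus address, which the caller can rely on as
soon as bus_dmamap_load() returns:

static void
mydev_dmamap_cb(void *arg, bus_dma_segment_t *segs, int nseg, int error)
{
	bus_addr_t *paddr = arg;

	if (error != 0)
		return;
	KASSERT(nseg == 1, ("mydev: unexpected segment count %d", nseg));
	*paddr = segs[0].ds_addr;
}

static int
mydev_load_rx_ring(struct mydev_softc *sc)
{

	/* Case (c): this will not return EINPROGRESS. */
	return (bus_dmamap_load(sc->rx_ring_tag, sc->rx_ring_map, sc->rx_ring,
	    MYDEV_RX_RING_SIZE, mydev_dmamap_cb, &sc->rx_ring_paddr,
	    BUS_DMA_NOWAIT));
}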

> Finally, what are the ordering requirements between the alloc, create,
> load and sync functions?  OpenBSD implies that the normal ordering is
> create, alloc, load, sync whilst several FreeBSD drivers use
> tag_create, alloc, load and then create.

FreeBSD uses the same ordering as OpenBSD.  I think you might be confused by 
the bus_dmamem_alloc() case.  There are basically two cases; the first is 
preallocating a block of RAM to use for a descriptor or command ring:

alloc_ring:
	bus_dma_tag_create(..., &ring_tag);

	/* Creates a map internally. */
	bus_dmamem_alloc(ring_tag, &p, ..., &ring_map);

	/* Will not fail with EINPROGRESS. */
	bus_dmamap_load(ring_tag, ring_map, p, ...);


free_ring:
	bus_dmamap_unload(ring_tag, ring_map);

	bus_dmamem_free(ring_tag, p, ring_map);

	bus_dma_tag_destroy(ring_tag);

The second case is when you want to handle data transfer requests (queue a 
packet or disk I/O request, etc.).  For this the typical model in FreeBSD is 
to create a single tag and then pre-create a map for each descriptor or 
command:

setup_data_maps:
	bus_dma_tag_create(..., &tag);

	for (i = 0; i < NUM_RXD; i++)
		bus_dmamap_create(tag, ..., &rxdata[i].map);

	for (i = 0; i < NUM_TXD; i++)
		bus_dmamap_create(tag, ..., &txdata[i].map);

queue_a_rx_buffer:
	i = index_of_first_free_RX_descriptor;
	m = m_getcl(...);
	rxdata[i].mbuf = m;
	bus_dmamap_load_mbuf_sg(tag, rxdata[i].map, m, ...);
	/* populate s/g list in i'th RX descriptor ring */
	bus_dmamap_sync(rx_ring_tag, rx_ring_map, ...);

dequeue_an_rx_buffer_on_rx_completion:
	i = index_of_completed_receive_descriptor;
	bus_dmamap_sync(tag, rxdata[i].map, ...);
	bus_dmamap_unload(tag, rxdata[i].map);
	m = rxdata[i].mbuf;
	rxdata[i].mbuf = NULL;

	(*ifp->if_input)(ifp, m);

free_data_maps:
	for (i = 0; i < NUM_RXD; i++)
		bus_dmamap_destroy(tag, rxdata[i].map);

	for (i = 0; i < NUM_TXD; i++)
		bus_dmamap_destroy(tag, txdata[i].map);

	bus_dma_tag_destroy(tag);

In a typical NIC driver you will probably be doing alloc_ring and 
setup_data_maps at the same time during your attach routine.  Similarly for 
free_ring and free_data_maps during detach.
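
To flesh out the queue_a_rx_buffer step with the arguments the pseudo-code
elides (again using the invented names from the earlier sketches, assuming
rx_data_tag and rxdata[].map were set up as in setup_data_maps, and assuming
single-segment cluster buffers), note that bus_dmamap_load_mbuf_sg() returns
the segment list directly instead of going through a callback:

#include <sys/mbuf.h>
#include <sys/endian.h>

static int
mydev_newbuf(struct mydev_softc *sc, int i)
{
	struct mbuf *m;
	bus_dma_segment_t seg;
	int error, nsegs;

	m = m_getcl(M_DONTWAIT, MT_DATA, M_PKTHDR);
	if (m == NULL)
		return (ENOBUFS);
	m->m_len = m->m_pkthdr.len = MCLBYTES;

	error = bus_dmamap_load_mbuf_sg(sc->rx_data_tag, sc->rxdata[i].map,
	    m, &seg, &nsegs, BUS_DMA_NOWAIT);
	if (error != 0) {
		m_freem(m);
		return (error);
	}
	KASSERT(nsegs == 1, ("mydev: %d segments for an RX cluster", nsegs));
	sc->rxdata[i].mbuf = m;

	/* Point the (hypothetical) i'th RX descriptor at the buffer. */
	sc->rx_ring[i].addr = htole64(seg.ds_addr);
	sc->rx_ring[i].len = htole32((uint32_t)seg.ds_len);
	bus_dmamap_sync(sc->rx_ring_tag, sc->rx_ring_map,
	    BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
	return (0);
}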

> As a side-note, the manpage does not document the behaviour when
> bus_dmamap_destroy() or bus_dma_tag_destroy() are called whilst a
> bus_dmamap_load() callback is queued.  Is the callback cancelled
> or do one or both destroy operations fail?

Looking at amd64, destroying a tag that still has maps created from it fails 
with EBUSY on HEAD (this may not be in 7.x yet).  Destroying a map that has 
bounce buffers in use fails with EBUSY as well.
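
So a detach path that wants to notice leftover loads can check those return
values; a minimal sketch with the same invented names:

static void
mydev_free_rx_data_dma(struct mydev_softc *sc)
{
	int i;

	for (i = 0; i < MYDEV_NUM_RXD; i++) {
		if (sc->rxdata[i].map != NULL &&
		    bus_dmamap_destroy(sc->rx_data_tag,
		    sc->rxdata[i].map) == EBUSY)
			device_printf(sc->dev, "RX map %d still busy\n", i);
	}
	if (bus_dma_tag_destroy(sc->rx_data_tag) == EBUSY)
		device_printf(sc->dev, "RX data tag still has maps\n");
}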

-- 
John Baldwin
