From: John Baldwin <jhb@freebsd.org>
To: freebsd-hackers@freebsd.org
Date: Fri, 24 Apr 2009 10:59:30 -0400
Subject: Re: Using bus_dma(9)
Message-Id: <200904241059.30788.jhb@freebsd.org>
In-Reply-To: <20090423195928.GB8531@server.vk2pj.dyndns.org>

On Thursday 23 April 2009 3:59:28 pm Peter Jeremy wrote:
> I'm currently trying to port some code that uses bus_dma(9) from
> OpenBSD to FreeBSD and am having some difficulties in following the
> way bus_dma is intended to be used on FreeBSD (and how it differs
> from Net/OpenBSD). Other than the man page and existing FreeBSD
> drivers, I am unable to locate any information on bus_dma care and
> feeding. Has anyone written a tutorial or guide to using bus_dma?
>
> The OpenBSD man page provides pseudo-code showing the basic cycle.
> Unfortunately, FreeBSD doesn't provide any similar pseudo-code, and
> the functionality is distributed somewhat differently amongst the
> functions (and the drivers I've looked at tend to use a different
> order of calls).
>
> So far, I've hit a number of issues that I'd like some advice on.
>
> Firstly, the OpenBSD model only provides a single DMA tag for the
> device at attach() time, whereas FreeBSD provides the parent's DMA
> tag at attach time and allows the driver to create multiple tags.
> Rather than just creating a single tag for a device, many drivers
> create a device tag which is only used as the parent for additional
> tags to handle receive, transmit, etc. Whilst the need for multiple
> tags is probably a consequence of moving much of the dmamap
> information from OpenBSD's bus_dmamap_create() into FreeBSD's
> bus_dma_tag_create(), the rationale behind multiple levels of tags
> is unclear. Is this solely to provide a single point where overall
> device DMA characteristics and limitations can be specified, or is
> there another reason?

Many drivers provide a parent "driver" tag specifically to have that
single point where the device-wide DMA constraints are specified, yes.
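To make that concrete, an attach path might look something like this
minimal sketch. The driver name "foo", the softc fields, and
RX_RING_SIZE are made up for illustration; the bus_dma_tag_create()
arguments are the standard ones from the man page:

#include <sys/param.h>
#include <sys/bus.h>
#include <machine/bus.h>

/* Illustrative softc fields for a hypothetical driver "foo". */
struct foo_softc {
	device_t	dev;
	bus_dma_tag_t	parent_tag;	/* device-wide constraints */
	bus_dma_tag_t	rx_ring_tag;	/* inherits from parent_tag */
};

static int
foo_dma_alloc(struct foo_softc *sc)
{
	int error;

	/* Parent tag: the single point for device-wide DMA limits. */
	error = bus_dma_tag_create(bus_get_dma_tag(sc->dev),
	    1, 0,			/* alignment, boundary */
	    BUS_SPACE_MAXADDR_32BIT,	/* device only addresses 32 bits */
	    BUS_SPACE_MAXADDR,		/* highaddr */
	    NULL, NULL,			/* filter, filterarg */
	    BUS_SPACE_MAXSIZE_32BIT,	/* maxsize */
	    BUS_SPACE_UNRESTRICTED,	/* nsegments */
	    BUS_SPACE_MAXSIZE_32BIT,	/* maxsegsize */
	    0, NULL, NULL,		/* flags, lockfunc, lockfuncarg */
	    &sc->parent_tag);
	if (error != 0)
		return (error);

	/*
	 * Child tag for the RX descriptor ring: inherits the parent's
	 * limits and adds ring-specific ones (alignment, one segment,
	 * and the size that bus_dmamem_alloc() will use).
	 */
	error = bus_dma_tag_create(sc->parent_tag,
	    PAGE_SIZE, 0,		/* ring is page-aligned */
	    BUS_SPACE_MAXADDR, BUS_SPACE_MAXADDR, NULL, NULL,
	    RX_RING_SIZE,		/* hypothetical ring size */
	    1, RX_RING_SIZE, 0, NULL, NULL, &sc->rx_ring_tag);
	return (error);
}

Any limit set in the parent (here the 32-bit lowaddr) is inherited by
every tag created from it, so it only has to be stated once.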
> Secondly, bus_dma_tag_create() supports a BUS_DMA_ALLOCNOW flag that
> "pre-allocates enough resources to handle at least one map load
> operation on this tag". However, it also states "[t]his should not
> be used for tags that only describe buffers that will be allocated
> with bus_dmamem_alloc()" - does this mean that only one of
> bus_dmamap_load() or bus_dmamem_alloc() should be used on a
> tag/mapping? Or is the sense backwards (i.e. "don't specify
> BUS_DMA_ALLOCNOW for tags that are only used as the parent for other
> tags and never mapped themselves")? Or is there some other
> explanation?

What usually happens now is that each thing you want to preallocate
memory for using bus_dmamem_alloc() (such as a descriptor ring) gets
its own tag. This is somewhat mandated by the fact that
bus_dmamem_alloc() doesn't take a size but gets the size to allocate
from the tag. So usually a NIC driver will have 3 tags: 1 for the RX
ring, 1 for packet data, and 1 for the TX ring. Some drivers have 2
tags for packet data: 1 for TX buffers and 1 for RX buffers.

> Thirdly, bus_dmamap_load() uses a callback function to return the
> actual mapping details. According to the man page, there is no way
> to ensure that the callback occurs synchronously - a caller can
> only request that bus_dmamap_load() fail if resources are not
> immediately available. Despite this, many drivers pass 0 for flags
> (allowing an asynchronous invocation of the callback) and then fail
> (and clean up) if bus_dmamap_load() returns EINPROGRESS. This
> appears to open a race condition where the callback and cleanup
> could occur simultaneously. Mitigating the race condition seems to
> rely on one of the following two behaviours:
>
> a) The system is implicitly single-threaded when bus_dmamap_load()
> is called (generally as part of the device attach() function).
> Whilst this is true at boot time, it would not be true for a
> dynamically loaded module.
>
> b) Passing BUS_DMA_ALLOCNOW to bus_dma_tag_create() guarantees that
> the first bus_dmamap_load() on that tag will be synchronous. Is
> this true? Whilst it appears to be implied, it's not explicitly
> stated.

That doesn't really guarantee it either, as the pool of bounce pages
can be shared across multiple tags. I think what you might be missing
is this:

c) bus_dmamap_load() of a map returned from bus_dmamem_alloc() will
always succeed synchronously.

That is the only case other than BUS_DMA_NOWAIT where one can assume
synchronous calls to the callback. Also, some bus_dma calls basically
assume BUS_DMA_NOWAIT, such as bus_dmamap_load_mbuf() and
bus_dmamap_load_mbuf_sg().
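As a concrete illustration of (c), a ring load can use a trivial
callback that just records the bus address and rely on the callback
having run by the time bus_dmamap_load() returns. This is only a
rough sketch: the "foo" names and softc layout are made up, and
RING_SIZE stands in for whatever size the tag was created with.

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/bus.h>
#include <machine/bus.h>

/* Illustrative softc fields for a hypothetical driver. */
struct foo_softc {
	bus_dma_tag_t	ring_tag;	/* created with maxsize RING_SIZE */
	bus_dmamap_t	ring_map;
	void		*ring_vaddr;
	bus_addr_t	ring_paddr;
};

/* Trivial load callback: record the single segment's bus address. */
static void
foo_dmamap_cb(void *arg, bus_dma_segment_t *segs, int nseg, int error)
{
	bus_addr_t *paddr = arg;

	if (error != 0)
		return;
	KASSERT(nseg == 1, ("foo: too many DMA segments"));
	*paddr = segs[0].ds_addr;
}

static int
foo_alloc_ring(struct foo_softc *sc)
{
	int error;

	error = bus_dmamem_alloc(sc->ring_tag, &sc->ring_vaddr,
	    BUS_DMA_WAITOK | BUS_DMA_ZERO, &sc->ring_map);
	if (error != 0)
		return (error);

	/*
	 * Because this memory came from bus_dmamem_alloc(), the load
	 * runs the callback before returning and never gives
	 * EINPROGRESS; sc->ring_paddr is valid once this returns 0.
	 */
	error = bus_dmamap_load(sc->ring_tag, sc->ring_map,
	    sc->ring_vaddr, RING_SIZE, foo_dmamap_cb, &sc->ring_paddr, 0);
	return (error);
}

Passing BUS_DMA_NOWAIT for the flags here would be harmless as well:
for bus_dmamem_alloc()'d memory the load never defers either way.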
> Finally, what are the ordering requirements between the alloc,
> create, load and sync functions? OpenBSD implies that the normal
> ordering is create, alloc, load, sync whilst several FreeBSD
> drivers use tag_create, alloc, load and then create.

FreeBSD uses the same ordering as OpenBSD. I think you might be
confused by the bus_dmamem_alloc() case. There are basically two
cases. The first is preallocating a block of RAM to use for a
descriptor or command ring:

alloc_ring:
	bus_dma_tag_create(..., &ring_tag);
	/* Creates a map internally. */
	bus_dmamem_alloc(ring_tag, &p, ..., &ring_map);
	/* Will not fail with EINPROGRESS. */
	bus_dmamap_load(ring_tag, ring_map, p, ...);

free_ring:
	bus_dmamap_unload(ring_tag, ring_map);
	bus_dmamem_free(ring_tag, p, ring_map);
	bus_dma_tag_destroy(ring_tag);

The second case is when you want to handle data transfer requests
(queue a packet or disk I/O request, etc.). For this, the typical
model in FreeBSD is to create a single tag and then pre-create a map
for each descriptor or command:

setup_data_maps:
	bus_dma_tag_create(..., &tag);
	for (i = 0; i < NUM_RXD; i++)
		bus_dmamap_create(tag, ..., &rxdata[i].map);
	for (i = 0; i < NUM_TXD; i++)
		bus_dmamap_create(tag, ..., &txdata[i].map);

queue_a_rx_buffer:
	i = index_of_first_free_RX_descriptor;
	m = m_getcl(...);
	rxdata[i].mbuf = m;
	bus_dmamap_load_mbuf_sg(tag, rxdata[i].map, m, ...);
	/* Populate the s/g list in the i'th RX descriptor. */
	bus_dmamap_sync(rx_ring_tag, rx_ring_map, ...);

dequeue_an_rx_buffer_on_rx_completion:
	i = index_of_completed_receive_descriptor;
	bus_dmamap_sync(tag, rxdata[i].map, ...);
	bus_dmamap_unload(tag, rxdata[i].map);
	m = rxdata[i].mbuf;
	rxdata[i].mbuf = NULL;
	ifp->if_input(ifp, m);

free_data_maps:
	for (i = 0; i < NUM_RXD; i++)
		bus_dmamap_destroy(tag, rxdata[i].map);
	for (i = 0; i < NUM_TXD; i++)
		bus_dmamap_destroy(tag, txdata[i].map);
	bus_dma_tag_destroy(tag);

In a typical NIC driver you will probably be doing alloc_ring and
setup_data_maps at the same time during your attach routine, and
similarly free_ring and free_data_maps during detach.

> As a side-note, the manpage does not document the behaviour when
> bus_dmamap_destroy() or bus_dma_tag_destroy() are called whilst a
> bus_dmamap_load() callback is queued. Is the callback cancelled,
> or do one or both destroy operations fail?

Looking at the amd64 implementation: on HEAD, bus_dma_tag_destroy() of
a tag that still has maps created from it will fail with EBUSY (this
may not be in 7.x yet). Likewise, bus_dmamap_destroy() of a map that
still has bounce buffers in use will fail with EBUSY as well.

-- 
John Baldwin