From owner-freebsd-mips@FreeBSD.ORG Mon Dec 24 00:23:02 2012 Return-Path: Delivered-To: mips@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 9B93CC2 for ; Mon, 24 Dec 2012 00:23:02 +0000 (UTC) (envelope-from jroberson@jroberson.net) Received: from mail-da0-f51.google.com (mail-da0-f51.google.com [209.85.210.51]) by mx1.freebsd.org (Postfix) with ESMTP id 5A5C28FC18 for ; Mon, 24 Dec 2012 00:23:02 +0000 (UTC) Received: by mail-da0-f51.google.com with SMTP id i30so2924781dad.10 for ; Sun, 23 Dec 2012 16:22:56 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=x-received:date:from:x-x-sender:to:cc:subject:in-reply-to :message-id:references:user-agent:mime-version:content-type :x-gm-message-state; bh=MBCwf05DwGqYCM45Cih6/F2x9xScFQ0qDCgMUHgMC8M=; b=Ui++mzk+8/scwOWLWmYzm1tzbtv83kWghiQXfFxWO49t19tVh/TMe4PJGWcBvhyxvy zMu3DFXYata7Tl9SVw9PslMCz5pVLeacPRa41RkHWAywmA0c9gpPGa1oQ1rGtGNomZcG ez10J8g+SX3nkntVRa7QdgQNGBwJ3paaAPX9n8nqnZBOP40nUfTUT+GYeGUUKwuJJpIv d+pCgdrstNDpddtsq3U3do3z2WeCQDQ2IiIAc4YBT7tQ40ex9GuKQXqXTRCs8VM8KLne JMv9tzU2YGOXudCiAee0xWwLV9qOPCDFCaY810AHmm4xH/HiuBKr8GHS9TqvJ+BOPtmv ShPw== X-Received: by 10.66.83.136 with SMTP id q8mr57987918pay.83.1356308576649; Sun, 23 Dec 2012 16:22:56 -0800 (PST) Received: from rrcs-66-91-135-210.west.biz.rr.com (rrcs-66-91-135-210.west.biz.rr.com. [66.91.135.210]) by mx.google.com with ESMTPS id pj1sm11235205pbb.71.2012.12.23.16.22.53 (version=SSLv3 cipher=OTHER); Sun, 23 Dec 2012 16:22:55 -0800 (PST) Date: Sun, 23 Dec 2012 14:22:51 -1000 (HST) From: Jeff Roberson X-X-Sender: jroberson@desktop To: Ian Lepore Subject: Re: Call for testing and review, busdma changes In-Reply-To: <1355085250.87661.345.camel@revolution.hippie.lan> Message-ID: References: <1355077061.87661.320.camel@revolution.hippie.lan> <1355085250.87661.345.camel@revolution.hippie.lan> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Gm-Message-State: ALoCoQkr5tJJLwfUY9HOl1+rxSJnmDk2FAh76vrhSc1Kwy2CAjeo9we10zcy1JeCNXNDvx6wxt1S X-Mailman-Approved-At: Mon, 24 Dec 2012 00:38:46 +0000 Cc: powerpc@freebsd.org, marcel@freebsd.org, mips@freebsd.org, John Baldwin , mav@freebsd.org, scottl@freebsd.org, attilio@freebsd.org, kib@freebsd.org, sparc64@freebsd.org, arm@freebsd.org X-BeenThere: freebsd-mips@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Porting FreeBSD to MIPS List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 24 Dec 2012 00:23:02 -0000 On Sun, 9 Dec 2012, Ian Lepore wrote: > On Sun, 2012-12-09 at 08:48 -1000, Jeff Roberson wrote: >> >> On Sun, 9 Dec 2012, Ian Lepore wrote: >> >>> On Sat, 2012-12-08 at 08:51 -1000, Jeff Roberson wrote: >>>> Hello, >>>> >>>> http://people.freebsd.org/~jeff/physbio.diff >>>> >>>> I have a relative large patch that reforms the busdma API so that new >>>> types may be added without modifying every architecture's >>>> busdma_machdep.c. It does this by unifying the bus_dmamap_load_buffer() >>>> routines so that they may be called from MI code. The MD busdma is then >>>> given a chance to do any final processing in the complete() callback. >>>> This patch also contains cam changes to unify the bus_dmamap_load* >>>> handling in cam drivers. >>>> >>>> The arm and mips implementations saw the largest changes since they have >>>> to track virtual addresses for sync(). Previously this was done in a type >>>> specific way. Now it is done in a generic way by recording the list of >>>> virtuals in the map. >>>> >>>> I have verified that this patch passes make universe which includes >>>> several kernel builds from each architecture. I suspect that if I broke >>>> anything your machine simply won't boot or will throw I/O errors. There >>>> is little subtlety, it is mostly refactoring. >>>> > > First an FYI, my replies to this thread aren't making it to the mailing > lists, except when quoted by someone replying to them. They're being > held for moderator approval because they're going to too many addresses. > >>> >>> Testing on armv4/v5 platforms, the patches applied and compiled cleanly, >>> but fault-abort with a NULL deref on what appears to be the first call >>> to bus_dmamap_load(). I'll dig deeper into debugging that after >>> browsing the new code so I can figure out where to start debugging. >> >> Can you give me the file and line of the fault? >> > > Better than that, I can give you patches (attached). There were two > little glitches... the allocated slist wasn't getting attached to the > map, and when building the slist in _bus_dmamap_load_buffer() the slist > needs to participate in the coalescing logic near the end of the loop. I understand the first problem. The second, I'm not sure about. When we're going through bounce pages we use virtual to virtual. Why do you need to flush the cache lines for the source addresses? The device will only do io to the bounce pages so they are the only that need other synchronization. > > I'm not positive the attached quick-fix is perfect, but it allows one of > my arm units here to boot and run and pass some basic IO smoke tests. > I'll be trying it on other arm platforms later today. > >>> >>> Just from an initial quick browsing of the new code for armv4, I notice >>> these things... >>> >>> The new routine __bus_dmamap_mayblock() does (*flags) |= BUS_DMA_WAITOK; >>> which is a no-op because BUS_DMA_WAITOK is zero. I'm not sure what the >>> intent was, but I think any modification of the WAIT-related flags would >>> be conceptually wrong because it overrides the caller's instructions on >>> whether or not to do a deferred load when resources aren't available. >> >> The intent of the original load function seems to be that it is always >> blocking. Most architectures force the flag. The mbuf and uio load >> routines don't support blocking at all because the callback lacks the >> information needed to restart the operation. >> > > The manpage states that bus_dmamap_load() never blocks for any reason. > The mbuf and uio flavors say they are variations of bus_dmapmap_load(); > I take the implication of that to mean that they also are defined as > never blocking. > > The docs then say that WAITOK vs. NOWAIT control the scheduling of a > deferred load, and that NOWAIT is forced in the mbuf and uio cases. > Your refactored code already handles that by adding NOWAIT to the > caller-specified flags for mbufs and uio. > > So the code will do the right thing now (with your patches), but only > because (*flags) |= BUS_DMA_WAITOK is a no-op. This code is problematic. We can only support WAITOK operations for bus_dmamap_load() because the callbacks from the swi when bounce pages are available can only remember one pointer. To fix this for real we'd have to remember the original type and call the appropriate load function again. This could be solved by having a { type, pointer, length } structure that is passed between the front and back-end. This would solve your mbuf problem as well. I think I will do this next. By the way I have an mbuf branch that pushes data and mbuf structure into separate cachelines. It's not only faster on arm and the like but on x86 as well. So someone should pursue this and you'd have way fewer partial line flushes. > >>> >>> The way you've refactored the mbuf-related load routines, the MD code is >>> no longer able to tell that it's an mbuf operation. This comes back to >>> what I said a few days ago... when you look at the current code on >>> various platforms you see a lot of false similarities. The code isn't >>> nearly identical because it really all needs to be the same, it's >>> because it was all copied from the same source, and it doesn't currently >>> work correctly on arm and mips. With your changes, the ability to >>> correct the implementation on those platforms is now gone. Perhaps this >>> can be fixed easily by defining some new flags that the MI routines can >>> OR-in to give the MD routines more info on what's happening. (Correcting >>> the mbuf sync handling on arm and mips requires that we know at sync >>> time that the buffers involved are mbufs, which means we need to know at >>> map-load time so we can set a flag or a type code or something in the >>> map that we can examine at sync time.) >> >> Why do you need to know it is an mbuf if you have a record of the virtual >> addresses? The only type specific information was used to find the >> addresses. See how arm's busdma_machdep-v6 ws implemented. >> > > Because there is a special rule for mbufs on VIVT-cache architectures: > the address of the data buffer is not aligned to a cacheline boundary, > but the sync code is allowed to treat the buffer as if it is aligned and > thus avoid invoking the partial cacheline flush algorithm. That > algorithm was shown a few months ago to be fatally flawed, and we're on > a path (albeit in super-slow-motion) to eliminate it. You can skip the sync because you know the information in the mbuf header is not necessary? > > I have patches to fix the mbuf partial sync and other things related to > partial sync problems, and my next step will be to try to rework them to > fit in with what you've done. Based on what I've seen so far in your > refactoring, I think just passing a flag to say it's an mbuf, from the > MI to the MD mapping routine, will provide enough info. I think I described a better approach above that will solve mutliple problems. Let me know what you think. Jeff > >>> >>>> The next step is to allow for dma loading of physical addresses. This >>>> will permit unmapped I/O. Which is a significant performance optimization >>>> targeted for 10.0. >>> >>> This sounds scary for arm and mips, or more specifically for VIVT cache >>> platforms that can only do a sync based on virtual addresses. Can you >>> talk some more about the plans for this? >> >> It will be for addresses which were never mapped into kva. To support >> unmaapped io. There will only be a need for bounce pages, no syncing. >> > > Can the pages be mapped into any non-kernel vmspaces? If they aren't > mapped into any vmspace at all, then I think there'll be no implications > for VIVT cache architectures. If they appear in any pmap at all then > there may be problems. > > -- Ian > > From owner-freebsd-mips@FreeBSD.ORG Mon Dec 24 11:06:46 2012 Return-Path: Delivered-To: freebsd-mips@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 094C1674 for ; Mon, 24 Dec 2012 11:06:46 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id E0A248FC0A for ; Mon, 24 Dec 2012 11:06:45 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id qBOB6jef066091 for ; Mon, 24 Dec 2012 11:06:45 GMT (envelope-from owner-bugmaster@FreeBSD.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.5/8.14.5/Submit) id qBOB6jMU066089 for freebsd-mips@FreeBSD.org; Mon, 24 Dec 2012 11:06:45 GMT (envelope-from owner-bugmaster@FreeBSD.org) Date: Mon, 24 Dec 2012 11:06:45 GMT Message-Id: <201212241106.qBOB6jMU066089@freefall.freebsd.org> X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f From: FreeBSD bugmaster To: freebsd-mips@FreeBSD.org Subject: Current problem reports assigned to freebsd-mips@FreeBSD.org X-BeenThere: freebsd-mips@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Porting FreeBSD to MIPS List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 24 Dec 2012 11:06:46 -0000 Note: to view an individual PR, use: http://www.freebsd.org/cgi/query-pr.cgi?pr=(number). The following is a listing of current problems submitted by FreeBSD users. These represent problem reports covering all versions including experimental development code and obsolete releases. S Tracker Resp. Description -------------------------------------------------------------------------------- o kern/165951 mips [ar913x] [ath] DDR flush isn't being done for the WMAC p kern/163670 mips [mips][arge] arge can't allocate ring buffer on multip 2 problems total. From owner-freebsd-mips@FreeBSD.ORG Mon Dec 24 20:43:14 2012 Return-Path: Delivered-To: mips@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 742BA1A7; Mon, 24 Dec 2012 20:43:14 +0000 (UTC) (envelope-from freebsd@damnhippie.dyndns.org) Received: from duck.symmetricom.us (duck.symmetricom.us [206.168.13.214]) by mx1.freebsd.org (Postfix) with ESMTP id 08F738FC0C; Mon, 24 Dec 2012 20:43:07 +0000 (UTC) Received: from damnhippie.dyndns.org (daffy.symmetricom.us [206.168.13.218]) by duck.symmetricom.us (8.14.5/8.14.5) with ESMTP id qBOKh0jZ083430; Mon, 24 Dec 2012 13:43:00 -0700 (MST) (envelope-from freebsd@damnhippie.dyndns.org) Received: from [172.22.42.240] (revolution.hippie.lan [172.22.42.240]) by damnhippie.dyndns.org (8.14.3/8.14.3) with ESMTP id qBOKgt31071102; Mon, 24 Dec 2012 13:42:55 -0700 (MST) (envelope-from freebsd@damnhippie.dyndns.org) Subject: Re: Call for testing and review, busdma changes From: Ian Lepore To: Jeff Roberson In-Reply-To: References: <1355077061.87661.320.camel@revolution.hippie.lan> <1355085250.87661.345.camel@revolution.hippie.lan> Content-Type: text/plain; charset="us-ascii" Date: Mon, 24 Dec 2012 13:42:55 -0700 Message-ID: <1356381775.1129.181.camel@revolution.hippie.lan> Mime-Version: 1.0 X-Mailer: Evolution 2.32.1 FreeBSD GNOME Team Port Content-Transfer-Encoding: 7bit X-Mailman-Approved-At: Mon, 24 Dec 2012 21:21:06 +0000 Cc: powerpc@freebsd.org, marcel@freebsd.org, mips@freebsd.org, John Baldwin , mav@freebsd.org, scottl@freebsd.org, attilio@freebsd.org, kib@freebsd.org, sparc64@freebsd.org, arm@freebsd.org X-BeenThere: freebsd-mips@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Porting FreeBSD to MIPS List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 24 Dec 2012 20:43:14 -0000 On Sun, 2012-12-23 at 14:22 -1000, Jeff Roberson wrote: > On Sun, 9 Dec 2012, Ian Lepore wrote: > > > On Sun, 2012-12-09 at 08:48 -1000, Jeff Roberson wrote: > >> > >> On Sun, 9 Dec 2012, Ian Lepore wrote: > >> > >>> On Sat, 2012-12-08 at 08:51 -1000, Jeff Roberson wrote: > >>>> Hello, > >>>> > >>>> http://people.freebsd.org/~jeff/physbio.diff > >>>> [...] > >>> > >>> Testing on armv4/v5 platforms, the patches applied and compiled cleanly, > >>> but fault-abort with a NULL deref on what appears to be the first call > >>> to bus_dmamap_load(). I'll dig deeper into debugging that after > >>> browsing the new code so I can figure out where to start debugging. > >> > >> Can you give me the file and line of the fault? > >> > > > > Better than that, I can give you patches (attached). There were two > > little glitches... the allocated slist wasn't getting attached to the > > map, and when building the slist in _bus_dmamap_load_buffer() the slist > > needs to participate in the coalescing logic near the end of the loop. > > I understand the first problem. The second, I'm not sure about. When > we're going through bounce pages we use virtual to virtual. Why do you > need to flush the cache lines for the source addresses? The device will > only do io to the bounce pages so they are the only that need other > synchronization. > As I indicated in my following paragraph, I didn't think my fix for constructing the sync list was perfect, it was just good enough to allow more testing to progress. In particular I don't think it would be right if bouncing were required, but it never is on the arm platforms I have to test with. The logic in your patch didn't work in the normal non-bounce case. The main loop that does the mapping works on a single page at a time. For each page it decides whether to use a bounce page instead, then it either merges the page into the current segment if it can, or starts a new segment. Your patch has it create a new sync list entry for every non-bounced page, but the sync list is sized to maxsegs not maxpages. So your logic writes outside the bounds of the sync list array, and my logic includes the virtual ranges of pages that will be bounced in the sync list (which I think should be harmless anyway, just a bit inefficient). The logic is wrong in both cases, but since bouncing never happens on my hardware I knew my tweak would let me do more testing. In the past, I've never paid close attention to the bounce logic, I've tended to read around it as if it weren't there. Recently I've begun to look at it and the more I look the more I think that to the degree that it works at all, it works by accident. I think mostly the logic just never gets used on the arm platforms that people are working with, because there'd be more problems reported if it were. For example... there is no handling of the PREREAD case when syncing bounce pages. A POSTREAD invalidate is insufficient; if any of the cache lines are dirty at the time the DMA starts, there are several ways those lines can get written back to physical memory while the DMA is in progress, corrupting the data in the IO buffer. Another example... there is an unordered list of bounce pages which are substituted for real pages as needed, one page at a time. No attempt is made to ensure that a run of contiguous physical pages that need to be bounced turns into a corresponding single run of contiguous pages in the bounce zone. In other words, it seems like over time as the list becomes less ordered the bounce logic would increasingly create multi-segment DMA. The reason that seems like a problem is that on every system I've checked (arm and i386) getting to multi-user mode creates about 100 dma tags, and typically only 5 or 6 of those tags allow more than one segment. I understand that this is not "incorrect operation" of the busdma code, but it does seem to be circumstantial evidence that this logic isn't being exercised much, since it appears that few drivers can actually handle multi-segment DMA. While the work I've done so far on arm busdma has concentrated on the buffer allocation and correct operation of the sync routines in the non-bounce cases, I think I'll be looking closely at the map load (currently seems a bit inefficient) and bounce-handling logic soon. > > > > I'm not positive the attached quick-fix is perfect, but it allows one of > > my arm units here to boot and run and pass some basic IO smoke tests. > > I'll be trying it on other arm platforms later today. > > > >>> > >>> Just from an initial quick browsing of the new code for armv4, I notice > >>> these things... > >>> > >>> The new routine __bus_dmamap_mayblock() does (*flags) |= BUS_DMA_WAITOK; > >>> which is a no-op because BUS_DMA_WAITOK is zero. I'm not sure what the > >>> intent was, but I think any modification of the WAIT-related flags would > >>> be conceptually wrong because it overrides the caller's instructions on > >>> whether or not to do a deferred load when resources aren't available. > >> > >> The intent of the original load function seems to be that it is always > >> blocking. Most architectures force the flag. The mbuf and uio load > >> routines don't support blocking at all because the callback lacks the > >> information needed to restart the operation. > >> > > > > The manpage states that bus_dmamap_load() never blocks for any reason. > > The mbuf and uio flavors say they are variations of bus_dmapmap_load(); > > I take the implication of that to mean that they also are defined as > > never blocking. > > > > The docs then say that WAITOK vs. NOWAIT control the scheduling of a > > deferred load, and that NOWAIT is forced in the mbuf and uio cases. > > Your refactored code already handles that by adding NOWAIT to the > > caller-specified flags for mbufs and uio. > > > > So the code will do the right thing now (with your patches), but only > > because (*flags) |= BUS_DMA_WAITOK is a no-op. > > This code is problematic. We can only support WAITOK operations for > bus_dmamap_load() because the callbacks from the swi when bounce pages are > available can only remember one pointer. To fix this for real we'd have > to remember the original type and call the appropriate load function > again. This could be solved by having a { type, pointer, length } > structure that is passed between the front and back-end. This would solve > your mbuf problem as well. I think I will do this next. I'm lost here. I don't understand your point about only being able to remember one pointer. We only ever need to remember one pointer, because a map can only be associated with a single buffer at a time. The fact that a deferred load can't be easily implemented in the case of mbufs and uio might be why the manpage says the NOWAIT flag is implied in those cases. There's no reason why the busdma code can't handle deferred loading for mbufs and uio -- as you say, it seems to only involve storing a bit of extra info for the callback. I think the hard part of the problem is elsewhere. For a transmit mbuf that needs a deferred mapping, does the driver's interface to the network stack allow for the concept of "this call has neither suceeded nor failed, but either outcome is possible, I'll let you know later" (I have no idea what the answer to that is but I can see how it could quickly become complex for drivers and other higher layers to deal with). For a deferred uio mapping, what if the userland pmap is no longer valid when the callback happens? What if userland has freed the memory? I suspect the implied NOWAIT requirement exists to wish away the need to handle such things. My original point was that the WAITOK flag has a value of zero. Your code gives the impression that it forces all callers to accept a deferred mapping, overriding whatever flag they may have passed, but in fact it doesn't do anything at all. Hmm, I just looked at non-arm implementations more closely and I see that some of them have the same logic as your patches, they OR-in a zero in an apparent attempt to override the caller's flags. That's contrary to what the manpage says and IMO contrary to common sense -- if a caller has said that it is unable to handle a deferred mapping callback, it's unlikely that anything is going to work correctly if a deferred callback happens (in fact, it sounds like a crash or panic waiting to happen). I think the only reason the existing code hasn't caused problems is because the flag value is zero and it actually does nothing except mislead when you read it. > By the way I have an mbuf branch that pushes data and mbuf structure into > separate cachelines. It's not only faster on arm and the like but on x86 > as well. So someone should pursue this and you'd have way fewer partial > line flushes. > [...] > You can skip the [mbuf] sync because you know the information in the mbuf header > is not necessary? > Not exactly. In the general case, if the buffer isn't aligned and padded to a cacheline boundary, we don't know anything about the memory before and after the buffer which partially shares a cacheline we're about to operate on, so the existing code attempts to preserve the contents of that unknown memory across the sync operation. (It cannot possibly be reliably sucessful in doing so, but that's another story.) In the case of an mbuf, where the data area following the header is always misaligned to a cacheline, we do know something about the memory before the data buffer: we know that it's an mbuf header which will not be accessed by the cpu while the dma is in progress. Likewise if the length isn't a multiple of the line size we know the data after the mapped length is still part of a buffer which is allocated to a multiple of the line size and the cpu won't touch that memory during the dma. Using that knowledge we can handle a PREREAD or PREWRITE by doing a wbinv on the cacheline (this is no different than the general case), and then we can handle the POSTREAD as if the buffer were aligned. We don't have to attempt to preserve partial cachelines before/after the buffer because we know the cpu hasn't touched that memory during the dma, so no cache line load could have happened. In the changes you mentioned, how do you ensure the mbuf header and data end up in separate cache lines without compile-time knowledge of cache line size? Even if the embedded data buffer in an mbuf becomes cache-aligned, the special mbuf awareness will still be needed to handle the fact that the mapped data length may not be a multiple of the cache line size. We'll still need the special knowledge that it's an mbuf so that we can sync the entire last cacheline without attempting to preserve the part beyond the end of the dma buffer. (If this idea makes you vaguely uneasy, you're not alone. However, I've been assured by Scott and Warner that it's always safe to treat mbufs this way.) > > > > I have patches to fix the mbuf partial sync and other things related to > > partial sync problems, and my next step will be to try to rework them to > > fit in with what you've done. Based on what I've seen so far in your > > refactoring, I think just passing a flag to say it's an mbuf, from the > > MI to the MD mapping routine, will provide enough info. > > I think I described a better approach above that will solve mutliple > problems. Let me know what you think. > > Jeff In the long run we can't just reduce the incidence of partial cacheline flushes, they have to be eliminated completely. We can eliminate them for mbufs because of arbitrary rules for mbufs that are supposedly universally followed in the existing code already. We can also eliminate them for any buffers which are allocated by bus_dmamem_alloc(), basically for reasons similar to mbufs: we can ensure that allocated buffers are always aligned and padded appropriately, and establish a rule that says you must not allow any other access to memory in the allocated buffer during dma. Most especially that means you can't allocate a big buffer and sub-divide it in such a way that there is concurrent cpu and dma access, or multiple concurrent dma operations, within a single buffer returned by bus_dmamem_alloc(). Still unresolved is what to do about the remaining cases -- attempts to do dma in arbitrary buffers not obtained from bus_dmamem_alloc() which are not aligned and padded appropriately. There was some discussion a while back, but no clear resolution. I decided not to get bogged down by that fact and to fix the mbuf and allocated-buffer situations that we know how to deal with for now. -- Ian From owner-freebsd-mips@FreeBSD.ORG Tue Dec 25 00:02:33 2012 Return-Path: Delivered-To: mips@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 72E629EE for ; Tue, 25 Dec 2012 00:02:33 +0000 (UTC) (envelope-from jroberson@jroberson.net) Received: from mail-pa0-f41.google.com (mail-pa0-f41.google.com [209.85.220.41]) by mx1.freebsd.org (Postfix) with ESMTP id 315028FC17 for ; Tue, 25 Dec 2012 00:02:33 +0000 (UTC) Received: by mail-pa0-f41.google.com with SMTP id bj3so4302747pad.28 for ; Mon, 24 Dec 2012 16:02:27 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=x-received:date:from:x-x-sender:to:cc:subject:in-reply-to :message-id:references:user-agent:mime-version:content-type :x-gm-message-state; bh=I0GFDKOg9qlcIkDxXnHwe8OKNN3dwd/o++IWhz6ZEpY=; b=M9pAii43O2vzHYMqPrqaT1qfTpcVzfyTV5ps3cR+d6lUoCx8b6slpHD1qn1rDwOOCy Yc6Cg4qf0jPjE4sh/2fmgCV/TTnRrtLE/teBRXbMfYnwjIrExNIstj+jRLlpdk5PIHzy yBYUYyRg/qCp6xR+CjvomiSghkohtSyDpHId2VBikXMnnMowrTBHsJAZ5K497ENgMEdj 34QywIGug7HR3vFNOEfphly7ZSW8Xfe3itF0wNfQGm/drm1D+JWbd/tc7ZerQ3F7ad0C X17/GPwmRuxiPmPcOZtQXtwaLjNTnCStcLE6SMOawHc32VnySxitD6Lm/6eDcPCpJ0ye SB6w== X-Received: by 10.68.235.40 with SMTP id uj8mr70483044pbc.98.1356393747173; Mon, 24 Dec 2012 16:02:27 -0800 (PST) Received: from rrcs-66-91-135-210.west.biz.rr.com (rrcs-66-91-135-210.west.biz.rr.com. [66.91.135.210]) by mx.google.com with ESMTPS id oi3sm13055733pbb.1.2012.12.24.16.02.23 (version=SSLv3 cipher=OTHER); Mon, 24 Dec 2012 16:02:26 -0800 (PST) Date: Mon, 24 Dec 2012 14:02:19 -1000 (HST) From: Jeff Roberson X-X-Sender: jroberson@desktop To: Ian Lepore Subject: Re: Call for testing and review, busdma changes In-Reply-To: <1356390225.1129.217.camel@revolution.hippie.lan> Message-ID: References: <1355077061.87661.320.camel@revolution.hippie.lan> <1355085250.87661.345.camel@revolution.hippie.lan> <1356381775.1129.181.camel@revolution.hippie.lan> <1356390225.1129.217.camel@revolution.hippie.lan> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Gm-Message-State: ALoCoQkuBqj7t5a/lmb899QtRBO26YkP4qut4OmaxjFfa/GdhlqQ6GxkvKVV09SR9A0aoklqJ+W1 X-Mailman-Approved-At: Tue, 25 Dec 2012 01:27:30 +0000 Cc: powerpc@freebsd.org, marcel@freebsd.org, mips@freebsd.org, John Baldwin , mav@freebsd.org, scottl@freebsd.org, attilio@freebsd.org, kib@freebsd.org, sparc64@freebsd.org, arm@freebsd.org X-BeenThere: freebsd-mips@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Porting FreeBSD to MIPS List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 25 Dec 2012 00:02:33 -0000 On Mon, 24 Dec 2012, Ian Lepore wrote: > On Mon, 2012-12-24 at 11:21 -1000, Jeff Roberson wrote: >> On Mon, 24 Dec 2012, Ian Lepore wrote: >> >>> On Sun, 2012-12-23 at 14:22 -1000, Jeff Roberson wrote: >>>> On Sun, 9 Dec 2012, Ian Lepore wrote: >>>> >>>>> On Sun, 2012-12-09 at 08:48 -1000, Jeff Roberson wrote: >>>>>> >>>>>> On Sun, 9 Dec 2012, Ian Lepore wrote: >>>>>> >>>>>>> On Sat, 2012-12-08 at 08:51 -1000, Jeff Roberson wrote: >>>>>>>> Hello, >>>>>>>> >>>>>>>> http://people.freebsd.org/~jeff/physbio.diff >>>>>>>> >>> [...] >>>>>>> >>>>>>> Testing on armv4/v5 platforms, the patches applied and compiled cleanly, >>>>>>> but fault-abort with a NULL deref on what appears to be the first call >>>>>>> to bus_dmamap_load(). I'll dig deeper into debugging that after >>>>>>> browsing the new code so I can figure out where to start debugging. >>>>>> >>>>>> Can you give me the file and line of the fault? >>>>>> >>>>> >>>>> Better than that, I can give you patches (attached). There were two >>>>> little glitches... the allocated slist wasn't getting attached to the >>>>> map, and when building the slist in _bus_dmamap_load_buffer() the slist >>>>> needs to participate in the coalescing logic near the end of the loop. >>>> >>>> I understand the first problem. The second, I'm not sure about. When >>>> we're going through bounce pages we use virtual to virtual. Why do you >>>> need to flush the cache lines for the source addresses? The device will >>>> only do io to the bounce pages so they are the only that need other >>>> synchronization. >>>> >>> >>> As I indicated in my following paragraph, I didn't think my fix for >>> constructing the sync list was perfect, it was just good enough to allow >>> more testing to progress. In particular I don't think it would be right >>> if bouncing were required, but it never is on the arm platforms I have >>> to test with. >>> >>> The logic in your patch didn't work in the normal non-bounce case. The >>> main loop that does the mapping works on a single page at a time. For >>> each page it decides whether to use a bounce page instead, then it >>> either merges the page into the current segment if it can, or starts a >>> new segment. Your patch has it create a new sync list entry for every >>> non-bounced page, but the sync list is sized to maxsegs not maxpages. >>> >>> So your logic writes outside the bounds of the sync list array, and my >>> logic includes the virtual ranges of pages that will be bounced in the >>> sync list (which I think should be harmless anyway, just a bit >>> inefficient). The logic is wrong in both cases, but since bouncing >>> never happens on my hardware I knew my tweak would let me do more >>> testing. >> >> Ok, so I need to add coalescing and perhaps fix the bounds checking for >> virtual ranges. Correct? I can do that simply enough. I have some other >> changes I will make concurrently and then get another build that I would >> like you to test. Thank you for all the feedback so far. >> > > Okay, but did you see that Olivier committed my busdma changes recently? > It makes for a pretty significant by-hand merge with your work. I've > done that merge -- I think I sent it to you last week -- but what > Olivier committed was a wee bit different than what I merged with then. > I'm going to move this patch further along before I re-merge. I want to get buy in on the next set of API changes I'm going to make. > I think the little patch I sent originally to do the sync list > coalescing based on segments is probably good enough for now. It will > result in syncing virtual ranges that are actually being bounced, but > the arm code has always done that due to bugs (it skips the virtual > range sync only if the entire buffer fits in one bounce page). I will look at this next. mips has the same pattern and I want to make sure it's correct there too. > > I just had a close look at the armv6 implementation of mapping and > syncing for the first time, and... oh my. It's creating (malloc'ing!) a > sync list entry for every individual page of every mapped buffer. I > think it should be doing something more like my quickie-patch that bases > the sync list entries on the segments since it has to sync l2 cache > based on physical ranges for some chips. If you look at my branch I made the two arm busdma implementations identical in this regard. > >>> >>> In the past, I've never paid close attention to the bounce logic, I've >>> tended to read around it as if it weren't there. Recently I've begun to >>> look at it and the more I look the more I think that to the degree that >>> it works at all, it works by accident. I think mostly the logic just >>> never gets used on the arm platforms that people are working with, >>> because there'd be more problems reported if it were. >> >> I think it is even rarely used on x86 anymore. The number of poorly >> behaved DMA engines has been shrinking. >> >>> >>> For example... there is no handling of the PREREAD case when syncing >>> bounce pages. A POSTREAD invalidate is insufficient; if any of the >>> cache lines are dirty at the time the DMA starts, there are several ways >>> those lines can get written back to physical memory while the DMA is in >>> progress, corrupting the data in the IO buffer. >>> >>> Another example... there is an unordered list of bounce pages which are >>> substituted for real pages as needed, one page at a time. No attempt is >>> made to ensure that a run of contiguous physical pages that need to be >>> bounced turns into a corresponding single run of contiguous pages in the >>> bounce zone. In other words, it seems like over time as the list >>> becomes less ordered the bounce logic would increasingly create >>> multi-segment DMA. The reason that seems like a problem is that on >>> every system I've checked (arm and i386) getting to multi-user mode >>> creates about 100 dma tags, and typically only 5 or 6 of those tags >>> allow more than one segment. I understand that this is not "incorrect >>> operation" of the busdma code, but it does seem to be circumstantial >>> evidence that this logic isn't being exercised much, since it appears >>> that few drivers can actually handle multi-segment DMA. >>> >>> While the work I've done so far on arm busdma has concentrated on the >>> buffer allocation and correct operation of the sync routines in the >>> non-bounce cases, I think I'll be looking closely at the map load >>> (currently seems a bit inefficient) and bounce-handling logic soon. >> >> I'm shocked that we're sending I/O of larger than a page size to devices >> that only support 1 sg entry. Rather than memcpying and bouncing these >> drivers should be written to chop up the I/O themselves. For network >> devices they should not advertise jumbo frame support and for disk devices >> they can specify the maximum io size as one page already. The original >> pages that we send down would almost never be contiguous until very >> recently so this wouldn't have worked even without bounce pages. >> >>> >>>>> >>>>> I'm not positive the attached quick-fix is perfect, but it allows one of >>>>> my arm units here to boot and run and pass some basic IO smoke tests. >>>>> I'll be trying it on other arm platforms later today. >>>>> > [...] >>> The fact that a deferred load can't be easily implemented in the case of >>> mbufs and uio might be why the manpage says the NOWAIT flag is implied >>> in those cases. There's no reason why the busdma code can't handle >>> deferred loading for mbufs and uio -- as you say, it seems to only >>> involve storing a bit of extra info for the callback. I think the hard >>> part of the problem is elsewhere. For a transmit mbuf that needs a >>> deferred mapping, does the driver's interface to the network stack allow >>> for the concept of "this call has neither suceeded nor failed, but >>> either outcome is possible, I'll let you know later" (I have no idea >>> what the answer to that is but I can see how it could quickly become >>> complex for drivers and other higher layers to deal with). For a >>> deferred uio mapping, what if the userland pmap is no longer valid when >>> the callback happens? What if userland has freed the memory? I suspect >>> the implied NOWAIT requirement exists to wish away the need to handle >>> such things. >> >> Yes for uio the operation will have to be able to work on something other >> than the current pmap. Many busdma implementations remember the pmap >> because that could also be true when sync is called. Also actual uio >> operations to user pages are probably not initiated by anything in tree. >> Maybe some command interfaces for storage drivers. It is definitely an >> edge case. I would rather not support it but I guess it does make such >> things easier to implement and not require a copy. >> >> Most storage drivers have the logic that you describe above. They watch >> for EINPROGRESS. Then the callback can supply an error if the operation >> fails later. I would like to make this more uniform because cam drivers >> are about to support multiple different memory descriptors. It would be a >> shame if they each had different semantics. So I need deferred load of >> many types to work. >> > > Yeah, I've done some low-level storage driver stuff myself (mmc/sd) and > I can see how easy the deferred load solutions are to implement in that > sort of driver that's already structured to operate asychronously. I'm > not very familiar with how network hardware drivers interface with the > rest of the network stack. I have some idea, I'm just not sure of all > the subtleties involved and whether there are any implications for > something like a deferred load. > > This is one of those situations where I tend to say to myself... the > folks who designed this stuff and imposed the "no deferred load" > restriction on mbufs and uio but not other cases were not stupid or > lazy, so they must have had some other reason. I'd want to know what > that was before I went too far with trying to undo it. In network drivers you just discard the mbuf. Networks are lossy. If you persistently are out of bounce buffers you'll fail to transmit. The flags were manually or'd in for these consumers to simplify the implementation. I intend to change the underlying mechanism to support it but not the mbuf load wrappers. > >>> >>> My original point was that the WAITOK flag has a value of zero. Your >>> code gives the impression that it forces all callers to accept a >>> deferred mapping, overriding whatever flag they may have passed, but in >>> fact it doesn't do anything at all. >> >> Yes I realized that. I just made a commit to eliminate that on my branch. >> Thank you for pointing it out. >> >>> >>> Hmm, I just looked at non-arm implementations more closely and I see >>> that some of them have the same logic as your patches, they OR-in a zero >>> in an apparent attempt to override the caller's flags. That's contrary >>> to what the manpage says and IMO contrary to common sense -- if a caller >>> has said that it is unable to handle a deferred mapping callback, it's >>> unlikely that anything is going to work correctly if a deferred callback >>> happens (in fact, it sounds like a crash or panic waiting to happen). I >>> think the only reason the existing code hasn't caused problems is >>> because the flag value is zero and it actually does nothing except >>> mislead when you read it. >> >> Yes, they all seem to respect NOWAIT as well. So oring it in did nothing. >> >>> >>>> By the way I have an mbuf branch that pushes data and mbuf structure into >>>> separate cachelines. It's not only faster on arm and the like but on x86 >>>> as well. So someone should pursue this and you'd have way fewer partial >>>> line flushes. >>>> >>> [...] >>>> You can skip the [mbuf] sync because you know the information in the mbuf header >>>> is not necessary? >>>> >>> >>> Not exactly. In the general case, if the buffer isn't aligned and >>> padded to a cacheline boundary, we don't know anything about the memory >>> before and after the buffer which partially shares a cacheline we're >>> about to operate on, so the existing code attempts to preserve the >>> contents of that unknown memory across the sync operation. (It cannot >>> possibly be reliably sucessful in doing so, but that's another story.) >>> >>> In the case of an mbuf, where the data area following the header is >>> always misaligned to a cacheline, we do know something about the memory >>> before the data buffer: we know that it's an mbuf header which will not >>> be accessed by the cpu while the dma is in progress. Likewise if the >>> length isn't a multiple of the line size we know the data after the >>> mapped length is still part of a buffer which is allocated to a multiple >>> of the line size and the cpu won't touch that memory during the dma. >>> >>> Using that knowledge we can handle a PREREAD or PREWRITE by doing a >>> wbinv on the cacheline (this is no different than the general case), and >>> then we can handle the POSTREAD as if the buffer were aligned. We don't >>> have to attempt to preserve partial cachelines before/after the buffer >>> because we know the cpu hasn't touched that memory during the dma, so no >>> cache line load could have happened. >>> >>> In the changes you mentioned, how do you ensure the mbuf header and data >>> end up in separate cache lines without compile-time knowledge of cache >>> line size? >>> >>> Even if the embedded data buffer in an mbuf becomes cache-aligned, the >>> special mbuf awareness will still be needed to handle the fact that the >>> mapped data length may not be a multiple of the cache line size. We'll >>> still need the special knowledge that it's an mbuf so that we can sync >>> the entire last cacheline without attempting to preserve the part beyond >>> the end of the dma buffer. (If this idea makes you vaguely uneasy, >>> you're not alone. However, I've been assured by Scott and Warner that >>> it's always safe to treat mbufs this way.) >>> >> >> I understand now. It is safe in the usual cases but not in the general >> case. The mbuf API allows for 'ext bufs' which have reference to >> arbitrary external memory. There are no guarantees about the alignment of >> this memory or what comes before or after it. It could really be >> anything. In practice I think you'll be safe with all code that is in >> tree but I'm not 100% sure. >> > > The ext bufs are exactly the part that makes me vaguely uneasy, but I > figured I'd run with what I was told until it was proven unworkable. > > > [...] >>> >>> Still unresolved is what to do about the remaining cases -- attempts to >>> do dma in arbitrary buffers not obtained from bus_dmamem_alloc() which >>> are not aligned and padded appropriately. There was some discussion a >>> while back, but no clear resolution. I decided not to get bogged down >>> by that fact and to fix the mbuf and allocated-buffer situations that we >>> know how to deal with for now. >> >> It sounds like you're hitting the common cases. It seems ok to keep the >> partial flush logic for the less common cases and deal with the >> performance penalty. >> >> Jeff >> > > I've got to keep stressing... it's not a performance penalty thing, it's > a correct operation thing. Partial cacheline flush logic cannot be > written in a way that's guaranteed to always give correct results, > because the software can't prevent the hardware from doing an eviction > and writing back stale data even while the code that attempts to > preserve the data is running. Some day when all the other code is > right, the place where we currently detect a partial flush and try to > preserve the surrounding data will be a panic call. Until then we'll > keep relying on the partial flush usually working right by accident. I understand now. If you're sharing the line with someone who is active you can't do it safely just by flushing and cleaning before the DMA. They could re-dirty the page. I was just thinking about the sync operation and existing dirty data. Technically you should bounce in this case as well since it can't be made safe. Not bouncing for mbufs is a nice optimization I see. Jeff > > But yeah, taking care of the common cases first was my priority. > > -- Ian > > From owner-freebsd-mips@FreeBSD.ORG Tue Dec 25 03:13:19 2012 Return-Path: Delivered-To: mips@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id CEC52327 for ; Tue, 25 Dec 2012 03:13:19 +0000 (UTC) (envelope-from scott4long@yahoo.com) Received: from nm19.bullet.mail.bf1.yahoo.com (nm19.bullet.mail.bf1.yahoo.com [98.139.212.178]) by mx1.freebsd.org (Postfix) with ESMTP id 759118FC16 for ; Tue, 25 Dec 2012 03:13:19 +0000 (UTC) Received: from [98.139.215.140] by nm19.bullet.mail.bf1.yahoo.com with NNFMP; 25 Dec 2012 03:13:12 -0000 Received: from [98.139.213.3] by tm11.bullet.mail.bf1.yahoo.com with NNFMP; 25 Dec 2012 03:13:12 -0000 Received: from [127.0.0.1] by smtp103.mail.bf1.yahoo.com with NNFMP; 25 Dec 2012 03:13:12 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s1024; t=1356405192; bh=ienAAZglLtbBNsor83kO6OPNJsGRWpJRYSiVFT0tQqw=; h=X-Yahoo-Newman-Id:X-Yahoo-Newman-Property:X-YMail-OSG:X-Yahoo-SMTP:Received:Content-Type:Mime-Version:Subject:From:In-Reply-To:Date:Cc:Content-Transfer-Encoding:Message-Id:References:To:X-Mailer; b=wdoP2wMZR800NZ1nu3hDEgMbqcUgrU4R6p1JvMo4WC/4WqzUgf0YjR6BPTcpX+BVwbrF24rMf+rAPg/Kx66E0tVLSTnJbcLMc1uYaWiN8M2bKV1W0jZb9nzik4rjGahChw00xzosT/Hhwqn8DpBZDdMxrE46h/ihvMhbmAvaBmA= X-Yahoo-Newman-Id: 233465.49273.bm@smtp103.mail.bf1.yahoo.com X-Yahoo-Newman-Property: ymail-3 X-YMail-OSG: hDJW8PcVM1l0iy7KkxW1my2pYpqQMHhtseGfw0da88jRdvm gZ1.GIEeIuDLxMoMe6aCGrkTFDyGvMjfolt4MSQYbUHvixI7WwqtgfCbEzlv 4kLqDypbFZ5vpFWVoV7gyYLZblUnYRWwrs2Gjr2V1fyJPcbNsJd8Wg28c_tQ q4K6qDqcysRxLx5T9Wkmr33IxW9rtMUqLTqa9drxawgScADW.UHHqTQ4QLBx HYzNzh.Kj.gC72PUl1dTGcOXxQYgdyx3bZGc6piUmTNddjWRrdzJqYAtJ0x0 hA.2LykQZioWy1OE5732MLw3_Iv1HoKQP43wiYMzo3ty4muqFsvFf6SRvG.A 7R0oygCNlF_sEo.BBPQ96aF83fVlNqD1jZvbVwGbCQScWUg5tHORsjbTudGr j0m8y5OmDN.luvwy0VxK0eXgwKvqFwnYW1IVdtoHRHwVbJySQcYkpoUx_Ze7 RF3WY.CdyYvx7Tg-- X-Yahoo-SMTP: clhABp.swBB7fs.LwIJpv3jkWgo2NU8- Received: from [172.20.10.3] (scott4long@70.193.200.44 with plain) by smtp103.mail.bf1.yahoo.com with SMTP; 24 Dec 2012 19:13:12 -0800 PST Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 6.2 \(1499\)) Subject: Re: Call for testing and review, busdma changes From: Scott Long In-Reply-To: <1356390225.1129.217.camel@revolution.hippie.lan> Date: Mon, 24 Dec 2012 22:13:10 -0500 Content-Transfer-Encoding: quoted-printable Message-Id: <2D98F70D-4031-4860-BABB-1F4663896234@yahoo.com> References: <1355077061.87661.320.camel@revolution.hippie.lan> <1355085250.87661.345.camel@revolution.hippie.lan> <1356381775.1129.181.camel@revolution.hippie.lan> <1356390225.1129.217.camel@revolution.hippie.lan> To: Ian Lepore X-Mailer: Apple Mail (2.1499) X-Mailman-Approved-At: Tue, 25 Dec 2012 03:28:51 +0000 Cc: powerpc@freebsd.org, marcel@freebsd.org, mips@freebsd.org, John Baldwin , "mav@freebsd.org Motin" , "attilio@FreeBSD.org Rao" , Jeff Roberson , sparc64@freebsd.org, arm@freebsd.org, kib@freebsd.org X-BeenThere: freebsd-mips@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Porting FreeBSD to MIPS List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 25 Dec 2012 03:13:19 -0000 On Dec 24, 2012, at 6:03 PM, Ian Lepore = wrote: >=20 > Yeah, I've done some low-level storage driver stuff myself (mmc/sd) = and > I can see how easy the deferred load solutions are to implement in = that > sort of driver that's already structured to operate asychronously. = I'm > not very familiar with how network hardware drivers interface with the > rest of the network stack. I have some idea, I'm just not sure of all > the subtleties involved and whether there are any implications for > something like a deferred load. >=20 > This is one of those situations where I tend to say to myself... the > folks who designed this stuff and imposed the "no deferred load" > restriction on mbufs and uio but not other cases were not stupid or > lazy, so they must have had some other reason. I'd want to know what > that was before I went too far with trying to undo it. >=20 Deferring is expensive from a latency standpoint. For disks, this = latency was comparatively small (until recent advances in SSD), so it = didn't matter, but it did matter with network devices. Also, network = drivers already had the concept of dropping mbufs due to resource = shortages, and the strict requirement of guaranteed transactions with = storage didn't apply. Deferring and freezing queues to guarantee = delivery order is a pain in the ass, so the decision was made that it = was cheaper to drop an mbuf on a resource shortage rather than defer. = As for uio's, they're the neglected part of the API and there's really = been no formal direction or master plan put into their evolution. = Anyways, that's my story and I'm sticking to it =3D-) Also, eliminating the concept of deferred load from mbufs then freed us = to look at ways to make the load operation cheaper. There's a lot of = code in _bus_dmamap_load_buffer() that is expensive, but a big one was = the indirect function pointer for the callback in the load wrappers. = The extra storage for filling in the temporary s/g list was also looked = at. Going with direct loads allowed me to remove these and reduce most = of the speed penalties. >=20 >>>=20 >>> Still unresolved is what to do about the remaining cases -- attempts = to >>> do dma in arbitrary buffers not obtained from bus_dmamem_alloc() = which >>> are not aligned and padded appropriately. There was some discussion = a >>> while back, but no clear resolution. I decided not to get bogged = down >>> by that fact and to fix the mbuf and allocated-buffer situations = that we >>> know how to deal with for now. >>=20 Why would these allocations not be handled as normal dynamic buffers = would with bus_dmamap_load()? Scott From owner-freebsd-mips@FreeBSD.ORG Tue Dec 25 11:38:57 2012 Return-Path: Delivered-To: mips@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id C0AAAA2B; Tue, 25 Dec 2012 11:38:57 +0000 (UTC) (envelope-from ray@freebsd.org) Received: from smtp.dlink.ua (smtp.dlink.ua [193.138.187.146]) by mx1.freebsd.org (Postfix) with ESMTP id 76ABA8FC16; Tue, 25 Dec 2012 11:38:57 +0000 (UTC) Received: from terran (unknown [192.168.10.90]) (Authenticated sender: ray) by smtp.dlink.ua (Postfix) with ESMTPA id D3943C492D; Tue, 25 Dec 2012 13:38:49 +0200 (EET) Date: Tue, 25 Dec 2012 13:39:04 +0200 From: Aleksandr Rybalko To: arm@freebsd.org, mips@freebsd.org, ppc@freebsd.org Subject: FDT changes Message-Id: <20121225133904.8063fb1cd193b3078b9c7596@freebsd.org> X-Mailer: Sylpheed 3.2.0 (GTK+ 2.24.6; amd64-portbld-freebsd9.0) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-mips@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Porting FreeBSD to MIPS List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 25 Dec 2012 11:38:57 -0000 Hello embedded hackers! (I'm not on ppc@ list, so answer to all or CC me) I made small patch [0], it's give two features: 1. We see physical addresses, like on all other systems which not use FDT. 2. It's not panic on attempt to do bus_space_map on "reg" propertie (for cases when bus is SPI/I2C/etc.) Please-please-please, give me your comments. If nobody objects, I will commit it at the end of Jan 2013 (to not worry anyone on holidays :) ) [0] http://people.freebsd.org/~ray/2012-12-25_fdt_correct_resource.diff WBW -- Aleksandr Rybalko From owner-freebsd-mips@FreeBSD.ORG Thu Dec 27 15:54:16 2012 Return-Path: Delivered-To: mips@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 4299CF9D; Thu, 27 Dec 2012 15:54:16 +0000 (UTC) (envelope-from tinderbox@freebsd.org) Received: from freebsd-current.sentex.ca (freebsd-current.sentex.ca [64.7.128.98]) by mx1.freebsd.org (Postfix) with ESMTP id 06B0D8FC12; Thu, 27 Dec 2012 15:54:15 +0000 (UTC) Received: from freebsd-current.sentex.ca (localhost [127.0.0.1]) by freebsd-current.sentex.ca (8.14.5/8.14.5) with ESMTP id qBRFsF54080588; Thu, 27 Dec 2012 10:54:15 -0500 (EST) (envelope-from tinderbox@freebsd.org) Received: (from tinderbox@localhost) by freebsd-current.sentex.ca (8.14.5/8.14.5/Submit) id qBRFsFAk080587; Thu, 27 Dec 2012 15:54:15 GMT (envelope-from tinderbox@freebsd.org) Date: Thu, 27 Dec 2012 15:54:15 GMT Message-Id: <201212271554.qBRFsFAk080587@freebsd-current.sentex.ca> X-Authentication-Warning: freebsd-current.sentex.ca: tinderbox set sender to FreeBSD Tinderbox using -f Sender: FreeBSD Tinderbox From: FreeBSD Tinderbox To: FreeBSD Tinderbox , , Subject: [head tinderbox] failure on mips/mips Precedence: bulk X-BeenThere: freebsd-mips@freebsd.org X-Mailman-Version: 2.1.14 List-Id: Porting FreeBSD to MIPS List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 27 Dec 2012 15:54:16 -0000 TB --- 2012-12-27 15:34:27 - tinderbox 2.10 running on freebsd-current.sentex.ca TB --- 2012-12-27 15:34:27 - FreeBSD freebsd-current.sentex.ca 8.3-PRERELEASE FreeBSD 8.3-PRERELEASE #0: Mon Mar 26 13:54:12 EDT 2012 des@freebsd-current.sentex.ca:/usr/obj/usr/src/sys/GENERIC amd64 TB --- 2012-12-27 15:34:27 - starting HEAD tinderbox run for mips/mips TB --- 2012-12-27 15:34:27 - cleaning the object tree TB --- 2012-12-27 15:34:27 - /usr/local/bin/svn stat /src TB --- 2012-12-27 15:34:30 - At svn revision 244738 TB --- 2012-12-27 15:34:31 - building world TB --- 2012-12-27 15:34:31 - CROSS_BUILD_TESTING=YES TB --- 2012-12-27 15:34:31 - MAKEOBJDIRPREFIX=/obj TB --- 2012-12-27 15:34:31 - PATH=/usr/bin:/usr/sbin:/bin:/sbin TB --- 2012-12-27 15:34:31 - SRCCONF=/dev/null TB --- 2012-12-27 15:34:31 - TARGET=mips TB --- 2012-12-27 15:34:31 - TARGET_ARCH=mips TB --- 2012-12-27 15:34:31 - TZ=UTC TB --- 2012-12-27 15:34:31 - __MAKE_CONF=/dev/null TB --- 2012-12-27 15:34:31 - cd /src TB --- 2012-12-27 15:34:31 - /usr/bin/make -B buildworld >>> Building an up-to-date make(1) >>> World build started on Thu Dec 27 15:34:35 UTC 2012 >>> Rebuilding the temporary build tree >>> stage 1.1: legacy release compatibility shims >>> stage 1.2: bootstrap tools >>> stage 2.1: cleaning up the object tree >>> stage 2.2: rebuilding the object tree >>> stage 2.3: build tools >>> stage 3: cross tools >>> stage 4.1: building includes >>> stage 4.2: building libraries [...] cc -O -pipe -G0 -DLIBC_SCCS -DINET6 -I/src/lib/libutil -I/src/lib/libutil/../libc/gen/ -std=gnu99 -Wsystem-headers -Werror -Wall -Wno-format-y2k -W -Wno-unused-parameter -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Wreturn-type -Wcast-qual -Wwrite-strings -Wswitch -Wshadow -Wunused-parameter -Wcast-align -Wchar-subscripts -Winline -Wnested-externs -Wredundant-decls -Wold-style-definition -Wno-pointer-sign -c /src/lib/libutil/auth.c -o auth.o cc -O -pipe -G0 -DLIBC_SCCS -DINET6 -I/src/lib/libutil -I/src/lib/libutil/../libc/gen/ -std=gnu99 -Wsystem-headers -Werror -Wall -Wno-format-y2k -W -Wno-unused-parameter -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Wreturn-type -Wcast-qual -Wwrite-strings -Wswitch -Wshadow -Wunused-parameter -Wcast-align -Wchar-subscripts -Winline -Wnested-externs -Wredundant-decls -Wold-style-definition -Wno-pointer-sign -c /src/lib/libutil/expand_number.c -o expand_number.o cc -O -pipe -G0 -DLIBC_SCCS -DINET6 -I/src/lib/libutil -I/src/lib/libutil/../libc/gen/ -std=gnu99 -Wsystem-headers -Werror -Wall -Wno-format-y2k -W -Wno-unused-parameter -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Wreturn-type -Wcast-qual -Wwrite-strings -Wswitch -Wshadow -Wunused-parameter -Wcast-align -Wchar-subscripts -Winline -Wnested-externs -Wredundant-decls -Wold-style-definition -Wno-pointer-sign -c /src/lib/libutil/flopen.c -o flopen.o cc -O -pipe -G0 -DLIBC_SCCS -DINET6 -I/src/lib/libutil -I/src/lib/libutil/../libc/gen/ -std=gnu99 -Wsystem-headers -Werror -Wall -Wno-format-y2k -W -Wno-unused-parameter -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Wreturn-type -Wcast-qual -Wwrite-strings -Wswitch -Wshadow -Wunused-parameter -Wcast-align -Wchar-subscripts -Winline -Wnested-externs -Wredundant-decls -Wold-style-definition -Wno-pointer-sign -c /src/lib/libutil/fparseln.c -o fparseln.o cc -O -pipe -G0 -DLIBC_SCCS -DINET6 -I/src/lib/libutil -I/src/lib/libutil/../libc/gen/ -std=gnu99 -Wsystem-headers -Werror -Wall -Wno-format-y2k -W -Wno-unused-parameter -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Wreturn-type -Wcast-qual -Wwrite-strings -Wswitch -Wshadow -Wunused-parameter -Wcast-align -Wchar-subscripts -Winline -Wnested-externs -Wredundant-decls -Wold-style-definition -Wno-pointer-sign -c /src/lib/libutil/gr_util.c -o gr_util.o cc1: warnings being treated as errors /src/lib/libutil/gr_util.c: In function 'gr_add': /src/lib/libutil/gr_util.c:510: warning: cast discards qualifiers from pointer target type *** [gr_util.o] Error code 1 Stop in /src/lib/libutil. *** [lib/libutil__L] Error code 1 Stop in /src. *** [libraries] Error code 1 Stop in /src. *** [_libraries] Error code 1 Stop in /src. *** Error code 1 Stop in /src. TB --- 2012-12-27 15:54:15 - WARNING: /usr/bin/make returned exit code 1 TB --- 2012-12-27 15:54:15 - ERROR: failed to build world TB --- 2012-12-27 15:54:15 - 907.60 user 176.40 system 1187.78 real http://tinderbox.freebsd.org/tinderbox-head-ss-build-HEAD-mips-mips.full