From: Mark Tinguely
To: Ian Lepore
Cc: freebsd-arm@freebsd.org, freebsd-arch@freebsd.org,
    freebsd-mips@freebsd.org, Hans Petter Selasky
Date: Sun, 26 Aug 2012 13:05:29 -0500
Subject: Re: Partial cacheline flush problems on ARM and MIPS
In-Reply-To: <1346002922.1140.56.camel@revolution.hippie.lan>

On Sun, Aug 26, 2012 at 12:42 PM, Ian Lepore wrote:
> On Thu, 2012-08-23 at 22:00 -0600, Warner Losh wrote:
>> The bottom line is that you can't mix things like that when cache
>> lines are involved. The current code that tries is doomed to failure.
>> Doomed. You just can't control all flushes, as Ian's missive
>> demonstrates, and trying to accommodate code that does this I don't
>> think can possibly work. All the interrupt masking, copying in and
>> out, etc I fear is doomed to utter and abject failure.
>>
> Until last weekend I was in the camp that thought the partial cacheline
> flush problem was solvable with sufficiently clever code. Now I agree
> that we're doomed to failure and it's time to try another direction.
>
> We're going to have some implementation work to do in arm and mips
> busdma, but I think the larger part of the task is going to be defining
> more rigorously how a driver must interact with the busdma system to
> function correctly on all types of platforms, and then update existing
> drivers to conform.
>
> The busdma manpage currently has some vague words about the usage and
> sequencing of sync ops, such as "If read and write operations are not
> preceded and followed by the appropriate synchronization operations,
> behavior is undefined." I think we should more explicitly spell out
> what the appropriate sequences are. In particular:
>
>     * The PRE and POST operations must occur in pairs; a PREREAD must
>       be followed eventually by a POSTREAD and a PREWRITE must be
>       followed by a POSTWRITE.
>     * The CPU is not allowed to access the mapped memory after a PRE
>       sync and before the corresponding POST sync.
>     * The DMA hardware is not allowed to access the mapped memory
>       after a POST sync and before the next PRE sync.
>     * Read and write sync operators may be combined in a single call,
>       PRE and POST operators may not be. E.G., PREREAD|PREWRITE is
>       allowed, PREREAD|POSTREAD is not. We should note that while
>       read and write operations may be combined, on some platforms
>       PREREAD|PREWRITE is needlessly expensive when only a read is
>       being performed.
>
> We also need some rules about working with buffers obtained from
> bus_dmamem_alloc() and external buffers passed to bus_dmamap_load(). I
> think the rule should be that a buffer obtained from bus_dmamem_alloc(),
> or more formally any region of memory mapped by a bus_dmamap_load(), is
> a single logical object which can only be accessed by one entity at a
> time. That means that there cannot be two concurrent DMA operations
> happening in different regions of the same buffer, nor can DMA and CPU
> access be happening concurrently even if in different parts of the
> buffer.
>
> I've always thought that allocating a dma buffer feels like a big
> hassle. You sometimes have to create a tag for the sole purpose of
> setting the maxsize to get the buffer size you need when you call
> bus_dmamem_alloc(). If bus_dmamem_alloc() took a size parm you could
> just use your parent tag, or a generic tag appropriate to all the IO
> you're doing for a given device. If you need a variety of buffers for
> small control and command and status transfers of different sizes, you
> end up having to manage up to a dozen tags and maps and buffers. It's
> all very clunky and inconvenient. It's just the sort of thing that
> makes you want to allocate a big buffer and subdivide it. Surely we
> could do something to make it easier?
>
> -- Ian

I did a quick look at the drivers last summer.

Most drivers do the right thing and use memory allocated from
bus_dmamem_alloc(). It is easy for us to give them a cache aligned buffer.
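
(Aside, for illustration only.) The sync sequencing rules quoted above can be modelled as a small userland state machine. This is a rough sketch, not busdma code: the SYNC_* values, struct sync_state, sync_check() and cpu_may_access() are hypothetical names invented here, and the model assumes, more strictly than the quoted rules require, that no new PRE may be issued while an earlier PRE is still outstanding.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical flag values for illustration only; they are NOT the real
 * BUS_DMASYNC_* constants from the FreeBSD headers.
 */
enum {
	SYNC_PREREAD   = 1 << 0,
	SYNC_POSTREAD  = 1 << 1,
	SYNC_PREWRITE  = 1 << 2,
	SYNC_POSTWRITE = 1 << 3
};

#define SYNC_PRE_MASK	(SYNC_PREREAD | SYNC_PREWRITE)
#define SYNC_POST_MASK	(SYNC_POSTREAD | SYNC_POSTWRITE)

/* Tracks which PRE ops have been issued and not yet matched by a POST. */
struct sync_state {
	int pending;
};

/* POST ops that are legal given the outstanding PRE ops. */
static int
post_for(int pre_ops)
{
	int post = 0;

	if (pre_ops & SYNC_PREREAD)
		post |= SYNC_POSTREAD;
	if (pre_ops & SYNC_PREWRITE)
		post |= SYNC_POSTWRITE;
	return (post);
}

/*
 * Returns true and updates the state if 'op' obeys the proposed rules;
 * returns false and leaves the state alone otherwise.
 */
static bool
sync_check(struct sync_state *st, int op)
{
	bool pre = (op & SYNC_PRE_MASK) != 0;
	bool post = (op & SYNC_POST_MASK) != 0;

	if (pre == post)		/* empty op, or PRE mixed with POST */
		return (false);
	if (pre) {
		if (st->pending != 0)	/* assumption: no nested PRE */
			return (false);
		st->pending = op & SYNC_PRE_MASK;
		return (true);
	}
	if ((op & ~post_for(st->pending)) != 0)	/* POST without its PRE */
		return (false);
	if (op & SYNC_POSTREAD)
		st->pending &= ~SYNC_PREREAD;
	if (op & SYNC_POSTWRITE)
		st->pending &= ~SYNC_PREWRITE;
	return (true);
}

/* The CPU may touch the buffer only while no PRE is outstanding. */
static bool
cpu_may_access(const struct sync_state *st)
{
	return (st->pending == 0);
}
```

In this model a combined PREREAD|PREWRITE is accepted, a combined PREREAD|POSTREAD is rejected, and cpu_may_access() stays false until every outstanding PRE has been matched by its POST.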
Some drivers use mbufs: 256 bytes, which is cache safe.

Some drivers directly or indirectly malloc() a buffer and then use it for
DMA. Rather than try to fix them all, I was okay with making the smallest
malloc() size equal to the cache line size. That amounts to getting rid
of the 16-byte allocation on some ARM architectures. The power-of-2
allocator will then give us cache-line-safe allocations.

A few drivers take a small amount of memory from the kernel stack and DMA
to it <- broken driver.

The few drivers that use data inside a structure, where that memory is
not cache aligned <- broken driver.

--Mark Tinguely.
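
(Aside, for illustration only.) The minimum-allocation idea above can be sketched in userland C. The names bucket_size() and share_cache_line() are hypothetical, not kernel malloc(9) internals, and the sketch assumes that a bucket of size 2^k always starts at an address that is a multiple of 2^k. The 32-byte line size used in the usage note is also an assumption chosen for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Round a request up to the allocator's bucket size: the smallest power
 * of two that is >= size and >= min_bucket. min_bucket must itself be a
 * power of two. (Hypothetical helper, not the kernel allocator.)
 */
static size_t
bucket_size(size_t size, size_t min_bucket)
{
	size_t b = min_bucket;

	assert(min_bucket != 0 && (min_bucket & (min_bucket - 1)) == 0);
	while (b < size)
		b <<= 1;
	return (b);
}

/*
 * True if the byte ranges [a, a + alen) and [b, b + blen) touch at least
 * one common cache line of 'line' bytes. Both lengths must be non-zero.
 */
static bool
share_cache_line(uintptr_t a, size_t alen, uintptr_t b, size_t blen,
    size_t line)
{
	uintptr_t a_first = a / line, a_last = (a + alen - 1) / line;
	uintptr_t b_first = b / line, b_last = (b + blen - 1) / line;

	return (a_first <= b_last && b_first <= a_last);
}
```

With an assumed 32-byte line and a 16-byte minimum bucket, two neighbouring 16-byte buckets at offsets 0 and 16 land in the same line, so a cache flush or invalidate for one DMA buffer can clobber the other. With the minimum raised to 32 bytes, neighbouring buckets at offsets 0 and 32 never share a line.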