From: Ian Lepore
To: Warner Losh
Cc: Tim Kientzle, Hans Petter Selasky, freebsd-arm@freebsd.org,
    freebsd-mips@freebsd.org, freebsd-arch@freebsd.org
Subject: Re: Partial cacheline flush problems on ARM and MIPS
Date: Mon, 27 Aug 2012 16:07:30 -0600
Message-ID: <1346105250.1140.314.camel@revolution.hippie.lan>
In-Reply-To: <9642068B-3C66-42BD-8515-14F734B3FF89@bsdimp.com>
References: <6D83AF9D-577B-4C83-84B7-C4E3B32695FC@bsdimp.com>
    <9642068B-3C66-42BD-8515-14F734B3FF89@bsdimp.com>

On Mon, 2012-08-27 at 09:40 -0600, Warner Losh wrote:
> On Aug 27, 2012, at 9:20 AM, Adrian Chadd wrote:
> > That does remind me, I think the ath(4) driver does the same
> > (since it allocates its own descriptor block and then treats it as an
> > array of descriptors for the hardware to access) - I should ensure that
> > sizeof(ath_desc) is aligned on the relevant architecture.  It gets
> > slightly scary - AR93xx TX descriptors are "L1 cache == 128 byte
> > aligned" which is an enormous waste of memory compared to a 16 or 32
> > byte aligned platform.  Alas..
>
> The problem is with cache line sharing, not necessarily with alignment.
> If you are only ever using one of them at a time, or if you have perfect
> hygiene, you can cope with this situation without undue waste.  The
> perfect hygiene might be hard sometimes.

This brings up an interesting tangential issue for this busdma
discussion.  For some controller hardware you allocate a block of memory
which is treated as an array of "descriptors" or some other shared
control information, you set a register in the hardware to point to that
block of memory, and then there is some degree of concurrent access of
that memory by hardware and CPU.

The interesting part is that some such hardware cannot operate in phases
as anticipated by our busdma model.  That is, there are no clear
demarcation points between "the CPU has exclusive access to the memory"
and "the hardware has exclusive access to the memory."  Usually for
these schemes to work correctly, the memory has to be mapped as
uncached, unbuffered, strongly ordered, or whatever combo of those makes
sense for a given platform.

We have ARM drivers that use bus_dmamem_alloc() with the
BUS_DMA_COHERENT flag to obtain such memory, even though that wasn't the
intended meaning for that flag.  If the armv4 busdma implementation were
changed to stop honoring the COHERENT flag (it's supposed to be an
optional feature), those drivers would stop working.  So we need to track
down such mistakes and fix them, but the question is: fix them how?

I think it may make sense to let busdma handle it, because you may get
some advantage from the allocation being made based upon the constraints
encoded in the inherited chain of tags for the driver.  On the other
hand, drivers doing this sort of thing are usually pretty close to the
silicon and have a good idea for themselves what the hardware
constraints are.  We could just say that drivers with such needs should
call kmem_alloc_contig() or kmem_alloc_attr() for themselves.

If we say it's a thing that busdma should handle, then I think we need:

      * A flag that is universal across all platforms that means
        unambiguously that you need memory that is mapped however
        device-register memory is mapped on that platform (uncached,
        unbuffered, strongly ordered; I'm tempted to say "whatever
        pmap_mapdev() does" but I'm not sure that's rigorously correct).
      * If the request cannot be honored for some reason it has to
        return failure, not quietly give you regular cached memory
        instead (which is what BUS_DMA_COHERENT does).
      * The busdma sequence of sync operations does not apply to memory
        allocated with this flag, and indeed you must not call the sync
        functions on such memory.

The x86 busdma code recently grew a BUS_DMA_NOCACHE flag, perhaps that's
the name that should be supported across all platforms?

-- Ian
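
To make the three requirements listed above a bit more concrete, here is
a rough userland C model of the semantics.  It is only a sketch, not
busdma code: every name in it (MODEL_DMA_COHERENT, MODEL_DMA_NOCACHE,
struct model_dmamem, model_dmamem_alloc(), model_dmamap_sync(),
platform_can_map_uncached) is hypothetical, the flag values are made up,
and the choice of ENOMEM and EINVAL as error returns is an assumption.
It just contrasts a flag that is a hint (quiet fallback to cached
memory) with a flag that is a requirement (fail if it can't be honored,
and refuse sync calls).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical flag values for this model only; not the real busdma ones. */
#define	MODEL_DMA_COHERENT	0x1	/* hint: may quietly fall back to cached */
#define	MODEL_DMA_NOCACHE	0x2	/* requirement: fail if it can't be met */

struct model_dmamem {
	void	*vaddr;
	bool	 uncached;	/* how the model "mapped" the memory */
	bool	 strict;	/* allocated with MODEL_DMA_NOCACHE */
};

/* Stand-in for "this platform can map memory like device registers". */
static bool platform_can_map_uncached = true;

/* Returns 0 on success or an errno-style value on failure. */
static int
model_dmamem_alloc(struct model_dmamem *m, size_t size, int flags)
{
	bool strict = (flags & MODEL_DMA_NOCACHE) != 0;
	bool hint = (flags & MODEL_DMA_COHERENT) != 0;

	assert(m != NULL);
	m->vaddr = NULL;
	m->uncached = false;
	m->strict = false;

	/* Requirement 2: never hand back cached memory for the strict flag. */
	if (strict && !platform_can_map_uncached)
		return (ENOMEM);

	m->vaddr = calloc(1, size);
	if (m->vaddr == NULL)
		return (ENOMEM);

	/* The COHERENT-style hint is honored only when the platform can. */
	m->uncached = (strict || hint) && platform_can_map_uncached;
	m->strict = strict;
	return (0);
}

/* Requirement 3: sync operations do not apply to strict allocations. */
static int
model_dmamap_sync(const struct model_dmamem *m)
{
	assert(m != NULL);
	if (m->strict)
		return (EINVAL);
	/* Real code would do its cache maintenance here. */
	return (0);
}
```

With platform_can_map_uncached set to false, a MODEL_DMA_NOCACHE request
fails outright, while a MODEL_DMA_COHERENT request still succeeds but
returns memory with uncached == false, which is exactly the quiet
fallback a driver sharing descriptors with hardware can't tolerate.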