Date: Tue, 30 Jun 2009 11:28:23 +0200 From: Hans Petter Selasky <hselasky@c2i.net> To: Piotr =?iso-8859-2?q?Zi=EAcik?= <kosmo@semihalf.com> Cc: Rafal Jaworowski <raj@semihalf.com>, freebsd-arm@freebsd.org, thompsa@freebsd.org, freebsd-usb@freebsd.org Subject: Re: CPU Cache and busdma usage in USB Message-ID: <200906301128.26046.hselasky@c2i.net> In-Reply-To: <200906301037.49367.kosmo@semihalf.com> References: <200906231035.43096.kosmo@semihalf.com> <200906291516.58725.hselasky@c2i.net> <200906301037.49367.kosmo@semihalf.com>
next in thread | previous in thread | raw e-mail | index | archive | help
On Tuesday 30 June 2009 10:37:48 Piotr Zi=EAcik wrote: > Monday 29 June 2009 15:16:56 Hans Petter Selasky napisa=B3(a): > > You want to change the flags passed to bus_dmamap_sync() so that the > > flush/invalidate mapping gets right. I understand why your patch makes = it > > work. That's not the problem. > > > > In "src/sys/arm/arm/busdma_machdep.c" there is a function called > > "_bus_dmamap_sync_bp()". If you look at that function you see that it > > only triggers on the "BUS_DMASYNC_PREWRITE" and "BUS_DMASYNC_POSTREAD" > > flags. After your patching only the PREXXXX flags are used, so if bouce > > pages are used on ARM and x86 and amd64 +++, then only > > BUS_DMASYNC_PREWRITE will do anything. This indicates that your patch is > > not fully correct. > Hi, > That is true. I have missed "bounce page" case. I can change flags passed > to bus_dmamap_sync() to fix this on ARM, but this will break MIPS. Right, so there is inconsistency among the platforms in how the BUSDMA is=20 implemented, which should be fixed. > This clearly shows the problem - using side effect of busdma to manage CPU > cache. Current USB implementation relies of this side effect assuming that > bus_dmamap_sync() with given flags will do cpu cache flush/invalidation. > This is not true even on i386 ! i386 handles this differently. > This thread is not about my patch. It is only a fast and dirty hack > making USB working on platforms without hardware cache coherency > and showing the problem. I see two proper solutions: Ok. > > 1. Implement usb_pc_cpu_invalidate() / usb_pc_cpu_flush() with > cpu_*() functions realizing requested operation. > > 2. Using busdma in proper way: > [... prepare data to transfer ...] > bus_dmamap_sync(..., PREREAD | PREWRITE); > [... do transfer ...] > bus_dmamap_sync(..., POSTREAD | POSTWRITE); > [... check results ...] > as manpage says: > <cite> > If read and write operations are not preceded and followed by the > appropriate synchronization operations, behavior is undefined. > </cite> > Requirement cited above is currently not met in USB stack. Funtion > shown in http://fxr.watson.org/fxr/source/dev/usb/controller/ehci.c#L1294 > is just an example of improper usage of bus_dmamap_sync(), which is hidden > under usb_pc_cpu_invalidate(). I'm not violating any rules in busdma. The busdma manual is silent about do= ing=20 multiple sync operations of the same kind in a row. If the sheeva hardware = you=20 are using works differently, you have to fix it. > > This thread started from my question about general usage of usb_pc_cpu_*() > functions. So I am asking once again - why these functions relies on side > effect of busdma instead of simply doing cache flush/invalidation through > cpu_*() functions ? Because busdma is the interface for doing these kinds of operations and not= =20 cpu_xxx(). The DMA memory is allocated using busdma and loaded using busdma= ,=20 and cpu_xxx() is not a part of the busdma interface. > > > Grepping through the source code for ARM, I found a line like this: > > (...) > > You see that it only performs purge and no prior flush. This is what > > needs fixing! If semihalf could go through the "arm/arm/cpufunc.c" file > > and fix those flush and invalidate functions then many people would > > become happy! > > My developement platform is sheeva. CPU functions for this platform are > implemented correctly. > > > Again, it is not a problem in USB firstly, it is a problem in "arm/xxx". > > You are suggesting that the problem is in ARM busdma (and in MIPS busdma, > as on this platform USB also does not work). Could you give me example of > platform _without_ hardware coherency where new USB stack simply works ? If I find time later today I will fix the cache sync operations for AT91RM9= 200=20 which has a built in OHCI, which has been working fine with my new USB stac= k. Can you please add some prints to the "bus_dmamap_sync_buf()" code on ARM, = and=20 verify which cases the CPU is entering, without your patch and with your=20 patch. From my analysis your patch will just change the execution path: With your patch it calls: cpu_dcache_wb_range((vm_offset_t)buf, len); cpu_l2cache_wb_range((vm_offset_t)buf, len); Without your patch it calls: cpu_dcache_wbinv_range((vm_offset_t)buf, len); cpu_l2cache_wbinv_range((vm_offset_t)buf, len); In my view these execution paths are equivalent with regard to clean or cac= he=20 flush. They should both perform a cache flush, which is what the USB code=20 wants to do, but obviously the latter case is not implemented correctly, mo= st=20 likely because it doesn't flush the buffer like expected! Try adding: cpu_dcache_wb_range((vm_offset_t)buf, len); cpu_l2cache_wb_range((vm_offset_t)buf, len); Before: cpu_dcache_wbinv_range((vm_offset_t)buf, len); cpu_l2cache_wbinv_range((vm_offset_t)buf, len); In: sys/arm/arm/busdma_machdep.c Yours, =2D-HPS
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200906301128.26046.hselasky>