Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 30 Jun 2009 11:28:23 +0200
From:      Hans Petter Selasky <hselasky@c2i.net>
To:        Piotr =?iso-8859-2?q?Zi=EAcik?= <kosmo@semihalf.com>
Cc:        Rafal Jaworowski <raj@semihalf.com>, freebsd-arm@freebsd.org, thompsa@freebsd.org, freebsd-usb@freebsd.org
Subject:   Re: CPU Cache and busdma usage in USB
Message-ID:  <200906301128.26046.hselasky@c2i.net>
In-Reply-To: <200906301037.49367.kosmo@semihalf.com>
References:  <200906231035.43096.kosmo@semihalf.com> <200906291516.58725.hselasky@c2i.net> <200906301037.49367.kosmo@semihalf.com>

next in thread | previous in thread | raw e-mail | index | archive | help
On Tuesday 30 June 2009 10:37:48 Piotr Zi=EAcik wrote:
> Monday 29 June 2009 15:16:56 Hans Petter Selasky napisa=B3(a):
> > You want to change the flags passed to bus_dmamap_sync() so that the
> > flush/invalidate mapping gets right. I understand why your patch makes =
it
> > work. That's not the problem.
> >
> > In "src/sys/arm/arm/busdma_machdep.c" there is a function called
> > "_bus_dmamap_sync_bp()". If you look at that function you see that it
> > only triggers on the "BUS_DMASYNC_PREWRITE" and "BUS_DMASYNC_POSTREAD"
> > flags. After your patching only the PREXXXX flags are used, so if bouce
> > pages are used on ARM and x86 and amd64 +++, then only
> > BUS_DMASYNC_PREWRITE will do anything. This indicates that your patch is
> > not fully correct.
>

Hi,

> That is true. I have missed "bounce page" case. I can change flags passed
> to bus_dmamap_sync() to fix this on ARM, but this will break MIPS.

Right, so there is inconsistency among the platforms in how the BUSDMA is=20
implemented, which should be fixed.

> This clearly shows the problem - using side effect of busdma to manage CPU
> cache. Current USB implementation relies of this side effect assuming that
> bus_dmamap_sync() with given flags will do cpu cache flush/invalidation.
> This is not true even on i386 !

i386 handles this differently.

> This thread is not about my patch. It is only a fast and dirty hack
> making USB working on platforms without hardware cache coherency
> and showing the problem. I see two proper solutions:

Ok.

>
> 1. Implement usb_pc_cpu_invalidate() / usb_pc_cpu_flush() with
> cpu_*() functions realizing requested operation.
>
> 2. Using busdma in proper way:
>         [... prepare data to transfer ...]
>         bus_dmamap_sync(..., PREREAD | PREWRITE);
>         [... do transfer ...]
>         bus_dmamap_sync(..., POSTREAD | POSTWRITE);
>         [... check results ...]
> as manpage says:
> 	<cite>
> 	If read and write operations are not preceded and followed by the
> 	appropriate synchronization operations, behavior is undefined.
> 	</cite>
> Requirement cited above is currently not met in USB stack. Funtion
> shown in http://fxr.watson.org/fxr/source/dev/usb/controller/ehci.c#L1294
> is just an example of improper usage of bus_dmamap_sync(), which is hidden
> under usb_pc_cpu_invalidate().

I'm not violating any rules in busdma. The busdma manual is silent about do=
ing=20
multiple sync operations of the same kind in a row. If the sheeva hardware =
you=20
are using works differently, you have to fix it.

>
> This thread started from my question about general usage of usb_pc_cpu_*()
> functions. So I am asking once again - why these functions relies on side
> effect of busdma instead of simply doing cache flush/invalidation through
> cpu_*() functions ?

Because busdma is the interface for doing these kinds of operations and not=
=20
cpu_xxx(). The DMA memory is allocated using busdma and loaded using busdma=
,=20
and cpu_xxx() is not a part of the busdma interface.

>
> > Grepping through the source code for ARM, I found a line like this:
> > (...)
> > You see that it only performs purge and no prior flush. This is what
> > needs fixing! If semihalf could go through the "arm/arm/cpufunc.c" file
> > and fix those flush and invalidate functions then many people would
> > become happy!
>
> My developement platform is sheeva. CPU functions for this platform are
> implemented correctly.
>
> > Again, it is not a problem in USB firstly, it is a problem in "arm/xxx".
>
> You are suggesting that the problem is in ARM busdma (and in MIPS busdma,
> as on this platform USB also does not work). Could you give me example of
> platform _without_ hardware coherency where new USB stack simply works ?

If I find time later today I will fix the cache sync operations for AT91RM9=
200=20
which has a built in OHCI, which has been working fine with my new USB stac=
k.

Can you please add some prints to the "bus_dmamap_sync_buf()" code on ARM, =
and=20
verify which cases the CPU is entering, without your patch and with your=20
patch. From my analysis your patch will just change the execution path:

With your patch it calls:

                cpu_dcache_wb_range((vm_offset_t)buf, len);
                cpu_l2cache_wb_range((vm_offset_t)buf, len);

Without your patch it calls:

                        cpu_dcache_wbinv_range((vm_offset_t)buf, len);
                        cpu_l2cache_wbinv_range((vm_offset_t)buf, len);

In my view these execution paths are equivalent with regard to clean or cac=
he=20
flush. They should both perform a cache flush, which is what the USB code=20
wants to do, but obviously the latter case is not implemented correctly, mo=
st=20
likely because it doesn't flush the buffer like expected!

Try adding:

                cpu_dcache_wb_range((vm_offset_t)buf, len);
                cpu_l2cache_wb_range((vm_offset_t)buf, len);

Before:

                        cpu_dcache_wbinv_range((vm_offset_t)buf, len);
                        cpu_l2cache_wbinv_range((vm_offset_t)buf, len);

In: sys/arm/arm/busdma_machdep.c

Yours,
=2D-HPS




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200906301128.26046.hselasky>