Date: Fri, 4 Jan 2013 22:26:07 -0800 From: Oleksandr Tymoshenko <gonzo@bluezbox.com> To: arm@freebsd.org Subject: Re: Unsolved problem with WB caches on ARMv6 Message-ID: <37274001-B061-4CFF-AEDB-2EE37CC6D267@bluezbox.com> In-Reply-To: <37F24B3C-4600-4E57-96EB-98C91FCD2B72@bluezbox.com> References: <37F24B3C-4600-4E57-96EB-98C91FCD2B72@bluezbox.com>
next in thread | previous in thread | raw e-mail | index | archive | help
--Apple-Mail=_06D23550-DCA8-4DF4-AB99-F3A189064528 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=us-ascii On 2012-12-09, at 10:24 PM, Oleksandr Tymoshenko <gonzo@bluezbox.com> = wrote: > Hello, >=20 > One of the long-time issues with FreeBSD/ARMv6 is that Write-Back = cache > mode does not work properly. On PandaBoard changing cache mode to WB = from WT=20 > causesUSB glitches (starting from stalls to network packets = corruption) and random=20 > memory corruptions that manifest themselves as a userland programs = crashes. >=20 > gber@ tracked down one of the bugs several month ago, but it's still = unusable > at least on my setup.=20 >=20 > I spent some time debugging through busdma and USB code but failed to = find > anything fishy. PandaBoard's USB host controller is EHCI. QH and QTDs = are=20 > flushed properly. Corruption pattern in packets is weird: it's not = cacheline-size > it's like chunk of data is just missing from bulk transfer DMA buffer. = L2 cache > is disabled.=20 >=20 > The issue is not reproducible in QEMU.=20 >=20 > Fix for arm/160431 applied to busdma-v6.c didn't help.=20 >=20 > I'm out of ideas for now. May be Ian or Alan will have some = suggestions where to look? Following up on this one. The cause for this issue was combination of = several bugs and bad practices: 1. USB subsystem in general and EHCI driver particularly didn't like = that there is no supper cor cache-coherent memory in busdma subsystem. 2. PL310 driver bugs 3. pmap bugs. Fixes for (1) and (2) have been committed recently and I believe I = finally tracked down all=20 bugs in pmap: - Missing PTE_SYNC in pmap_kremove caused severe memory corruption in = userland applications - Lack of cache flushes when using special PTEs for zeroing or copying = pages. If there are=20 dirty lines for destination memory and page later remapped as a = non-cached region actual content might be overwritten by these dirty lines when cache = eviction happens=20 as a result of applying cache eviction policy or because of wbinv_all = call.=20 - icache sync for new mapping for userland applications. Attached patch addresses these issues. Please review and test. If you see something like this: vm_thread_new: kstack allocation failed panic: kproc_create() failed with 12 KDB: enter: panic =20 apply this patch: = http://people.freebsd.org/~gonzo/patches/queue/arm-autotune-fix.diff Some bits of statistics I gathered while working on this issue.=20 As a test platform I used pandaboard ES with root mounted over NFS and = as a test - =20 buildkernel ran in loop with PANDABOARD as a config file. Average time = for building=20 kernel with L2 cache disabled is about 3 hours. With L2 cache enabled = and=20 write-through as a default mode: 1h10m. With L2 enabled and = writeback-allocate mode=20 as default: 22 minutes.=20 Performance gain on raspberry Pi was marginal though. I blame slow = network connection since it works over USB in PIO mode. When RPi will get faster USB/mmc = support=20 actual difference may be more substantial.=20 --Apple-Mail=_06D23550-DCA8-4DF4-AB99-F3A189064528 Content-Disposition: attachment; filename=pmap-wb-caches-fix.diff Content-Type: application/octet-stream; name="pmap-wb-caches-fix.diff" Content-Transfer-Encoding: 7bit Index: arm/include/pmap.h =================================================================== --- arm/include/pmap.h (revision 245047) +++ arm/include/pmap.h (working copy) @@ -61,7 +61,7 @@ #else #define PTE_NOCACHE 1 #endif -#define PTE_CACHE 4 +#define PTE_CACHE 6 #define PTE_DEVICE 2 #define PTE_PAGETABLE 4 #else Index: arm/arm/pmap-v6.c =================================================================== --- arm/arm/pmap-v6.c (revision 245047) +++ arm/arm/pmap-v6.c (working copy) @@ -193,6 +193,14 @@ #define PMAP_INLINE __inline #endif /* PMAP_DEBUG */ +#ifdef ARM_L2_PIPT +#define pmap_l2cache_wbinv_range(va, pa, size) cpu_l2cache_wbinv_range(pa, size) +#define pmap_l2cache_inv_range(va, pa, size) cpu_l2cache_inv_range(pa, size) +#else +#define pmap_l2cache_wbinv_range(va, pa, size) cpu_l2cache_wbinv_range(va, size) +#define pmap_l2cache_inv_range(va, pa, size) cpu_l2cache_inv_range(va, size) +#endif + extern struct pv_addr systempage; /* @@ -786,11 +794,7 @@ pte = *ptep; cpu_idcache_wbinv_range(va, PAGE_SIZE); -#ifdef ARM_L2_PIPT - cpu_l2cache_wbinv_range(pte & L2_S_FRAME, PAGE_SIZE); -#else - cpu_l2cache_wbinv_range(va, PAGE_SIZE); -#endif + pmap_l2cache_wbinv_range(va, pte & L2_S_FRAME, PAGE_SIZE); if ((pte & L2_S_CACHE_MASK) != pte_l2_s_cache_mode_pt) { /* * Page tables must have the cache-mode set to @@ -2121,6 +2125,7 @@ cpu_tlb_flushD_SE(va); cpu_cpwait(); *pte = 0; + PTE_SYNC(pte); } } @@ -2387,11 +2392,7 @@ pte = *ptep &~ L2_S_CACHE_MASK; cpu_idcache_wbinv_range(tmpva, PAGE_SIZE); -#ifdef ARM_L2_PIPT - cpu_l2cache_wbinv_range(pte & L2_S_FRAME, PAGE_SIZE); -#else - cpu_l2cache_wbinv_range(tmpva, PAGE_SIZE); -#endif + pmap_l2cache_wbinv_range(tmpva, pte & L2_S_FRAME, PAGE_SIZE); *ptep = pte; cpu_tlb_flushID_SE(tmpva); @@ -2754,6 +2755,9 @@ else if (PV_BEEN_REFD(oflags)) cpu_tlb_flushD_SE(va); } + + if ((pmap != pmap_kernel()) && (pmap == &curproc->p_vmspace->vm_pmap)) + cpu_icache_sync_range(va, PAGE_SIZE); } /* @@ -3197,6 +3201,16 @@ else bzero_page(cdstp); + /* + * Although aliasing is not possible if we use + * cdstp temporary mappings with memory that + * will be mapped later as non-cached or with write-through + * caches we might end up overwriting it when calling wbinv_all + * So make sure caches are clean after copy operation + */ + cpu_idcache_wbinv_range(cdstp, size); + pmap_l2cache_wbinv_range(cdstp, phys, size); + mtx_unlock(&cmtx); } @@ -3276,12 +3290,23 @@ *cdst_pte = L2_S_PROTO | dst | pte_l2_s_cache_mode; pmap_set_prot(cdst_pte, VM_PROT_READ | VM_PROT_WRITE, 0); PTE_SYNC(cdst_pte); + cpu_tlb_flushD_SE(csrcp); cpu_tlb_flushD_SE(cdstp); cpu_cpwait(); + /* + * Although aliasing is not possible if we use + * cdstp temporary mappings with memory that + * will be mapped later as non-cached or with write-through + * caches we might end up overwriting it when calling wbinv_all + * So make sure caches are clean after copy operation + */ bcopy_page(csrcp, cdstp); + cpu_idcache_wbinv_range(cdstp, PAGE_SIZE); + pmap_l2cache_wbinv_range(cdstp, dst, PAGE_SIZE); + mtx_unlock(&cmtx); } --Apple-Mail=_06D23550-DCA8-4DF4-AB99-F3A189064528--
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?37274001-B061-4CFF-AEDB-2EE37CC6D267>