Date:      Fri, 4 Jan 2013 22:26:07 -0800
From:      Oleksandr Tymoshenko <gonzo@bluezbox.com>
To:        arm@freebsd.org
Subject:   Re: Unsolved problem with WB caches on ARMv6
Message-ID:  <37274001-B061-4CFF-AEDB-2EE37CC6D267@bluezbox.com>
In-Reply-To: <37F24B3C-4600-4E57-96EB-98C91FCD2B72@bluezbox.com>


[-- Attachment #1 --]

On 2012-12-09, at 10:24 PM, Oleksandr Tymoshenko <gonzo@bluezbox.com> wrote:

> Hello,
> 
> One of the long-standing issues with FreeBSD/ARMv6 is that Write-Back cache
> mode does not work properly. On PandaBoard, changing the cache mode from WT to WB
> causes USB glitches (ranging from stalls to network packet corruption) and random
> memory corruption that manifests itself as userland program crashes.
> 
> gber@ tracked down one of the bugs several months ago, but it's still unusable,
> at least on my setup.
> 
> I spent some time debugging through busdma and USB code but failed to find
> anything fishy. PandaBoard's USB host controller is EHCI. QHs and qTDs are
> flushed properly. The corruption pattern in packets is weird: it's not cacheline-sized;
> it's as if a chunk of data is just missing from the bulk transfer DMA buffer. L2 cache
> is disabled.
> 
> The issue is not reproducible in QEMU. 
> 
> The fix for arm/160431 applied to busdma-v6.c didn't help.
> 
> I'm out of ideas for now. Maybe Ian or Alan will have some suggestions on where to look?

Following up on this one. The cause of this issue was a combination of several bugs and
bad practices:

1. The USB subsystem in general, and the EHCI driver in particular, did not cope well with
    the lack of support for cache-coherent memory in the busdma subsystem.
2. PL310 driver bugs.
3. pmap bugs.

Fixes for (1) and (2) have been committed recently, and I believe I have finally tracked
down all the bugs in pmap:

- A missing PTE_SYNC in pmap_kremove caused severe memory corruption in userland
    applications.
- Lack of cache flushes when using special PTEs for zeroing or copying pages. If there are
    dirty lines for the destination memory and the page is later remapped as a non-cached
    region, the actual content might be overwritten by these dirty lines when they are
    evicted, either by the cache replacement policy or by a wbinv_all call.
- Missing icache sync for new mappings in userland applications.

The attached patch addresses these issues. Please review and test.
If you see something like this:

vm_thread_new: kstack allocation failed
panic: kproc_create() failed with 12
KDB: enter: panic
 
apply this patch: http://people.freebsd.org/~gonzo/patches/queue/arm-autotune-fix.diff

Some statistics I gathered while working on this issue:

As a test platform I used a PandaBoard ES with root mounted over NFS, and as a test,
buildkernel running in a loop with PANDABOARD as the config file. The average time for
building the kernel with L2 cache disabled is about 3 hours; with L2 cache enabled and
write-through as the default mode, 1h10m; with L2 enabled and write-back, write-allocate
as the default mode, 22 minutes.

The performance gain on Raspberry Pi was marginal, though. I blame the slow network
connection, since it works over USB in PIO mode. Once RPi gets faster USB/MMC support,
the actual difference may be more substantial.



[-- Attachment #2 --]
Index: arm/include/pmap.h
===================================================================
--- arm/include/pmap.h	(revision 245047)
+++ arm/include/pmap.h	(working copy)
@@ -61,7 +61,7 @@
 #else
 #define PTE_NOCACHE	1
 #endif
-#define PTE_CACHE	4
+#define PTE_CACHE	6
 #define PTE_DEVICE	2
 #define PTE_PAGETABLE	4
 #else
Index: arm/arm/pmap-v6.c
===================================================================
--- arm/arm/pmap-v6.c	(revision 245047)
+++ arm/arm/pmap-v6.c	(working copy)
@@ -193,6 +193,14 @@
 #define PMAP_INLINE __inline
 #endif  /* PMAP_DEBUG */
 
+#ifdef ARM_L2_PIPT
+#define pmap_l2cache_wbinv_range(va, pa, size) cpu_l2cache_wbinv_range(pa, size)
+#define pmap_l2cache_inv_range(va, pa, size) cpu_l2cache_inv_range(pa, size)
+#else
+#define pmap_l2cache_wbinv_range(va, pa, size) cpu_l2cache_wbinv_range(va, size)
+#define pmap_l2cache_inv_range(va, pa, size) cpu_l2cache_inv_range(va, size)
+#endif
+
 extern struct pv_addr systempage;
 
 /*
@@ -786,11 +794,7 @@
 	pte = *ptep;
 
 	cpu_idcache_wbinv_range(va, PAGE_SIZE);
-#ifdef ARM_L2_PIPT
-	cpu_l2cache_wbinv_range(pte & L2_S_FRAME, PAGE_SIZE);
-#else
-	cpu_l2cache_wbinv_range(va, PAGE_SIZE);
-#endif
+	pmap_l2cache_wbinv_range(va, pte & L2_S_FRAME, PAGE_SIZE);
 	if ((pte & L2_S_CACHE_MASK) != pte_l2_s_cache_mode_pt) {
 		/*
 		 * Page tables must have the cache-mode set to
@@ -2121,6 +2125,7 @@
 		cpu_tlb_flushD_SE(va);
 		cpu_cpwait();
 		*pte = 0;
+		PTE_SYNC(pte);
 	}
 }
 
@@ -2387,11 +2392,7 @@
 
 		pte = *ptep &~ L2_S_CACHE_MASK;
 		cpu_idcache_wbinv_range(tmpva, PAGE_SIZE);
-#ifdef ARM_L2_PIPT
-		cpu_l2cache_wbinv_range(pte & L2_S_FRAME, PAGE_SIZE);
-#else
-		cpu_l2cache_wbinv_range(tmpva, PAGE_SIZE);
-#endif
+		pmap_l2cache_wbinv_range(tmpva, pte & L2_S_FRAME, PAGE_SIZE);
 		*ptep = pte;
 		cpu_tlb_flushID_SE(tmpva);
 
@@ -2754,6 +2755,9 @@
 		else if (PV_BEEN_REFD(oflags))
 			cpu_tlb_flushD_SE(va);
 	}
+
+	if ((pmap != pmap_kernel()) && (pmap == &curproc->p_vmspace->vm_pmap))
+		cpu_icache_sync_range(va, PAGE_SIZE);
 }
 
 /*
@@ -3197,6 +3201,16 @@
 	else
 		bzero_page(cdstp);
 
+	/*
+	 * Although aliasing is not possible if we use 
+	 * cdstp temporary mappings with memory that 
+	 * will be mapped later as non-cached or with write-through 
+	 * caches we might end up overwriting it when calling wbinv_all
+	 * So make sure caches are clean after copy operation
+	 */
+	cpu_idcache_wbinv_range(cdstp, size);
+	pmap_l2cache_wbinv_range(cdstp, phys, size);
+
 	mtx_unlock(&cmtx);
 }
 
@@ -3276,12 +3290,23 @@
 	*cdst_pte = L2_S_PROTO | dst | pte_l2_s_cache_mode;
 	pmap_set_prot(cdst_pte, VM_PROT_READ | VM_PROT_WRITE, 0);
 	PTE_SYNC(cdst_pte);
+
 	cpu_tlb_flushD_SE(csrcp);
 	cpu_tlb_flushD_SE(cdstp);
 	cpu_cpwait();
 
+	/*
+	 * Although aliasing is not possible if we use 
+	 * cdstp temporary mappings with memory that 
+	 * will be mapped later as non-cached or with write-through 
+	 * caches we might end up overwriting it when calling wbinv_all
+	 * So make sure caches are clean after copy operation
+	 */
 	bcopy_page(csrcp, cdstp);
 
+	cpu_idcache_wbinv_range(cdstp, PAGE_SIZE);
+	pmap_l2cache_wbinv_range(cdstp, dst, PAGE_SIZE);
+
 	mtx_unlock(&cmtx);
 }
 
