Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 4 Jan 2013 22:26:07 -0800
From:      Oleksandr Tymoshenko <gonzo@bluezbox.com>
To:        arm@freebsd.org
Subject:   Re: Unsolved problem with WB caches on ARMv6
Message-ID:  <37274001-B061-4CFF-AEDB-2EE37CC6D267@bluezbox.com>
In-Reply-To: <37F24B3C-4600-4E57-96EB-98C91FCD2B72@bluezbox.com>
References:  <37F24B3C-4600-4E57-96EB-98C91FCD2B72@bluezbox.com>

next in thread | previous in thread | raw e-mail | index | archive | help

--Apple-Mail=_06D23550-DCA8-4DF4-AB99-F3A189064528
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=us-ascii


On 2012-12-09, at 10:24 PM, Oleksandr Tymoshenko <gonzo@bluezbox.com> =
wrote:

> Hello,
>=20
> One of the long-time issues with FreeBSD/ARMv6 is that Write-Back =
cache
> mode does not work properly. On PandaBoard changing cache mode to WB =
from WT=20
> causesUSB glitches (starting from stalls  to network packets =
corruption) and random=20
> memory corruptions that manifest themselves as a userland programs =
crashes.
>=20
> gber@ tracked down one of the bugs several month ago, but it's still =
unusable
> at least on my setup.=20
>=20
> I spent some time debugging through busdma and USB code but failed to =
find
> anything fishy. PandaBoard's USB host controller is EHCI. QH and QTDs =
are=20
> flushed properly. Corruption pattern in packets is weird: it's not =
cacheline-size
> it's like chunk of data is just missing from bulk transfer DMA buffer. =
L2 cache
> is disabled.=20
>=20
> The issue is not reproducible in QEMU.=20
>=20
> Fix for arm/160431 applied to busdma-v6.c didn't help.=20
>=20
> I'm out of ideas for now. May be Ian or Alan will have some =
suggestions where to look?

Following up on this one. The cause for this issue was combination of =
several bugs and
bad practices:

1. USB subsystem  in general  and EHCI driver particularly didn't like =
that there is no supper
    cor cache-coherent memory in busdma subsystem.
2. PL310 driver bugs
3. pmap bugs.

Fixes for (1) and (2) have been committed recently and I believe I =
finally tracked down all=20
bugs in pmap:

- Missing PTE_SYNC in pmap_kremove caused severe memory corruption in =
userland
    applications
- Lack of cache flushes when using special PTEs for zeroing or copying =
pages. If there are=20
    dirty lines for destination memory and page later remapped as a =
non-cached region
    actual content might be overwritten by these dirty lines when cache =
eviction happens=20
   as a result of applying cache eviction policy or because of wbinv_all =
call.=20
- icache sync for new mapping for userland applications.

Attached patch addresses these issues. Please review and test.
If you see something like this:

vm_thread_new: kstack allocation failed
panic: kproc_create() failed with 12
KDB: enter: panic
=20
apply this patch: =
http://people.freebsd.org/~gonzo/patches/queue/arm-autotune-fix.diff

Some bits of statistics I gathered while working on this issue.=20

As a test platform I used pandaboard ES with root mounted over NFS and =
as a test - =20
buildkernel ran in loop with PANDABOARD as a config file. Average time =
for building=20
kernel with L2 cache disabled is about 3 hours. With L2 cache enabled =
and=20
write-through as a default mode: 1h10m. With L2 enabled and =
writeback-allocate mode=20
as default: 22 minutes.=20

Performance  gain on raspberry Pi was marginal though. I blame slow =
network connection
since it works over USB in PIO mode. When RPi will get faster USB/mmc =
support=20
actual difference may be  more substantial.=20



--Apple-Mail=_06D23550-DCA8-4DF4-AB99-F3A189064528
Content-Disposition: attachment;
	filename=pmap-wb-caches-fix.diff
Content-Type: application/octet-stream;
	name="pmap-wb-caches-fix.diff"
Content-Transfer-Encoding: 7bit

Index: arm/include/pmap.h
===================================================================
--- arm/include/pmap.h	(revision 245047)
+++ arm/include/pmap.h	(working copy)
@@ -61,7 +61,7 @@
 #else
 #define PTE_NOCACHE	1
 #endif
-#define PTE_CACHE	4
+#define PTE_CACHE	6
 #define PTE_DEVICE	2
 #define PTE_PAGETABLE	4
 #else
Index: arm/arm/pmap-v6.c
===================================================================
--- arm/arm/pmap-v6.c	(revision 245047)
+++ arm/arm/pmap-v6.c	(working copy)
@@ -193,6 +193,14 @@
 #define PMAP_INLINE __inline
 #endif  /* PMAP_DEBUG */
 
+#ifdef ARM_L2_PIPT
+#define pmap_l2cache_wbinv_range(va, pa, size) cpu_l2cache_wbinv_range(pa, size)
+#define pmap_l2cache_inv_range(va, pa, size) cpu_l2cache_inv_range(pa, size)
+#else
+#define pmap_l2cache_wbinv_range(va, pa, size) cpu_l2cache_wbinv_range(va, size)
+#define pmap_l2cache_inv_range(va, pa, size) cpu_l2cache_inv_range(va, size)
+#endif
+
 extern struct pv_addr systempage;
 
 /*
@@ -786,11 +794,7 @@
 	pte = *ptep;
 
 	cpu_idcache_wbinv_range(va, PAGE_SIZE);
-#ifdef ARM_L2_PIPT
-	cpu_l2cache_wbinv_range(pte & L2_S_FRAME, PAGE_SIZE);
-#else
-	cpu_l2cache_wbinv_range(va, PAGE_SIZE);
-#endif
+	pmap_l2cache_wbinv_range(va, pte & L2_S_FRAME, PAGE_SIZE);
 	if ((pte & L2_S_CACHE_MASK) != pte_l2_s_cache_mode_pt) {
 		/*
 		 * Page tables must have the cache-mode set to
@@ -2121,6 +2125,7 @@
 		cpu_tlb_flushD_SE(va);
 		cpu_cpwait();
 		*pte = 0;
+		PTE_SYNC(pte);
 	}
 }
 
@@ -2387,11 +2392,7 @@
 
 		pte = *ptep &~ L2_S_CACHE_MASK;
 		cpu_idcache_wbinv_range(tmpva, PAGE_SIZE);
-#ifdef ARM_L2_PIPT
-		cpu_l2cache_wbinv_range(pte & L2_S_FRAME, PAGE_SIZE);
-#else
-		cpu_l2cache_wbinv_range(tmpva, PAGE_SIZE);
-#endif
+		pmap_l2cache_wbinv_range(tmpva, pte & L2_S_FRAME, PAGE_SIZE);
 		*ptep = pte;
 		cpu_tlb_flushID_SE(tmpva);
 
@@ -2754,6 +2755,9 @@
 		else if (PV_BEEN_REFD(oflags))
 			cpu_tlb_flushD_SE(va);
 	}
+
+	if ((pmap != pmap_kernel()) && (pmap == &curproc->p_vmspace->vm_pmap))
+		cpu_icache_sync_range(va, PAGE_SIZE);
 }
 
 /*
@@ -3197,6 +3201,16 @@
 	else
 		bzero_page(cdstp);
 
+	/*
+	 * Although aliasing is not possible if we use 
+	 * cdstp temporary mappings with memory that 
+	 * will be mapped later as non-cached or with write-through 
+	 * caches we might end up overwriting it when calling wbinv_all
+	 * So make sure caches are clean after copy operation
+	 */
+	cpu_idcache_wbinv_range(cdstp, size);
+	pmap_l2cache_wbinv_range(cdstp, phys, size);
+
 	mtx_unlock(&cmtx);
 }
 
@@ -3276,12 +3290,23 @@
 	*cdst_pte = L2_S_PROTO | dst | pte_l2_s_cache_mode;
 	pmap_set_prot(cdst_pte, VM_PROT_READ | VM_PROT_WRITE, 0);
 	PTE_SYNC(cdst_pte);
+
 	cpu_tlb_flushD_SE(csrcp);
 	cpu_tlb_flushD_SE(cdstp);
 	cpu_cpwait();
 
+	/*
+	 * Although aliasing is not possible if we use 
+	 * cdstp temporary mappings with memory that 
+	 * will be mapped later as non-cached or with write-through 
+	 * caches we might end up overwriting it when calling wbinv_all
+	 * So make sure caches are clean after copy operation
+	 */
 	bcopy_page(csrcp, cdstp);
 
+	cpu_idcache_wbinv_range(cdstp, PAGE_SIZE);
+	pmap_l2cache_wbinv_range(cdstp, dst, PAGE_SIZE);
+
 	mtx_unlock(&cmtx);
 }
 

--Apple-Mail=_06D23550-DCA8-4DF4-AB99-F3A189064528--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?37274001-B061-4CFF-AEDB-2EE37CC6D267>