Date: Sat, 3 Sep 2011 11:19:17 -0600 (MDT) From: Ian Lepore <freebsd@damnhippie.dyndns.org> To: FreeBSD-gnats-submit@FreeBSD.org Subject: arm/160431: [patch] Disable interrupts during busdma cache sync operations. Message-ID: <201109031719.p83HJHHO098751@revolution.hippie.lan> Resent-Message-ID: <201109031720.p83HKCZl076304@freefall.freebsd.org>
next in thread | raw e-mail | index | archive | help
>Number: 160431 >Category: arm >Synopsis: [patch] Disable interrupts during busdma cache sync operations. >Confidential: no >Severity: critical >Priority: high >Responsible: freebsd-arm >State: open >Quarter: >Keywords: >Date-Required: >Class: sw-bug >Submitter-Id: current-users >Arrival-Date: Sat Sep 03 17:20:12 UTC 2011 >Closed-Date: >Last-Modified: >Originator: Ian Lepore <freebsd@damnhippie.dyndns.org> >Release: FreeBSD 8.2-RC3 arm >Organization: none >Environment: FreeBSD dvb 8.2-RC3 FreeBSD 8.2-RC3 #49: Tue Feb 15 22:52:14 UTC 2011 root@revolution.hippie.lan:/usr/obj/arm/usr/src/sys/DVB arm >Description: Data can be corrupted when an interrupt occurs while busdma_sync_buf() is handling a buffer that partially overlaps a cache line. One scenario, seen in the real world, was a small IO buffer allocated in the same cache line as an instance of a struct intr_handler. The dma sync code copied the non-DMA data (the intr_handler struct) to a temporary buffer prior to the cache sync, then an interrupt occurs that results in setting the it_need flag in the struct. When control returns to the dma sync code it finishes by copying the saved partial cache line from the temporary buffer back to the intr_handler struct, restoring the it_need flag to zero, and resulting in a threaded interrupt handler not running as needed. Similar sequences can be imagined that would lead to corruption of either the DMA'd data or non-DMA data sharing the same cache line, depending on the timing of the interrupt, and I can't quite convince myself that the problem only occurs in this partial-cacheline-overlap scenario. For example, what happens if execution is in the middle of a cpu_dcache_wbinv_range() operation and an interrupt leads to a context switch wherein the pmap code decides to call cpu_dcache_inv_range()? So to be conservatively safe, this patch disables interrupts for the entire duration bus_dmamap_sync_buf(), not just when partial cache lines are being handled. >How-To-Repeat: It would be very difficult to set up a repeatable test of this condition. We were lucky (!) enough to have it happen repeatably enough to diagnose. >Fix: Problem was discovered in an 8.2 environment, but this diff is to -current. --- diff.tmp begins here --- --- busdma_machdep.c.orig 2010-03-11 14:16:54.000000000 -0700 +++ busdma_machdep.c 2011-09-03 10:15:16.000000000 -0600 @@ -1091,6 +1091,14 @@ static void bus_dmamap_sync_buf(void *buf, int len, bus_dmasync_op_t op) { char _tmp_cl[arm_dcache_align], _tmp_clend[arm_dcache_align]; + uint32_t intr; + + /* Interrupts MUST be disabled when handling partial cacheline flush + * and most likely should be disabled for all flushes. (I know for + * certain interrupts can cause failures on partial flushes, and suspect + * problems could also happen in other scenarios.) + */ + intr = intr_disable(); if ((op & BUS_DMASYNC_PREWRITE) && !(op & BUS_DMASYNC_PREREAD)) { cpu_dcache_wb_range((vm_offset_t)buf, len); @@ -1129,6 +1137,8 @@ bus_dmamap_sync_buf(void *buf, int len, arm_dcache_align - (((vm_offset_t)(buf) + len) & arm_dcache_align_mask)); } + + intr_restore(intr); } static void --- diff.tmp ends here --- >Release-Note: >Audit-Trail: >Unformatted:
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?201109031719.p83HJHHO098751>