Date:      Thu, 28 Jun 2012 15:56:47 -0600
From:      Ian Lepore <freebsd@damnhippie.dyndns.org>
To:        Alexander Motin <mav@freebsd.org>
Cc:        freebsd-arm@freebsd.org
Subject:   Re: Cache write-back issue on Marvell SoC (SheevaPlug)
Message-ID:  <1340920607.1110.93.camel@revolution.hippie.lan>
In-Reply-To: <4FE2EDBA.1030505@FreeBSD.org>
References:  <4FE2EDBA.1030505@FreeBSD.org>

On Thu, 2012-06-21 at 12:47 +0300, Alexander Motin wrote:
> Hi.
> 
> Trying to localize regular data corruption during writes (reads seem
> not to be affected) to a SATA disk on a SheevaPlug box, I've found that
> it is probably the result of a cache coherency issue. Reading the data
> back shows that each time exactly 32 sequential, aligned bytes are
> corrupted. That, if I understand correctly, matches a single cache line
> size/offset.
> 
> I've found that a dirty hack, flushing the entire D-cache after
> doing the normal bus_dmamap_sync(), fixes the situation:
> 
> --- mvs.c       (revision 237359)
> +++ mvs.c       (working copy)
> @@ -1307,6 +1312,10 @@ mvs_dmasetprd(void *arg, bus_dma_segment_t *segs,
>          bus_dmamap_sync(ch->dma.data_tag, slot->dma.data_map,
>              ((slot->ccb->ccb_h.flags & CAM_DIR_IN) ?
>              BUS_DMASYNC_PREREAD : BUS_DMASYNC_PREWRITE));
> +#if defined(__arm__)
> +       if (slot->ccb->ccb_h.flags & CAM_DIR_OUT)
> +               cpu_dcache_wbinv_all();
> +#endif
>          if (ch->basic_dma)
>                  mvs_legacy_execute_transaction(slot);
>          else
> 
> Unluckily, I have no knowledge of ARM assembler or the cache control
> interfaces. Could somebody recheck the existing D-cache range
> write-back code, because there seems to be a problem?
> 

Since I'm pretty familiar with debugging the ARM busdma code, I had a
look at this today.  Nothing jumps out at me as wrong.

It appears that the Marvell document describing the MMU commands for
Kirkwood chips is not publicly available (I guess you need a corporate
account or something to get it).  I checked the NetBSD implementation
(essentially identical to FreeBSD's) and the Linux one (much simpler
code; apparently we've got room for improvement).  The Linux code seems
to be structured to use two different cache flushing schemes, as if
different chip variations might have different MMU feature sets, but I
couldn't find any real information on that.

Have you noticed any pattern in the address of the corrupted blocks?
Especially, is it always the first or last cacheline of the buffer (or
SG segment), or always the first or last line within a page, or anything
like that?  Are there ever multiple corruptions within a single DMA
transfer?  Are the corruptions rare or frequent?  Does it only happen on
large or only on small transfers?
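
To make those questions concrete, here is a minimal sketch of a helper
that classifies where a corrupted address falls relative to the buffer,
cacheline, and page boundaries.  It is not taken from the mvs or busdma
code; the names (classify_corruption, struct corruption_report) are
hypothetical, and the 32-byte line size and 4 KiB page size are
assumptions based on the 32-byte corruption reported above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 32-byte lines, matching the corruption size reported. */
#define CACHE_LINE_SIZE 32u
/* Assumption: 4 KiB pages. */
#define PAGE_SIZE_BYTES 4096u

/* Hypothetical summary of where one corrupted address sits. */
struct corruption_report {
	bool line_aligned;       /* bad_addr is the first byte of a line */
	bool first_line_of_buf;  /* in the line holding the buffer start */
	bool last_line_of_buf;   /* in the line holding the buffer end */
	bool first_line_of_page; /* in the first cacheline of its page */
	bool last_line_of_page;  /* in the last cacheline of its page */
	bool buf_start_aligned;  /* buffer starts on a line boundary */
	bool buf_end_aligned;    /* buffer ends on a line boundary */
};

/*
 * Classify bad_addr, which must lie inside [buf_addr, buf_addr + buf_len).
 * The same function can be applied per scatter/gather segment.
 */
static struct corruption_report
classify_corruption(uintptr_t buf_addr, size_t buf_len, uintptr_t bad_addr)
{
	struct corruption_report r;
	uintptr_t line_mask = (uintptr_t)CACHE_LINE_SIZE - 1;
	uintptr_t bad_line, first_line, last_line, page_off;

	assert(buf_len > 0);
	assert(bad_addr >= buf_addr && bad_addr - buf_addr < buf_len);

	/* Round each address down to the start of its cacheline. */
	bad_line = bad_addr & ~line_mask;
	first_line = buf_addr & ~line_mask;
	last_line = (buf_addr + buf_len - 1) & ~line_mask;
	page_off = bad_line & (PAGE_SIZE_BYTES - 1);

	r.line_aligned = (bad_addr & line_mask) == 0;
	r.first_line_of_buf = (bad_line == first_line);
	r.last_line_of_buf = (bad_line == last_line);
	r.first_line_of_page = (page_off == 0);
	r.last_line_of_page = (page_off == PAGE_SIZE_BYTES - CACHE_LINE_SIZE);
	r.buf_start_aligned = (buf_addr & line_mask) == 0;
	r.buf_end_aligned = ((buf_addr + buf_len) & line_mask) == 0;
	return (r);
}
```

Running something like this over each corrupted offset found when
reading the data back, and tallying which flags are set, should show
quickly whether the damage clusters at partial cachelines on the buffer
edges or at page boundaries.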

-- Ian



