Date:      Fri, 29 Jun 2012 01:17:15 +0300
From:      Alexander Motin <mav@FreeBSD.org>
To:        Ian Lepore <freebsd@damnhippie.dyndns.org>
Cc:        freebsd-arm@freebsd.org
Subject:   Re: Cache write-back issue on Marvell SoC (SheevaPlug)
Message-ID:  <4FECD7EB.4030101@FreeBSD.org>
In-Reply-To: <1340920607.1110.93.camel@revolution.hippie.lan>
References:  <4FE2EDBA.1030505@FreeBSD.org> <1340920607.1110.93.camel@revolution.hippie.lan>

On 29.06.2012 00:56, Ian Lepore wrote:
> On Thu, 2012-06-21 at 12:47 +0300, Alexander Motin wrote:
>> Hi.
>>
>> Trying to localize regular data corruption during writes (reads seems
>> not affected) to SATA disk on SheevaPlug box I've found out that it is
>> probably result of cache coherency issue. Reading data back shows that
>> each time exactly 32 sequential aligned data bytes are corrupted. That,
>> if I understand correctly, matches single cache line size/offset.
>>
>> I've found out that such dirty hack with flushing all D-cache after
>> doing normal bus_dmamap_sync() fixes the situation:
>>
>> --- mvs.c       (revision 237359)
>> +++ mvs.c       (working copy)
>> @@ -1307,6 +1312,10 @@ mvs_dmasetprd(void *arg, bus_dma_segment_t *segs,
>>           bus_dmamap_sync(ch->dma.data_tag, slot->dma.data_map,
>>               ((slot->ccb->ccb_h.flags & CAM_DIR_IN) ?
>>               BUS_DMASYNC_PREREAD : BUS_DMASYNC_PREWRITE));
>> +#if defined(__arm__)
>> +       if (slot->ccb->ccb_h.flags & CAM_DIR_OUT)
>> +               cpu_dcache_wbinv_all();
>> +#endif
>>           if (ch->basic_dma)
>>                   mvs_legacy_execute_transaction(slot);
>>           else
>>
>> Unluckily I have no idea in arm assembler and cache control interfaces.
>> Could somebody recheck existing D-cache range write-back code, because
>> there seems to be a problem?
>>
>
> Since I'm pretty familiar with debugging arm's busdma code, I had a look
> at this today.  Nothing is jumping out at me as wrong.
>
> It appears that the Marvell document describing the MMU commands for
> Kirkwood chips is not publicly available (I guess you need a corporate
> account or something to get it).  I checked the netbsd implementation
> (essentially identical to freebsd), and linux (much simpler code,
> apparently we've got room for improvement).  The linux code seems to be
> structured to use two different cache flushing schemes, as if different
> chip variations might have a different MMU feature set, but I couldn't
> find any real information on that.
>
> Have you noticed any pattern in the address of the corrupted blocks?
> Especially, is it always the first or last cacheline of the buffer (or
> SG segment), or always the first or last line within a page, or anything
> like that?  Are there ever multiple corruptions within a single DMA
> transfer?  Are the corruptions rare or frequent?  Does it only happen on
> large or only on small transfers?

I've seen about half a dozen corrupted lines per gigabyte of transferred
data, and I was unable to see any pattern in them. I have no information
on how the corruptions correlate with DMA transactions, but I've seen
corrupted lines in different parts of sectors, and also several
non-consecutive corrupted lines within one sector, so it is unlikely
that they are only the first or last lines of DMA transactions. I've
experimented with MAXPHYS of 128K and 512K and found no correlation
with these I/O sizes, though I haven't tested with much shorter I/O.
I've also tried all combinations of command queuing with different
numbers of simultaneous I/Os, as well as one request at a time, and
found no correlation there either.
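
For anyone who wants to reproduce the measurement, here is a minimal
sketch of a readback comparison at cache-line granularity. It is not the
tool I used and it is not code from the mvs driver: the function name
find_bad_lines and the CACHE_LINE constant are made up for illustration,
and the 32-byte line size is only assumed from the size of the corrupted
blocks described earlier in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed D-cache line size, taken from the 32-byte corruption unit. */
#define CACHE_LINE 32

/*
 * Compare the data that was written (expected) with the data that was
 * read back (actual), one cache-line-sized chunk at a time.
 *
 * The byte offsets of the first max_out differing lines are stored in
 * out[].  The return value is the total number of differing lines,
 * which may be larger than max_out.
 */
static size_t
find_bad_lines(const unsigned char *expected, const unsigned char *actual,
    size_t len, size_t *out, size_t max_out)
{
	size_t off, n = 0;

	/* Hypothetical helper: only whole cache lines are compared. */
	assert(len % CACHE_LINE == 0);

	for (off = 0; off < len; off += CACHE_LINE) {
		if (memcmp(expected + off, actual + off, CACHE_LINE) != 0) {
			if (n < max_out)
				out[n] = off;
			n++;
		}
	}
	return (n);
}
```

Dividing each reported offset by 512 gives the sector, and the remainder
divided by CACHE_LINE gives the line within that sector, which is enough
to check whether the bad lines cluster at the start or end of a sector.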

-- 
Alexander Motin


