Date:      Sun, 11 Jan 2015 19:48:57 -0700
From:      Ian Lepore <ian@freebsd.org>
To:        Peter Jeremy <peter@rulingia.com>
Cc:        freebsd-arm@freebsd.org
Subject:   Re: read(2) into some addresses doesn't return data on RPi
Message-ID:  <1421030937.14601.153.camel@freebsd.org>
In-Reply-To: <20150110060412.GE77914@server.rulingia.com>
References:  <20150110060412.GE77914@server.rulingia.com>

On Sat, 2015-01-10 at 17:04 +1100, Peter Jeremy wrote:
> Trying to access the boot partition using mtools consistently fails on my
> RPi because the kernel is returning NULs for the first sector.  The second
> sector is correct.  If I use dd(1) then the expected data is returned.
> 
> This is running 11-current r276818 (but ISTR seeing it on older kernels).
> 
> I did some digging and found that read(2)s of the SD card device return
> successful but do not actually write anything to the buffer for some
> addresses (and they happen to contain all NULs in mtools).  This doesn't
> appear to affect reads of normal files.
> 
> Running the attached program on /dev/mmcsd0s1 gave me the following results:
> - There are no partial reads.  Either all 512 bytes are updated or none are.
> - There are two blocks of addresses 0xbfff0e00 thru 0xbfff0e00 and 0xbfff2e00
>   thru 0xbfff2e00 where reads work on a 32-byte alignment but not otherwise.
> - Reads consistently fail between 0xbfff1e08 and 0xbfff1ff8
> - Reads consistently fail between 0xbfff3e08 and 0xbfff3f?? (I got a hang).
> - The program never completes.  In 3 runs, I've gotten:
>   - panic: null_fetch_syscall_args
>   - kernel hang
>   - panic: malloc: bad malloc type magic
>   I don't have a serial console and so can't debug kernel panics.
> 
> Putting that together, it seems to be related to accesses that aren't cache-line
> aligned and cross page boundaries but I'm not sure why it behaves differently
> at different page boundaries.  The hangs/panics suggest that it's writing to
> random other kernel addresses instead.
> 
> Does this ring a bell for anyone?
> 

This turned out to be two problems, both fixed now as of r277038.

The first problem was that the driver wasn't able to handle a dma that
was split across two physically discontiguous pages, and when an IO
isn't aligned to a cacheline the arm busdma logic that auto-bounces it
inherently ends up setting up a split buffer.  Since the dma tag
required a single buffer, the mapping operation would fail with EFBIG.

The second problem was that the rpi sdhci driver was completely ignoring
the status of the busdma mapping calls, so after a failed mapping it
would do the dma anyway, using who-knows-what for a dma address, leading
to later panics or crashes due to corrupted memory.

So first I made it handle errors better, then I made it able to handle
an IO that crosses page boundaries.
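
For readers following along, the two failure modes described above can be
sketched as a small user-space model.  This is not the driver code from
r277038.  The names model_count_segments, model_map and model_start_io
are hypothetical, and the model assumes 4 KiB pages and treats every page
boundary as a physical discontinuity (the worst case).  It only models
the page-crossing split, not the cacheline bounce logic in arm busdma.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. */
#define MODEL_PAGE_SIZE 4096u

/*
 * Hypothetical helper: number of page-bounded pieces a buffer occupies.
 * Worst case for DMA: no two adjacent pages are physically contiguous,
 * so each page touched needs its own segment.
 */
static size_t
model_count_segments(uintptr_t addr, size_t len)
{
	if (len == 0)
		return (0);
	return ((size_t)((addr + len - 1) / MODEL_PAGE_SIZE -
	    addr / MODEL_PAGE_SIZE) + 1);
}

/*
 * Hypothetical stand-in for a mapping call: fails with EFBIG when the
 * buffer needs more segments than the tag allows.
 */
static int
model_map(uintptr_t addr, size_t len, size_t max_segs)
{
	assert(max_segs > 0);	/* a tag must allow at least one segment */
	if (model_count_segments(addr, len) > max_segs)
		return (EFBIG);
	return (0);
}

/*
 * Hypothetical transfer start: check the mapping status and bail out
 * on failure instead of starting DMA with an invalid address.
 */
static int
model_start_io(uintptr_t addr, size_t len, size_t max_segs, int *dma_started)
{
	int error;

	*dma_started = 0;
	error = model_map(addr, len, max_segs);
	if (error != 0)
		return (error);	/* do not touch the hardware */
	*dma_started = 1;
	return (0);
}
```

In this model a 512-byte read into 0xbfff1e00 ends exactly at the page
boundary and fits in one segment, while the same read into 0xbfff1e08
(one of the failing addresses in Peter's report) needs two.  With a
single-segment limit the mapping fails with EFBIG and the transfer is
never started.  Raising the limit to two segments lets it proceed.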

I couldn't have done any of it without that program, which recreated the
failure and confirmed the fix.  Thanks, Peter!

-- Ian




