Date: Sun, 11 Jan 2015 19:48:57 -0700
From: Ian Lepore <ian@freebsd.org>
To: Peter Jeremy <peter@rulingia.com>
Cc: freebsd-arm@freebsd.org
Subject: Re: read(2) into some addresses doesn't return data on RPi
Message-ID: <1421030937.14601.153.camel@freebsd.org>
In-Reply-To: <20150110060412.GE77914@server.rulingia.com>
References: <20150110060412.GE77914@server.rulingia.com>

On Sat, 2015-01-10 at 17:04 +1100, Peter Jeremy wrote:
> Trying to access the boot partition using mtools consistently fails on my
> RPi because the kernel is returning NULs for the first sector. The second
> sector is correct. If I use dd(1) then the expected data is returned.
>
> This is running 11-current r276818 (but ISTR seeing it on older kernels).
>
> I did some digging and found that read(2)s of the SD card device return
> successful but do not actually write anything to the buffer for some
> addresses (and they happen to contain all NULs in mtools). This doesn't
> appear to affect reads of normal files.
>
> Running the attached program on /dev/mmcsd0s1 gave me the following results:
> - There are no partial reads. Either all 512 bytes are updated or none are.
> - There are two blocks of addresses 0xbfff0e00 thru 0xbfff0e00 and 0xbfff2e00
>   thru 0xbfff2e00 where reads work on a 32-byte alignment but not otherwise.
> - Reads consistently fail between 0xbfff1e08 and 0xbfff1ff8
> - Reads consistently fail between 0xbfff3e08 and 0xbfff3f?? (I got a hang).
> - The program never completes. In 3 runs, I've gotten:
>   - panic: null_fetch_syscall_args
>   - kernel hang
>   - panic: malloc: bad malloc type magic
>   I don't have a serial console and so can't debug kernel panics.
>
> Putting that together, it seems to be related to accesses that aren't cache-line
> aligned and cross page boundaries but I'm not sure why it behaves differently
> at different page boundaries. The hangs/panics suggest that it's writing to
> random other kernel addresses instead.
>
> Does this ring a bell for anyone?
>

This turned out to be two problems, both fixed now as of r277038.

The first problem was that the driver wasn't able to handle a DMA that
was split across two physically discontiguous pages, and when an IO
isn't aligned to a cacheline, the arm busdma logic that auto-bounces it
inherently ends up setting up a split buffer.
Since the DMA tag required a single buffer, the mapping operation would
fail with EFBIG.

The second problem was that the rpi sdhci driver was completely ignoring
the status of the busdma mapping calls, so after a failed mapping it
would do the DMA anyway, using who-knows-what for a DMA address, leading
to later panics or crashes due to corrupted memory.

So first I made it handle errors better, then I made it able to handle
an IO that crosses page boundaries. I couldn't have done any of it
without that program that recreated the failure and confirmed the fix,
thanks Peter!

-- Ian
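
For anyone wanting to check a buffer address against the two conditions described above, here is a minimal userland sketch in C. It is not Peter's attached program and not the driver fix. The helper names, the 32-byte cache line (taken from the 32-byte alignment Peter observed) and the 4096-byte page size are all assumptions. probe_read() uses a sentinel fill to detect a read(2) that reports success without touching the buffer; a sector that really consists of the sentinel byte would be misreported as untouched.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed sizes; the message does not state either one explicitly. */
#define CACHE_LINE_BYTES 32u
#define PAGE_BYTES       4096u
#define SECTOR_BYTES     512u

/* Nonzero if [addr, addr+len) starts or ends off a cache-line boundary. */
static int
is_cacheline_unaligned(uintptr_t addr, size_t len)
{
	return ((addr % CACHE_LINE_BYTES) != 0 ||
	    ((addr + len) % CACHE_LINE_BYTES) != 0);
}

/* Nonzero if [addr, addr+len) touches more than one page. */
static int
crosses_page(uintptr_t addr, size_t len)
{
	assert(len > 0);
	return ((addr / PAGE_BYTES) != ((addr + len - 1) / PAGE_BYTES));
}

/*
 * Hypothetical probe: fill buf with a sentinel, read the first sector,
 * then see whether the buffer changed.
 * Returns 1 if the buffer was updated, 0 if the read claimed success
 * but left the sentinel in place, -1 on error or short read.
 */
static int
probe_read(int fd, unsigned char *buf)
{
	unsigned char sentinel[SECTOR_BYTES];

	memset(sentinel, 0xA5, sizeof(sentinel));
	memset(buf, 0xA5, SECTOR_BYTES);
	if (pread(fd, buf, SECTOR_BYTES, 0) != (ssize_t)SECTOR_BYTES)
		return (-1);
	return (memcmp(buf, sentinel, SECTOR_BYTES) != 0);
}
```

Under these assumed sizes, a 512-byte read at 0xbfff1e08 is both unaligned and page-crossing, while one at 0xbfff0e00 is neither, which is at least consistent with the failing range Peter reported. To reproduce on hardware, one would open the device read-only and call probe_read() with a pointer offset into a larger buffer.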