Date: Tue, 06 Aug 2013 21:57:12 -0600
From: Ian Lepore <ian@FreeBSD.org>
To: freebsd-arm <freebsd-arm@FreeBSD.org>
Subject: Strange crash on wandboard
Message-ID: <1375847832.3320.97.camel@revolution.hippie.lan>

Okay, this is so strange I've just got to share it...

I've been having trouble with wandboard (solo) bringup and have tracked the problem down to returning from the first interrupt that happens. (It's a clock interrupt, but I don't think that's really germane.) It's as if PULLFRAMEFROMSVCANDEXIT wasn't restoring the registers correctly. At first the corruption hit the PC, which is damn hard to debug. But after figuring out just where it was happening in the code (spinlock_exit()) and inserting some extra debugging printfs, things changed a bit and now a different register is getting blasted.

Here's what I get at runtime:

    clock intr exit returned: intr_event_handle
    vm_fault(0xc0cca000, e46ab000, 1, 0) -> 1
    Fatal kernel mode data abort: 'Translation Fault (S)'
    trapframe: 0xdd3ffe24
    FSR=00000005, FAR=e46abdc0, spsr=600de613
    r0 =600001d3, r1 =60000113, r2 =000000c0, r3 =e46abdc0
    r4 =c271f620, r5 =c271cbf0, r6 =00000000, r7 =dd3ffea8
    r8 =c08d08f4, r9 =00000000, r10=00000000, r11=dd3ffe80
    r12=dd3ffe70, ssp=dd3ffe70, slr=c0af2bb4, pc =c0af2be8
    [ thread pid 12 tid 100006 ]
    Stopped at spinlock_exit+0x5c: ldr r1, [r3]
    db>

Here's the asm code around the fault point:

    c0af2bd4: e10f0000 mrs r0, CPSR
    c0af2bd8: e1c01002 bic r1, r0, r2
    c0af2bdc: e0211003 eor r1, r1, r3
    c0af2be0: e121f001 msr CPSR_c, r1
    c0af2be4: e59f3024 ldr r3, [pc, #36] ; c0af2c10
    c0af2be8: e5931000 ldr r1, [r3]
    c0af2bec: e3510000 cmp r1, #0 ; 0x0
    ....
    c0af2c10: c0bd6ae4 adcgts r6, sp, r4, ror #21
    c0af2c14: c0b4e0e8 adcgts lr, r4, r8, ror #1

Okay, so the msr instruction re-enables interrupts, and the next one loads r3 with constant value 0xc0bd6ae4, then an interrupt happens (other instrumentation in PULLFRAMEFROMSVCANDEXIT on previous runs shows that this is the case every time, 100% reproducible, but that instrumentation destroys registers it shouldn't, so it's not present in the run shown above). So the interrupt happens then control returns to the instruction at c0af2be8, which faults.

Now here's the strange part. Look at the fault-time r3 contents. It's the byte-reverse of the value it should have. It's been restored wrong-endian. Just one register from the whole set restored with a single "ldmia sp, {r0-r14}^" instruction.

I don't know what to make of it. It seems like a hardware error of some sort.

-- Ian
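
The byte-reversal observation can be checked by hand with a few lines of Python. This is only a sketch for verifying the arithmetic: the helper name bswap32 is made up for illustration, and the constants are copied from the register dump and disassembly in this message. It also confirms that the pc-relative load at c0af2be4 really points at the literal-pool word at c0af2c10 (in ARM state, pc reads as the instruction address plus 8).

```python
def bswap32(value: int) -> int:
    """Reverse the byte order of a 32-bit value (hypothetical helper)."""
    return int.from_bytes(value.to_bytes(4, "little"), "big")


# Values copied from the dump in this message.
LDR_ADDR = 0xC0AF2BE4      # address of "ldr r3, [pc, #36]"
LITERAL_ADDR = 0xC0AF2C10  # literal-pool slot named in the disassembly comment
EXPECTED_R3 = 0xC0BD6AE4   # word stored at the literal-pool slot
FAULTING_R3 = 0xE46ABDC0   # r3 (and FAR) at fault time

# In ARM state the pc operand reads as the instruction address + 8.
literal_target = LDR_ADDR + 8 + 36

print(f"literal pool target: {literal_target:#010x}")
print(f"bswap32(expected r3): {bswap32(EXPECTED_R3):#010x}")
print("r3 is byte-reversed:", bswap32(EXPECTED_R3) == FAULTING_R3)
```

Both checks agree with the dump: the load targets c0af2c10, and the faulting r3 is exactly the byte-swapped form of the constant stored there.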