Subject: Strange crash on wandboard
From: Ian Lepore
To: freebsd-arm
Date: Tue, 06 Aug 2013 21:57:12 -0600

Okay, this is so strange I've just
got to share it...

I've been having trouble with wandboard (solo) bringup and have tracked the problem down to returning from the first interrupt that happens. (It's a clock interrupt, but I don't think that's really germane.) It's as if PULLFRAMEFROMSVCANDEXIT weren't restoring the registers correctly.

At first the corruption hit the PC, which is damn hard to debug. But after figuring out just where it was happening in the code (spinlock_exit()) and inserting some extra debugging printfs, things changed a bit, and now a different register is getting blasted. Here's what I get at runtime:

    clock intr exit
    returned: intr_event_handle
    vm_fault(0xc0cca000, e46ab000, 1, 0) -> 1
    Fatal kernel mode data abort: 'Translation Fault (S)'
    trapframe: 0xdd3ffe24
    FSR=00000005, FAR=e46abdc0, spsr=600de613
    r0 =600001d3, r1 =60000113, r2 =000000c0, r3 =e46abdc0
    r4 =c271f620, r5 =c271cbf0, r6 =00000000, r7 =dd3ffea8
    r8 =c08d08f4, r9 =00000000, r10=00000000, r11=dd3ffe80
    r12=dd3ffe70, ssp=dd3ffe70, slr=c0af2bb4, pc =c0af2be8
    [ thread pid 12 tid 100006 ]
    Stopped at      spinlock_exit+0x5c:     ldr     r1, [r3]
    db>

Here's the asm code around the fault point:

    c0af2bd4:   e10f0000    mrs     r0, CPSR
    c0af2bd8:   e1c01002    bic     r1, r0, r2
    c0af2bdc:   e0211003    eor     r1, r1, r3
    c0af2be0:   e121f001    msr     CPSR_c, r1
    c0af2be4:   e59f3024    ldr     r3, [pc, #36]   ; c0af2c10
    c0af2be8:   e5931000    ldr     r1, [r3]
    c0af2bec:   e3510000    cmp     r1, #0  ; 0x0
    ....
    c0af2c10:   c0bd6ae4    adcgts  r6, sp, r4, ror #21
    c0af2c14:   c0b4e0e8    adcgts  lr, r4, r8, ror #1

Okay, so the msr instruction re-enables interrupts, the next instruction loads r3 with the constant value 0xc0bd6ae4, and then an interrupt happens. (Other instrumentation in PULLFRAMEFROMSVCANDEXIT on previous runs shows that this is the case every time, 100% reproducible, but that instrumentation destroys registers it shouldn't, so it's not present in the run shown above.) So the interrupt happens, then control returns to the instruction at c0af2be8, which faults.

Now here's the strange part.
Look at the fault-time r3 contents. It's the byte-reverse of the value it should have; it's been restored wrong-endian. Just one register, out of the whole set restored by a single "ldmia sp, {r0-r14}^" instruction.

I don't know what to make of it. It seems like a hardware error of some sort.

-- 
Ian