Date: Fri, 20 Jun 2014 18:08:04 -0700
From: John-Mark Gurney <jmg@funkthat.com>
To: Andrew Turner <andrew@fubar.geek.nz>
Cc: arm@FreeBSD.org
Subject: Re: AVILA getting close!
Message-ID: <20140621010804.GD31367@funkthat.com>
In-Reply-To: <20140620200827.1c33c7da@bender.Home>
References: <20140618225808.GG31367@funkthat.com> <20140620151023.GZ31367@funkthat.com> <20140620200827.1c33c7da@bender.Home>

Andrew Turner wrote this message on Fri, Jun 20, 2014 at 20:08 +0100:
> On Fri, 20 Jun 2014 08:10:24 -0700
> John-Mark Gurney <jmg@funkthat.com> wrote:
>
> > John-Mark Gurney wrote this message on Wed, Jun 18, 2014 at 15:58
> > -0700:
> > > So, w/ the recent couple of patches that alc has provided, I no
> > > longer receive kernel panics on my AVILA board!
> > >
> > > $ uname -a
> > > FreeBSD avila.funkthat.com 11.0-CURRENT FreeBSD 11.0-CURRENT #27
> > > r267333:267349M: Wed Jun 11 09:57:58 PDT 2014
> > > jmg@carbon.funkthat.com:/usr/obj/arm.armeb/usr/src.avila/sys/AVILA
> > > arm $ uptime 12:15AM up 1 day, 15 mins, 2 users, load averages:
> > > 0.13, 0.11, 0.08
> > >
> > > This survived a portsnap extract... This is all over NFS...
> > >
> > > Though the issue that I'm now having is that some binaries
> > > (newsyslog) and sometimes other binaries (awk, grep) core dump...
> > >
> > > I believe this is an issue w/ rtld, or related... If I compile
> > > newsyslog -static, it works fine... Otherwise I get a SIGILL, and
> > > that is because it jumps off into the weeds.. Though gdb on arm
> > > isn't very useful..
> >
> > ok, so the SIGILL only occurs under gdb, and this is because single
> > stepping into a RAS sequence doesn't work very well... If you set a
> > break point on the return (after the RAS sequence), you can get past
> > this...
> >
> > I got to the point in rtld.c code:
> >     if (obj->pltrel)
> >         rel = (const Elf_Rel *) ((caddr_t) obj->pltrel + reloff);
> >     else
> >         rel = (const Elf_Rel *) ((caddr_t) obj->pltrela + reloff);
> >
> > and was seeing gdb try to execute the pltrela line, but:
> > (gdb) print * (const Elf_Rel *) ((caddr_t) obj->pltrela + reloff)
> > Error accessing memory address 0x118: Bad address.
> > (gdb) print/x obj->pltrela
> > $4 = 0x0
> > (gdb) print /x reloff
> > $5 = 0x118
> > (gdb) print obj->pltrel
> > $6 = (const Elf_Rel *) 0x94e8
>
> Based on my copy of newsyslog I built for armeb this looks correct. To
> verify it could you dump the .dynamic section from the binary?
> Something like 'objdump -s newsyslog' will get it.

ok, available at:
https://www.funkthat.com/~jmg/20140619/objdump.newsyslog

> > Huh? obj->pltrel is non-zero, so it should have executed the other
> > line...
> >
> > I recompiled rtld w/ -O0, and sure enough, newsyslog runs fine... If
> > I compile w/o -O, or w/ -O1, it fails...
> >
> > Comments or suggestions?
>
> What is the value of rel after the if statement? In the -O/-O1 case the
> asm looks like:
>
> ldr   r2, [sp, #20]   ; Load obj to r2
> ldr   r3, [r2, #124]  ; Load obj->pltrel to r3
> cmp   r3, #0 ; 0x0    ; if obj->pltrel:
> ldrne r2, [sp, #16]   ; != NULL: Load reloff to r2
> addne r4, r3, r2      ; != NULL: Add obj->pltrel + reloff to r4
> ldreq r2, [sp, #20]   ; == NULL: Load obj to r2
> ldreq r3, [r2, #132]  ; == NULL: Load obj->pltrela to r3
> ldreq r2, [sp, #16]   ; == NULL: Load reloff to r2
> addeq r4, r2, r3      ; == NULL: Add obj->pltrela + reloff to r4
>
> Given this I could see how gdb gets confused.
>
> It may also pay to get the registers from gdb at this point.

Arg! This is frustrating, I'm getting such different behaviors from
time to time.. now it isn't having that fault.. but it's getting
farther, but... I think this is because our in tree gdb is messed up..

But, I am getting farther... now the last break at rtld.c:3651 looks like
it's returning a bogus pointer:
(gdb) print *req
$12 = {name = 0x9190 "__aeabi_read_tp", hash = 0xf008a80,
  hash_gnu = 0x52432dd3, ventry = 0x2003b1d0, flags = 0x1,
  defobj_out = 0x2003c400, sym_out = 0x20062454, lockstate = 0xbfffeda0}

defobj_out looks bogus to me...
We don't have any object mapped there:
(gdb) info shared
From        To          Syms Read   Shared Object Library
0x200427c8  0x20048814  Yes         /lib/libgcc_s.so.1
0x2007a4e8  0x2017f320  Yes         /lib/libc.so.7
0x20018f14  0x2002c99c  Yes         /libexec/ld-elf.so.1

the data at 0x2003c400 doesn't look like code:
(gdb) x/32x 0x2003c400
0x2003c400:     0xd550b87a      0x00000001      0x00000000      0x2003a080
0x2003c410:     0x00000000      0x00000001      0x00000000      0x20051000
0x2003c420:     0x0016a000      0x00143000      0x00000000      0x20051000
0x2003c430:     0x2019dfd0      0x2007a4e8      0x20051034      0x000000a0
0x2003c440:     0x00000000      0x00000007      0x00000002      0x2019bcf0
0x2003c450:     0x00000004      0x00000058      0x00000000      0x00000008
0x2003c460:     0x20051000      0x00000000      0x2019e0b8      0x200713e0
0x2003c470:     0x000040c0      0x00000000      0x00000000      0x200754a0

and then as I stepi out of symlook_global:
(gdb) x/6i $pc
0x2001f0b4 <symlook_global+348>:        cmp     r0, #0  ; 0x0
0x2001f0b8 <symlook_global+352>:        moveq   r0, #3  ; 0x3
0x2001f0bc <symlook_global+356>:        movne   r0, #0  ; 0x0
0x2001f0c0 <symlook_global+360>:        add     sp, sp, #36     ; 0x24
0x2001f0c4 <symlook_global+364>:        pop     {r4, r5, r6, r7, lr}
0x2001f0c8 <symlook_global+368>:        bx      lr
(gdb) info registers
r0             0x20062454       0x20062454
r1             0x933b           0x933b
r2             0x0              0x0
r3             0xa4             0xa4
r4             0x0              0x0
r5             0xbfffed3c       0xbfffed3c
r6             0xbfffed08       0xbfffed08
r7             0x20037af4       0x20037af4
r8             0x0              0x0
r9             0x1              0x1
r10            0x8a2c           0x8a2c
r11            0xbfffed30       0xbfffed30
r12            0x23de           0x23de
sp             0xbfffec94       0xbfffec94
lr             0x2001efb0       0x2001efb0
pc             0x2001f0b4       0x2001f0b4
fps            0x0              0x0
cpsr           0x60000010       0x60000010

Then stepi till 0x2001f0c8:
(gdb) info registers
r0             0x0              0x0
r1             0x933b           0x933b
r2             0x0              0x0
r3             0xa4             0xa4
r4             0x2003c000       0x2003c000
r5             0xbfffed3c       0xbfffed3c
r6             0x20037af4       0x20037af4
r7             0xbfffece8       0xbfffece8
r8             0x0              0x0
r9             0x1              0x1
r10            0x8a2c           0x8a2c
r11            0xbfffed30       0xbfffed30
r12            0x23de           0x23de
sp             0xbfffeccc       0xbfffeccc
lr             0x2003c000       0x2003c000
pc             0x2001f0c8       0x2001f0c8
fps            0x0              0x0
cpsr           0x20000010       0x20000010

and now the lr is bogus... it transfers control to 0x2003c000 which is
before the fault at 0x2003c0f4...

And again, this looks like data, not code:
(gdb) x/64x 0x2003c000
0x2003c000:     0xd550b87a      0x00000001      0x2003c200      0xbfffffb8
0x2003c010:     0x00000000      0x00000001      0x00000000      0x00008000
0x2003c020:     0x00012000      0x00009000      0x00008000      0x00000000
0x2003c030:     0x00018724      0x00009ca8      0x00008034      0x000000e0
0x2003c040:     0x00008114      0x00000007      0x00000000      0x00000000
0x2003c050:     0x00000000      0x00000000      0x00000000      0x00000000
0x2003c060:     0x00000000      0x00000000      0x0001881c      0x000094a0
0x2003c070:     0x00000048      0x00000000      0x00000000      0x000094e8
0x2003c080:     0x00000308      0x00000000      0x00000000      0x0000888c
0x2003c090:     0x00008f9c      0x000003b1      0x00009430      0x00000002
0x2003c0a0:     0x00000000      0x00000000      0x0000934e      0x00008180
0x2003c0b0:     0x00000061      0x00008304      0x00000071      0x00000061
0x2003c0c0:     0x00000005      0x0000001f      0x0000000a      0x00000071
0x2003c0d0:     0x000084d8      0x00008558      0x000086c8      0x00000000
0x2003c0e0:     0x00000000      0x2003d000      0x00000000      0x00000000
0x2003c0f0:     0x00000000      0x2003c0f0      0x2003b180      0x00000007

If I continue to stepi from here, it will fault at f4...

This looks like a stack smash issue as the lr we pop off the stack is
incorrect..

-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."