Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 20 Jun 2014 18:08:04 -0700
From:      John-Mark Gurney <jmg@funkthat.com>
To:        Andrew Turner <andrew@fubar.geek.nz>
Cc:        arm@FreeBSD.org
Subject:   Re: AVILA getting close!
Message-ID:  <20140621010804.GD31367@funkthat.com>
In-Reply-To: <20140620200827.1c33c7da@bender.Home>
References:  <20140618225808.GG31367@funkthat.com> <20140620151023.GZ31367@funkthat.com> <20140620200827.1c33c7da@bender.Home>

next in thread | previous in thread | raw e-mail | index | archive | help
Andrew Turner wrote this message on Fri, Jun 20, 2014 at 20:08 +0100:
> On Fri, 20 Jun 2014 08:10:24 -0700
> John-Mark Gurney <jmg@funkthat.com> wrote:
> 
> > John-Mark Gurney wrote this message on Wed, Jun 18, 2014 at 15:58
> > -0700:
> > > So, w/ the recent couple of patches that alc has provided, I no
> > > longer receive kernel panics on my AVILA board!
> > > 
> > > $ uname -a
> > > FreeBSD avila.funkthat.com 11.0-CURRENT FreeBSD 11.0-CURRENT #27
> > > r267333:267349M: Wed Jun 11 09:57:58 PDT 2014
> > > jmg@carbon.funkthat.com:/usr/obj/arm.armeb/usr/src.avila/sys/AVILA
> > > arm $ uptime 12:15AM  up 1 day, 15 mins, 2 users, load averages:
> > > 0.13, 0.11, 0.08
> > > 
> > > This survived a portsnap extract...  This is all over NFS...
> > > 
> > > Though the issue that I'm now having is that some binaries
> > > (newsyslog) and sometimes other binaries (awk, grep) core dump...
> > > 
> > > I believe this is an issue w/ rtld, or related...  If I compile
> > > newsyslog -static, it works fine...  Otherwise I get a SIGILL, and
> > > that is because it jumps off into the weeds..  Though gdb on arm
> > > isn't very useful..
> > 
> > ok, so the SIGILL only occures under gdb, and this is because single
> > stepping into a RAS sequence doesn't work very well...  If you set a
> > break point on the return (after the RAS sequence), you can get past
> > this...
> > 
> > I got to the point in rtld.c code:
> >     if (obj->pltrel)
> >         rel = (const Elf_Rel *) ((caddr_t) obj->pltrel + reloff);
> >     else
> >         rel = (const Elf_Rel *) ((caddr_t) obj->pltrela + reloffand
> > was seeing gdb try to execute the pltrela line, but: i;
> > 
> > and was seeing gdb try to execute the pltrela line, but:
> > (gdb) print * (const Elf_Rel *) ((caddr_t) obj->pltrela + reloff)
> > Error accessing memory address 0x118: Bad address.
> > (gdb) print/x obj->pltrela
> > $4 = 0x0
> > (gdb) print /x reloff
> > $5 = 0x118
> > (gdb) print obj->pltrel
> > $6 = (const Elf_Rel *) 0x94e8
> Based on my copy of newsyslog I built for armeb this looks correct. To
> verify it could you dump the .dynamic section from the binary?
> Something like 'objdump -s newsyslog' will get it.

ok, available at:
https://www.funkthat.com/~jmg/20140619/objdump.newsyslog

> > Hun? obj->pltrel is non-zero, so it should have executed the other
> > line...
> > 
> > I recompiled rtld w/ -O0, and sure enough, newsyslog runs fine... If
> > I compile w/o -O, or w/ -O1, it fails...
> > 
> > Comments or suggestions?
> 
> What is the value of rel after the if statement? In the -O/-O1 case the
> asm looks like:
> 
> ldr     r2, [sp, #20]  ; Load obj to r2
> ldr     r3, [r2, #124] ; Load obj->pltrel to r3
> cmp     r3, #0  ; 0x0  ; if obj->pltrel:
> ldrne   r2, [sp, #16]  ;  != NULL: Load reloff to r2
> addne   r4, r3, r2     ;  != NULL: Add obj->pltrel + reloff to r4
> ldreq   r2, [sp, #20]  ;  == NULL: Load obj to r2
> ldreq   r3, [r2, #132] ;  == NULL: Load obj->pltrela to r3
> ldreq   r2, [sp, #16]  ;  == NULL: Load reloff to r2
> addeq   r4, r2, r3     ;  == NULL: Add obj->pltrela + reloff to r4
> 
> Given this I could see how gdb gets confused.
> 
> It may also pay to get the registers from gdb at this point.

Arg! This is frustrating, I'm getting such different behaviors from
time to time..  now it isn't having that fault..  but it's getting
farther, but...  I this is because our in tree gdb is messed up..

But, I am getting farther...  now the last break at rtld.c:3651
looks like it's returning a bogus pointer:
(gdb) print *req
$12 = {name = 0x9190 "__aeabi_read_tp", hash = 0xf008a80, 
  hash_gnu = 0x52432dd3, ventry = 0x2003b1d0, flags = 0x1, 
  defobj_out = 0x2003c400, sym_out = 0x20062454, lockstate = 0xbfffeda0}

defobj_out looks bogus to me...  We don't have any object mapped
there:
(gdb) info shared  
>From        To          Syms Read   Shared Object Library
0x200427c8  0x20048814  Yes         /lib/libgcc_s.so.1
0x2007a4e8  0x2017f320  Yes         /lib/libc.so.7
0x20018f14  0x2002c99c  Yes         /libexec/ld-elf.so.1

the data at 0x2003c400 doesn't look like code:
(gdb) x/32x 0x2003c400
0x2003c400:     0xd550b87a      0x00000001      0x00000000      0x2003a080
0x2003c410:     0x00000000      0x00000001      0x00000000      0x20051000
0x2003c420:     0x0016a000      0x00143000      0x00000000      0x20051000
0x2003c430:     0x2019dfd0      0x2007a4e8      0x20051034      0x000000a0
0x2003c440:     0x00000000      0x00000007      0x00000002      0x2019bcf0
0x2003c450:     0x00000004      0x00000058      0x00000000      0x00000008
0x2003c460:     0x20051000      0x00000000      0x2019e0b8      0x200713e0
0x2003c470:     0x000040c0      0x00000000      0x00000000      0x200754a0

and then as I stepi out of symlook_global:
(gdb) x/6i $pc
0x2001f0b4 <symlook_global+348>:        cmp     r0, #0  ; 0x0
0x2001f0b8 <symlook_global+352>:        moveq   r0, #3  ; 0x3
0x2001f0bc <symlook_global+356>:        movne   r0, #0  ; 0x0
0x2001f0c0 <symlook_global+360>:        add     sp, sp, #36     ; 0x24
0x2001f0c4 <symlook_global+364>:        pop     {r4, r5, r6, r7, lr}
0x2001f0c8 <symlook_global+368>:        bx      lr
(gdb) info registers
r0             0x20062454       0x20062454
r1             0x933b   0x933b
r2             0x0      0x0
r3             0xa4     0xa4
r4             0x0      0x0
r5             0xbfffed3c       0xbfffed3c
r6             0xbfffed08       0xbfffed08
r7             0x20037af4       0x20037af4
r8             0x0      0x0
r9             0x1      0x1
r10            0x8a2c   0x8a2c
r11            0xbfffed30       0xbfffed30
r12            0x23de   0x23de
sp             0xbfffec94       0xbfffec94
lr             0x2001efb0       0x2001efb0
pc             0x2001f0b4       0x2001f0b4
fps            0x0      0x0
cpsr           0x60000010       0x60000010

Then stepi till 0x2001f0c8:
(gdb) info registers
r0             0x0      0x0
r1             0x933b   0x933b
r2             0x0      0x0
r3             0xa4     0xa4
r4             0x2003c000       0x2003c000
r5             0xbfffed3c       0xbfffed3c
r6             0x20037af4       0x20037af4
r7             0xbfffece8       0xbfffece8
r8             0x0      0x0
r9             0x1      0x1
r10            0x8a2c   0x8a2c
r11            0xbfffed30       0xbfffed30
r12            0x23de   0x23de
sp             0xbfffeccc       0xbfffeccc
lr             0x2003c000       0x2003c000
pc             0x2001f0c8       0x2001f0c8
fps            0x0      0x0
cpsr           0x20000010       0x20000010

and now the lr is bogus...  it transfers control to 0x2003c000 which
is before the fault at 0x2003c0f4...  And again, this looks like data,
not code:
(gdb) x/64x 0x2003c000
0x2003c000:     0xd550b87a      0x00000001      0x2003c200      0xbfffffb8
0x2003c010:     0x00000000      0x00000001      0x00000000      0x00008000
0x2003c020:     0x00012000      0x00009000      0x00008000      0x00000000
0x2003c030:     0x00018724      0x00009ca8      0x00008034      0x000000e0
0x2003c040:     0x00008114      0x00000007      0x00000000      0x00000000
0x2003c050:     0x00000000      0x00000000      0x00000000      0x00000000
0x2003c060:     0x00000000      0x00000000      0x0001881c      0x000094a0
0x2003c070:     0x00000048      0x00000000      0x00000000      0x000094e8
0x2003c080:     0x00000308      0x00000000      0x00000000      0x0000888c
0x2003c090:     0x00008f9c      0x000003b1      0x00009430      0x00000002
0x2003c0a0:     0x00000000      0x00000000      0x0000934e      0x00008180
0x2003c0b0:     0x00000061      0x00008304      0x00000071      0x00000061
0x2003c0c0:     0x00000005      0x0000001f      0x0000000a      0x00000071
0x2003c0d0:     0x000084d8      0x00008558      0x000086c8      0x00000000
0x2003c0e0:     0x00000000      0x2003d000      0x00000000      0x00000000
0x2003c0f0:     0x00000000      0x2003c0f0      0x2003b180      0x00000007

If I continue to stepi from here, it will fault at f4...

This looks like a stack smash issue as the lr we pop off the stack
is incorrect..

-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20140621010804.GD31367>