Date: Sat, 21 Jun 2014 00:08:28 -0700 From: John-Mark Gurney <jmg@funkthat.com> To: Andrew Turner <andrew@fubar.geek.nz>, arm@FreeBSD.org Subject: Re: AVILA getting close! Message-ID: <20140621070827.GJ31367@funkthat.com> In-Reply-To: <20140621010804.GD31367@funkthat.com> References: <20140618225808.GG31367@funkthat.com> <20140620151023.GZ31367@funkthat.com> <20140620200827.1c33c7da@bender.Home> <20140621010804.GD31367@funkthat.com>
next in thread | previous in thread | raw e-mail | index | archive | help
John-Mark Gurney wrote this message on Fri, Jun 20, 2014 at 18:08 -0700: > Andrew Turner wrote this message on Fri, Jun 20, 2014 at 20:08 +0100: > > On Fri, 20 Jun 2014 08:10:24 -0700 > > John-Mark Gurney <jmg@funkthat.com> wrote: > > > > > John-Mark Gurney wrote this message on Wed, Jun 18, 2014 at 15:58 > > > -0700: > > > > So, w/ the recent couple of patches that alc has provided, I no > > > > longer receive kernel panics on my AVILA board! > > > > > > > > $ uname -a > > > > FreeBSD avila.funkthat.com 11.0-CURRENT FreeBSD 11.0-CURRENT #27 > > > > r267333:267349M: Wed Jun 11 09:57:58 PDT 2014 > > > > jmg@carbon.funkthat.com:/usr/obj/arm.armeb/usr/src.avila/sys/AVILA > > > > arm $ uptime 12:15AM up 1 day, 15 mins, 2 users, load averages: > > > > 0.13, 0.11, 0.08 > > > > > > > > This survived a portsnap extract... This is all over NFS... > > > > > > > > Though the issue that I'm now having is that some binaries > > > > (newsyslog) and sometimes other binaries (awk, grep) core dump... > > > > > > > > I believe this is an issue w/ rtld, or related... If I compile > > > > newsyslog -static, it works fine... Otherwise I get a SIGILL, and > > > > that is because it jumps off into the weeds.. Though gdb on arm > > > > isn't very useful.. > > > > > > ok, so the SIGILL only occures under gdb, and this is because single > > > stepping into a RAS sequence doesn't work very well... If you set a > > > break point on the return (after the RAS sequence), you can get past > > > this... > > > > > > I got to the point in rtld.c code: > > > if (obj->pltrel) > > > rel = (const Elf_Rel *) ((caddr_t) obj->pltrel + reloff); > > > else > > > rel = (const Elf_Rel *) ((caddr_t) obj->pltrela + reloffand > > > was seeing gdb try to execute the pltrela line, but: i; > > > > > > and was seeing gdb try to execute the pltrela line, but: > > > (gdb) print * (const Elf_Rel *) ((caddr_t) obj->pltrela + reloff) > > > Error accessing memory address 0x118: Bad address. > > > (gdb) print/x obj->pltrela > > > $4 = 0x0 > > > (gdb) print /x reloff > > > $5 = 0x118 > > > (gdb) print obj->pltrel > > > $6 = (const Elf_Rel *) 0x94e8 > > Based on my copy of newsyslog I built for armeb this looks correct. To > > verify it could you dump the .dynamic section from the binary? > > Something like 'objdump -s newsyslog' will get it. > > ok, available at: > https://www.funkthat.com/~jmg/20140619/objdump.newsyslog > > > > Hun? obj->pltrel is non-zero, so it should have executed the other > > > line... > > > > > > I recompiled rtld w/ -O0, and sure enough, newsyslog runs fine... If > > > I compile w/o -O, or w/ -O1, it fails... > > > > > > Comments or suggestions? > > > > What is the value of rel after the if statement? In the -O/-O1 case the > > asm looks like: > > > > ldr r2, [sp, #20] ; Load obj to r2 > > ldr r3, [r2, #124] ; Load obj->pltrel to r3 > > cmp r3, #0 ; 0x0 ; if obj->pltrel: > > ldrne r2, [sp, #16] ; != NULL: Load reloff to r2 > > addne r4, r3, r2 ; != NULL: Add obj->pltrel + reloff to r4 > > ldreq r2, [sp, #20] ; == NULL: Load obj to r2 > > ldreq r3, [r2, #132] ; == NULL: Load obj->pltrela to r3 > > ldreq r2, [sp, #16] ; == NULL: Load reloff to r2 > > addeq r4, r2, r3 ; == NULL: Add obj->pltrela + reloff to r4 > > > > Given this I could see how gdb gets confused. > > > > It may also pay to get the registers from gdb at this point. > > Arg! This is frustrating, I'm getting such different behaviors from > time to time.. now it isn't having that fault.. but it's getting > farther, but... I this is because our in tree gdb is messed up.. > > But, I am getting farther... now the last break at rtld.c:3651 > looks like it's returning a bogus pointer: > (gdb) print *req > $12 = {name = 0x9190 "__aeabi_read_tp", hash = 0xf008a80, > hash_gnu = 0x52432dd3, ventry = 0x2003b1d0, flags = 0x1, > defobj_out = 0x2003c400, sym_out = 0x20062454, lockstate = 0xbfffeda0} > > defobj_out looks bogus to me... We don't have any object mapped > there: > (gdb) info shared > >From To Syms Read Shared Object Library > 0x200427c8 0x20048814 Yes /lib/libgcc_s.so.1 > 0x2007a4e8 0x2017f320 Yes /lib/libc.so.7 > 0x20018f14 0x2002c99c Yes /libexec/ld-elf.so.1 > > the data at 0x2003c400 doesn't look like code: > (gdb) x/32x 0x2003c400 > 0x2003c400: 0xd550b87a 0x00000001 0x00000000 0x2003a080 > 0x2003c410: 0x00000000 0x00000001 0x00000000 0x20051000 > 0x2003c420: 0x0016a000 0x00143000 0x00000000 0x20051000 > 0x2003c430: 0x2019dfd0 0x2007a4e8 0x20051034 0x000000a0 > 0x2003c440: 0x00000000 0x00000007 0x00000002 0x2019bcf0 > 0x2003c450: 0x00000004 0x00000058 0x00000000 0x00000008 > 0x2003c460: 0x20051000 0x00000000 0x2019e0b8 0x200713e0 > 0x2003c470: 0x000040c0 0x00000000 0x00000000 0x200754a0 > > and then as I stepi out of symlook_global: > (gdb) x/6i $pc > 0x2001f0b4 <symlook_global+348>: cmp r0, #0 ; 0x0 > 0x2001f0b8 <symlook_global+352>: moveq r0, #3 ; 0x3 > 0x2001f0bc <symlook_global+356>: movne r0, #0 ; 0x0 > 0x2001f0c0 <symlook_global+360>: add sp, sp, #36 ; 0x24 > 0x2001f0c4 <symlook_global+364>: pop {r4, r5, r6, r7, lr} > 0x2001f0c8 <symlook_global+368>: bx lr > (gdb) info registers > r0 0x20062454 0x20062454 > r1 0x933b 0x933b > r2 0x0 0x0 > r3 0xa4 0xa4 > r4 0x0 0x0 > r5 0xbfffed3c 0xbfffed3c > r6 0xbfffed08 0xbfffed08 > r7 0x20037af4 0x20037af4 > r8 0x0 0x0 > r9 0x1 0x1 > r10 0x8a2c 0x8a2c > r11 0xbfffed30 0xbfffed30 > r12 0x23de 0x23de > sp 0xbfffec94 0xbfffec94 > lr 0x2001efb0 0x2001efb0 > pc 0x2001f0b4 0x2001f0b4 > fps 0x0 0x0 > cpsr 0x60000010 0x60000010 > > Then stepi till 0x2001f0c8: > (gdb) info registers > r0 0x0 0x0 > r1 0x933b 0x933b > r2 0x0 0x0 > r3 0xa4 0xa4 > r4 0x2003c000 0x2003c000 > r5 0xbfffed3c 0xbfffed3c > r6 0x20037af4 0x20037af4 > r7 0xbfffece8 0xbfffece8 > r8 0x0 0x0 > r9 0x1 0x1 > r10 0x8a2c 0x8a2c > r11 0xbfffed30 0xbfffed30 > r12 0x23de 0x23de > sp 0xbfffeccc 0xbfffeccc > lr 0x2003c000 0x2003c000 > pc 0x2001f0c8 0x2001f0c8 > fps 0x0 0x0 > cpsr 0x20000010 0x20000010 > > and now the lr is bogus... it transfers control to 0x2003c000 which > is before the fault at 0x2003c0f4... And again, this looks like data, > not code: > (gdb) x/64x 0x2003c000 > 0x2003c000: 0xd550b87a 0x00000001 0x2003c200 0xbfffffb8 > 0x2003c010: 0x00000000 0x00000001 0x00000000 0x00008000 > 0x2003c020: 0x00012000 0x00009000 0x00008000 0x00000000 > 0x2003c030: 0x00018724 0x00009ca8 0x00008034 0x000000e0 > 0x2003c040: 0x00008114 0x00000007 0x00000000 0x00000000 > 0x2003c050: 0x00000000 0x00000000 0x00000000 0x00000000 > 0x2003c060: 0x00000000 0x00000000 0x0001881c 0x000094a0 > 0x2003c070: 0x00000048 0x00000000 0x00000000 0x000094e8 > 0x2003c080: 0x00000308 0x00000000 0x00000000 0x0000888c > 0x2003c090: 0x00008f9c 0x000003b1 0x00009430 0x00000002 > 0x2003c0a0: 0x00000000 0x00000000 0x0000934e 0x00008180 > 0x2003c0b0: 0x00000061 0x00008304 0x00000071 0x00000061 > 0x2003c0c0: 0x00000005 0x0000001f 0x0000000a 0x00000071 > 0x2003c0d0: 0x000084d8 0x00008558 0x000086c8 0x00000000 > 0x2003c0e0: 0x00000000 0x2003d000 0x00000000 0x00000000 > 0x2003c0f0: 0x00000000 0x2003c0f0 0x2003b180 0x00000007 > > If I continue to stepi from here, it will fault at f4... > > This looks like a stack smash issue as the lr we pop off the stack > is incorrect.. So, forgot I could set a watchpoint on the stack address, so I did: (gdb) c Continuing. Watchpoint 9: *0xbfffecc8 Old value = 0x2001f584 New value = 0x2003c000 0x20019960 in donelist_check (dlp=0xbfffed08, obj=0x2003c000) at /usr/src.avila/libexec/rtld-elf/rtld.c:1380 1380 dlp->objs[dlp->num_used++] = obj; I tracked dlp, and it's the donelist pointer, and it is allocated from symlook_default via donelist_init at the top of the function: #define donelist_init(dlp) \ ((dlp)->objs = alloca(obj_count * sizeof (dlp)->objs[0]), \ assert((dlp)->objs != NULL), \ (dlp)->num_alloc = obj_count, \ (dlp)->num_used = 0) Hmm... since obj_count is a global, and we are in a possibly threaded environment, we should probably move assigning num_alloc before the alloca, and then use num_alloc instead of obj_count just to be safe... but taht won't be the issue here, as num_used is 0 in my case... I've looked at the assembly for donelist_init for both the working and non-working case, and besides an extra store in the good version, which I think is just because of the unoptimized code, things look the same to me... I guess the next step is to make sure that the Donelist pointer is the same... -- John-Mark Gurney Voice: +1 415 225 5579 "All that I will do, has been done, All that I have, has not."
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20140621070827.GJ31367>