Date: Wed, 7 Nov 2018 11:53:20 -0800
From: Mark Millard <marklmi26-fbsd@yahoo.com>
To: Konstantin Belousov <kostikbel@gmail.com>
Cc: svn-src-head@freebsd.org
Subject: Re: svn commit: r339876 - head/libexec/rtld-elf
Message-ID: <7757A519-9262-40CC-A3F6-77AD243DDB28@yahoo.com>
In-Reply-To: <8FFCF603-6315-4D1C-858B-FC7233C17DD7@yahoo.com>
References: <8E5A5F3A-F1A7-4702-A2F7-65D74CC5B2E5@yahoo.com>
 <20181102004101.GI5335@kib.kiev.ua> <E44F5772-1F8A-40B8-9C4E-B8362B768F37@yahoo.com>
 <003A49D7-6E8B-4775-A70B-E0EB44505D4B@yahoo.com> <20181102113827.GM5335@kib.kiev.ua>
 <7B29A4C8-228D-41CB-B594-98DFA456E9C8@yahoo.com> <20181102155234.GN5335@kib.kiev.ua>
 <E93B3880-281E-482C-9DA7-851398543B97@yahoo.com> <20181102185014.GP5335@kib.kiev.ua>
 <8FFCF603-6315-4D1C-858B-FC7233C17DD7@yahoo.com>
[I trace the code associated with bl <00001322.plt_call.getenv> in the
two contexts and extend the range over which things appear to match: up
to some point after the branch b <__glink_PLTresolve> .]

On 2018-Nov-6, at 19:12, Mark Millard <marklmi26-fbsd@yahoo.com> wrote:

> [I've presented a little information about the longer-existing
> failure's odd backtrace for /libexec/ld-elf.so.1 /bin/ls
> --but on powerpc64 FreeBSD instead of 32-bit powerpc FreeBSD.]
>
> On 2018-Nov-2, at 11:50, Konstantin Belousov <kostikbel at gmail.com> wrote:
>
>> On Fri, Nov 02, 2018 at 10:38:08AM -0700, Mark Millard wrote:
>>> On 2018-Nov-2, at 8:52 AM, Konstantin Belousov <kostikbel at gmail.com> wrote:
>>>
>>>> . . .
>>>
>>> That seems better. But it crashes during /bin/ls execution
>>> ( 0x0180???? addresses ), apparently in a library routine
>>> ( 0x41?????? addresses ):
>>>
>>> Program received signal SIGSEGV, Segmentation fault.
>>> 0x411220b4 in ?? ()
>>> (gdb) bt
>>> #0 0x411220b4 in ?? ()
>>> #1 0x4112200c in ?? ()
>>> #2 0x01803c84 in ?? ()
>>> #3 0x018023b4 in ?? ()
>>> #4 0x010121a0 in .rtld_start () at /usr/src/libexec/rtld-elf/powerpc/rtld_start.S:112
>>>
>>> Using a normal gdb run of /bin/ls suggests:
>>>
>>> #2 0x01803c84 in ?? () should be in main and seems to be: bl 0x1818914 <getopt_long@plt>
>>> #3 0x018023b4 in ?? () should be in _start
>>>
>>> Looking in the test context:
>>>
>>> 0x1803c80: bl 0x1818914
>>> 0x1803c84: cmpwi cr7,r3,-1
>>>
>>> and:
>>>
>>> 0x1818914: li r11,59
>>> 0x1818918: b 0x18186f4
>>>
>>> and:
>>>
>>> 0x18186f4: rlwinm r11,r11,2,0,29
>>> 0x18186f8: addis r11,r11,386
>>> 0x18186fc: lwz r11,-30316(r11)
>>> 0x1818700: mtctr r11
>>> 0x1818704: bctr
>>>
>>> Breaking at the bctr and using info reg:
>>>
>>> r11 0x4125ffa0 1093009312
>>>
>>> It looks like there is some amount of
>>> activity before the traceback addresses
>>> show up.
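The 32-bit stub's address arithmetic can be checked with a small C sketch. It assumes the standard PowerPC definitions of rlwinm, addis and the signed 16-bit D-form displacement used by lwz; the helper names are hypothetical. Starting from li r11,59, it computes the address of the word that lwz loads into r11 (and that mtctr/bctr then jumps to). It does not model the contents of that word, i.e. the r11 value 0x4125ffa0 seen at the bctr.

```c
#include <assert.h>
#include <stdint.h>

/* rlwinm rA,rS,SH,MB,ME: rotate the 32-bit value left by SH, then keep
   bits MB..ME (IBM numbering: bit 0 is the most significant bit).
   Only non-wrapping masks (MB <= ME) are handled in this sketch. */
static uint32_t rlwinm(uint32_t rs, unsigned sh, unsigned mb, unsigned me)
{
    assert(sh < 32 && mb < 32 && me < 32 && mb <= me);
    uint32_t rot = sh ? (rs << sh) | (rs >> (32 - sh)) : rs;
    uint32_t mask = (UINT32_MAX >> mb) & (UINT32_MAX << (31 - me));
    return rot & mask;
}

/* addis rD,rA,SIMM: rD = rA + (SIMM << 16). */
static uint32_t addis(uint32_t ra, int16_t simm)
{
    return ra + ((uint32_t)(int32_t)simm << 16);
}

/* D-form effective address as used by lwz: EA = rA + sign-extended D. */
static uint32_t d_form_ea(uint32_t ra, int16_t d)
{
    return ra + (uint32_t)(int32_t)d;
}

/* Hypothetical helper: address of the word the quoted stub loads into
   r11 for the index placed in r11 by "li r11,idx". */
static uint32_t plt_slot_address(uint32_t idx)
{
    uint32_t r11 = rlwinm(idx, 2, 0, 29); /* idx * 4 */
    r11 = addis(r11, 386);                /* + 0x1820000 */
    return d_form_ea(r11, -30316);        /* - 0x766c */
}
```

For index 59 the steps give 59 * 4 = 0xec, then 0xec + 0x1820000 = 0x18200ec, then 0x18200ec - 0x766c = 0x1818a80.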
>>>
>>> I've not found a good way to fill in the "in ??()"
>>> (or analogous) information. The addresses 0x411220??
>>> do not match up with a normal run of /bin/ls from
>>> gdb: the addresses can not be accessed.
>>>
>>>
>>>
>>> It does appear that the code is in /lib/libc.so.7 in the
>>> test context:
>>>
>>> Breakpoint 2, reloc_non_plt (obj=0x41041600, obj_rtld=0x41104b57, flags=4, lockstate=0x0) at /usr/src/libexec/rtld-elf/powerpc/reloc.c:338
>>> . . .
>>>
>> There seems to be an issue with the direct execution mode on ppc.
>> Even otherwise working ld-elf.so.1 segfaults if I try to use it as
>> standalone binary.
>>
>> But if I specify patched ld-elf.so.1 as the interpreter for some program,
>> using 'cc -Wl,-I,<path>/ld-elf.so.1' it works. So I see there two bugs,
>> one is regression due to textsize calculation, which should be fixed by
>> my patch. Another is the direct exec problem.
>
> I've got a little more information about the odd backtrace
> from the /libexec/ld-elf.so.1 /bin/ls failure that the
> prior patch allowed getting to, although for a powerpc64
> example context.
>
> The information only identifies where the code was
> in /bin/ls and /lib/libc.so.7 in the backtrace. For
> libc.so.7, I found the same code sequences in a gdb session
> running /bin/ls directly: I matched one sequence first, compared
> its address with the one in the /libexec/ld-elf.so.1 /bin/ls
> process to find the offset for going back and forth, and then
> used that offset to find the code for the 2nd backtrace address.
>
> Overall it suggests to me that (in somewhat
> symbolic terms):
>
> bl <00001322.plt_call.getenv>
>
> eventually led to executing the wrong code.
>
>
> The supporting detail is as follows.
>
> For the /libexec/ld-elf.so.1 part of the backtrace it was
> easy to find where the code was:
>
> (gdb) run /bin/ls
> Starting program: /libexec/ld-elf.so.1 /bin/ls
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x000000080118d81c in ?? ()
> (gdb) bt
> #0 0x000000080118d81c in ?? ()
> #1 0x000000080118d920 in ?? ()
> #2 0x0000000010002558 in ?? ()
> #3 0x00000000100037b0 in ?? ()
> #4 0x0000000001018450 in ._rtld_start () at /usr/src/libexec/rtld-elf/powerpc64/rtld_start.S:104
> Backtrace stopped: frame did not save the PC
>
> (gdb)
> 101 ld %r7,128(%r1) /* exit proc */
> 102 ld %r8,136(%r1) /* ps_strings */
> 103
> 104 blrl /* _start(argc, argv, envp, obj, cleanup, ps_strings) */
> 105
> 106 li %r0,1 /* _exit() */
> 107 sc
>
>
> For the /bin/ls part of the backtrace it was easy to find
> where the code was:
>
> (gdb) symbol-file /bin/ls
> Load new symbol table from "/bin/ls"? (y or n) y
> Reading symbols from /bin/ls...Reading symbols from /usr/lib/debug//bin/ls.debug...done.
> done.
> (gdb) bt
> #0 0x000000080118d81c in ?? ()
> #1 0x000000080118d920 in ?? ()
> #2 0x0000000010002558 in main (argc=<optimized out>, argv=0x80134bdb0) at /usr/src/bin/ls/ls.c:268
> #3 0x00000000100037b0 in _start (argc=<optimized out>, argv=0x3fffffffffffdb70, env=0x3fffffffffffdb88, obj=<optimized out>, cleanup=<optimized out>, ps_strings=<optimized out>)
>     at /usr/src/lib/csu/powerpc64/crt1.c:96
> #4 0x0000000001018450 in ?? ()
> #5 0x0000000000000000 in ?? ()
>
> (gdb) fr 3
> #3 0x00000000100037b0 in _start (argc=<optimized out>, argv=0x3fffffffffffdb70, env=0x3fffffffffffdb88, obj=<optimized out>, cleanup=<optimized out>, ps_strings=<optimized out>)
>     at /usr/src/lib/csu/powerpc64/crt1.c:96
> 96 exit(main(argc, argv, env));
> (gdb) down
> #2 0x0000000010002558 in main (argc=<optimized out>, argv=0x80134bdb0) at /usr/src/bin/ls/ls.c:268
> 268 while ((ch = getopt_long(argc, argv,
>
>
>
> For the messy libc.so.7 part of the backtrace, both
> addresses are in getopt_internal. I show extractions from
> the gdb /bin/ls output because it has helpful symbolic
> information displayed.
> But that means that the addresses
> are offset from those in the bt for the failure process.
>
> For #1 0x000000080118d920 in ?? () I end up with:
>
> (gdb) x/32i 0x81019b6c0+0xad0-0x880
> 0x81019b910 <getopt_internal+592>: stw r9,0(r18)
> 0x81019b914 <getopt_internal+596>: addis r3,r2,-5
> 0x81019b918 <getopt_internal+600>: addi r3,r3,30120
> 0x81019b91c <getopt_internal+604>: bl 0x81018dfe0 <00001322.plt_call.getenv>
> 0x81019b920 <getopt_internal+608>: ld r2,40(r1)
>
> (The machine code around it all matches the code around
> 0x000000080118d920 in the failure context.)
>
> The getenv call in the source is the 2nd line of:
>
> if (posixly_correct == -1 || optreset)
>         posixly_correct = (getenv("POSIXLY_CORRECT") != NULL);
>
> For #0 0x000000080118d81c in ?? () I end up with:
>
> (gdb) x/32i 0x81019b6c0+0xad0-0x880-0x110
> 0x81019b800 <getopt_internal+320>: bne cr7,0x81019b868 <getopt_internal+424>
> 0x81019b804 <getopt_internal+324>: lwa r5,0(r29)
> 0x81019b808 <getopt_internal+328>: stw r17,0(r18)
> 0x81019b80c <getopt_internal+332>: cmpw cr7,r5,r19
> 0x81019b810 <getopt_internal+336>: bge cr7,0x81019ba60 <getopt_internal+928>
> 0x81019b814 <getopt_internal+340>: rldicr r9,r5,3,60
> 0x81019b818 <getopt_internal+344>: ldx r10,r20,r9
> 0x81019b81c <getopt_internal+348>: lbz r9,0(r10)
>
> with the failure being that r10 is zero in that last
> line above. Again the surrounding code matches.
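The failing instruction sequence can be read as C with a small sketch. It assumes, as the sequence suggests but the gdb output does not confirm, that r29 points at optind, r19 holds nargc and r20 holds nargv; the function names are hypothetical and optind is taken to be non-negative. rldicr r9,r5,3,60 is the same operation as sldi r9,r5,3, i.e. optind * 8, the byte offset of nargv[optind] for 8-byte pointers. Under those assumptions, reaching the lbz with r10 == 0 means the bge was not taken (optind < nargc) while nargv[optind] was NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* rldicr rA,rS,SH,ME: rotate the 64-bit value left by SH, then keep
   bits 0..ME (IBM numbering: bit 0 is the most significant bit). */
static uint64_t rldicr(uint64_t rs, unsigned sh, unsigned me)
{
    assert(sh < 64 && me < 64);
    uint64_t rot = sh ? (rs << sh) | (rs >> (64 - sh)) : rs;
    return rot & (UINT64_MAX << (63 - me));
}

/* Hypothetical model of getopt_internal+324..+348 (optind >= 0).
   Returns 1 if the bge is taken (optind >= nargc), 0 if the lbz reads
   a valid string, and -1 if the lbz would read through a NULL r10. */
static int classify_lbz(char *const *nargv, int nargc, int optind)
{
    if (optind >= nargc)                   /* cmpw cr7,r5,r19 ; bge cr7,... */
        return 1;
    /* rldicr r9,r5,3,60 (lwa has sign-extended optind into r5) */
    uint64_t off = rldicr((uint64_t)(int64_t)optind, 3, 60);
    assert(off == (uint64_t)optind * 8);   /* same as sldi r9,r5,3 */
    (void)off;
    const char *r10 = nargv[optind];       /* ldx r10,r20,r9 */
    return r10 == NULL ? -1 : 0;           /* lbz r9,0(r10) */
}
```

For example, with nargc = 2 and nargv = {"ls", NULL, NULL}, optind = 1 passes the bge check and the model returns -1.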
>
> The source code line is reported to be:
>
> if (*(place = nargv[optind]) != '-' ||
>
> I got the line number information from breakpoints 3 and 4
> below (from the gdb /bin/ls process):
>
> (gdb) info br
> Num Type Disp Enb Address What
> 1 breakpoint keep y 0x0000000010002360 in main at /usr/src/bin/ls/ls.c:231
> breakpoint already hit 1 time
> 3 breakpoint keep y 0x000000081019b81c in getopt_internal at /usr/src/lib/libc/stdlib/getopt_long.c:411
> 4 breakpoint keep y 0x000000081019b91c in getopt_internal at /usr/src/lib/libc/stdlib/getopt_long.c:379
>
> Line 379 has the getenv call, matching the machine code showing
> the call.
>
> (I set the breakpoints just as a way of using "info br" to list
> the information later.)
>
> Overall this seems to suggest that:
>
> bl <00001322.plt_call.getenv>
>
> led to something odd happening and got to the wrong
> code.
>
> That is all the additional information that I have
> at this point. I hope it is of some use.
>

I'll note that the normal case's execution does the getenv call
but does not execute the code related to lbz r9,0(r10).

I'll also note that, for the libc.so.7 code, the /bin/ls code
addresses (from running /bin/ls directly under gdb) are bigger
than the /libexec/ld-elf.so.1 /bin/ls code addresses by:

0x81019b920 - 0x80118d920 = 0xF00E000

I use this to go back and forth, checking for matching code as I go.
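The back-and-forth mapping can be written down as a small C sketch. LIBC_DELTA, to_failure and to_normal are hypothetical names; the only input is the 0x81019b920 - 0x80118d920 difference above, and the sketch simply assumes that this one constant offset applies to all of the libc addresses being compared.

```c
#include <assert.h>
#include <stdint.h>

/* Offset between libc addresses seen when gdb runs /bin/ls directly
   ("normal") and those seen for /libexec/ld-elf.so.1 /bin/ls ("failure"). */
#define LIBC_DELTA (0x81019b920ULL - 0x80118d920ULL)
static_assert(LIBC_DELTA == 0xF00E000ULL, "offset quoted in the message");

/* Normal-run address -> failure-context address. */
static uint64_t to_failure(uint64_t normal) { return normal - LIBC_DELTA; }

/* Failure-context address -> normal-run address. */
static uint64_t to_normal(uint64_t failure) { return failure + LIBC_DELTA; }
```

With it, 0x81018dfe0 (the plt_call.getenv stub in the normal run) maps to 0x80117ffe0, and 0x80118d81c maps back to 0x81019b81c <getopt_internal+348>.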
Presenting the normal /bin/ls use in gdb first, up to
b <__glink_PLTresolve> :

I'd already shown:

0x81019b91c <getopt_internal+604>: bl 0x81018dfe0 <00001322.plt_call.getenv>

Looking:

0x81018dfe0 <00001322.plt_call.getenv>: std r2,40(r1)
0x81018dfe4 <00001322.plt_call.getenv+4>: ld r12,480(r2)
0x81018dfe8 <00001322.plt_call.getenv+8>: mtctr r12
0x81018dfec <00001322.plt_call.getenv+12>: ld r11,496(r2)
0x81018dff0 <00001322.plt_call.getenv+16>: ld r2,488(r2)
0x81018dff4 <00001322.plt_call.getenv+20>: cmpldi r2,0
0x81018dff8 <00001322.plt_call.getenv+24>: bnectr+
0x81018dffc <00001322.plt_call.getenv+28>: b 0x81030f3dc <getenv@plt>

As for getenv@plt :

0x81030f3dc <getenv@plt>: li r0,925
0x81030f3e0 <getenv@plt+4>: b 0x81030d6c8 <__glink_PLTresolve>

Note that 0x81018dfe0 - 0xF00E000 = 0x80117ffe0 .

Back in the /libexec/ld-elf.so.1 /bin/ls context:

(gdb) bt
#0 0x000000080118d81c in ?? ()
#1 0x000000080118d920 in ?? () [Just after the bl <00001322.plt_call.getenv> .]
#2 0x0000000010002558 in ?? ()
#3 0x00000000100037b0 in ?? ()
#4 0x0000000001018450 in ?? ()
#5 0x0000000000000000 in ?? ()

(gdb) x/i 0x000000080118d920-0x4
0x80118d91c: bl 0x80117ffe0

So this matches what was calculated earlier.

(gdb) x/32i 0x81018dfe0-0xf00e000
0x80117ffe0: std r2,40(r1)
0x80117ffe4: ld r12,480(r2)
0x80117ffe8: mtctr r12
0x80117ffec: ld r11,496(r2)
0x80117fff0: ld r2,488(r2)
0x80117fff4: cmpldi r2,0
0x80117fff8: bnectr+
0x80117fffc: b 0x8013013dc

(gdb) x/2i 0x8013013dc
0x8013013dc: li r0,925
0x8013013e0: b 0x8012ff6c8

0x81030d6c8 - 0x8012ff6c8 = 0xF00E000

Still matching up to this point.

So the two contexts seem to match up to some point after:

b <__glink_PLTresolve>

I've not looked beyond this.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)