Date: Wed, 7 Nov 2018 15:09:39 -0800
From: Mark Millard <marklmi26-fbsd@yahoo.com>
To: Konstantin Belousov <kostikbel@gmail.com>
Cc: svn-src-head@freebsd.org
Subject: Re: svn commit: r339876 - head/libexec/rtld-elf
Message-ID: <554EC215-3BC7-409A-A9C2-FFB15C039266@yahoo.com>
In-Reply-To: <7757A519-9262-40CC-A3F6-77AD243DDB28@yahoo.com>
References: <8E5A5F3A-F1A7-4702-A2F7-65D74CC5B2E5@yahoo.com>
 <20181102004101.GI5335@kib.kiev.ua>
 <E44F5772-1F8A-40B8-9C4E-B8362B768F37@yahoo.com>
 <003A49D7-6E8B-4775-A70B-E0EB44505D4B@yahoo.com>
 <20181102113827.GM5335@kib.kiev.ua>
 <7B29A4C8-228D-41CB-B594-98DFA456E9C8@yahoo.com>
 <20181102155234.GN5335@kib.kiev.ua>
 <E93B3880-281E-482C-9DA7-851398543B97@yahoo.com>
 <20181102185014.GP5335@kib.kiev.ua>
 <8FFCF603-6315-4D1C-858B-FC7233C17DD7@yahoo.com>
 <7757A519-9262-40CC-A3F6-77AD243DDB28@yahoo.com>
[I note what I've failed to find a way to investigate.]

On 2018-Nov-7, at 11:53, Mark Millard <marklmi26-fbsd at yahoo.com> wrote:

> [I trace code associated with bl <00001322.plt_call.getenv>
> in the two contexts and extend the range over which things
> appear to match: up to some point after the branch
> b <__glink_PLTresolve> .]
>
> On 2018-Nov-6, at 19:12, Mark Millard <marklmi26-fbsd at yahoo.com> wrote:
>
>> [I present a little information about the longer-existing
>> failure's odd backtrace for /libexec/ld-elf.so.1 /bin/ls,
>> but on powerpc64 FreeBSD instead of 32-bit powerpc FreeBSD.]
>>
>> On 2018-Nov-2, at 11:50, Konstantin Belousov <kostikbel at gmail.com> wrote:
>>
>>> On Fri, Nov 02, 2018 at 10:38:08AM -0700, Mark Millard wrote:
>>>> On 2018-Nov-2, at 8:52 AM, Konstantin Belousov <kostikbel at gmail.com> wrote:
>>>>
>>>>> . . .
>>>>
>>>> That seems better. But it crashes during /bin/ls execution
>>>> ( 0x0180???? addresses ), apparently in a library routine
>>>> ( 0x41?????? addresses ):
>>>>
>>>> Program received signal SIGSEGV, Segmentation fault.
>>>> 0x411220b4 in ?? ()
>>>> (gdb) bt
>>>> #0 0x411220b4 in ?? ()
>>>> #1 0x4112200c in ?? ()
>>>> #2 0x01803c84 in ?? ()
>>>> #3 0x018023b4 in ?? ()
>>>> #4 0x010121a0 in .rtld_start () at /usr/src/libexec/rtld-elf/powerpc/rtld_start.S:112
>>>>
>>>> Using a normal gdb run of /bin/ls suggests:
>>>>
>>>> #2 0x01803c84 in ?? () should be in main and seems to be: bl 0x1818914 <getopt_long@plt>
>>>> #3 0x018023b4 in ?? () should be in _start
>>>>
>>>> Looking in the test context:
>>>>
>>>> 0x1803c80: bl 0x1818914
>>>> 0x1803c84: cmpwi cr7,r3,-1
>>>>
>>>> and:
>>>>
>>>> 0x1818914: li r11,59
>>>> 0x1818918: b 0x18186f4
>>>>
>>>> and:
>>>>
>>>> 0x18186f4: rlwinm r11,r11,2,0,29
>>>> 0x18186f8: addis r11,r11,386
>>>> 0x18186fc: lwz r11,-30316(r11)
>>>> 0x1818700: mtctr r11
>>>> 0x1818704: bctr
>>>>
>>>> Breaking at the bctr and using info reg:
>>>>
>>>> r11 0x4125ffa0 1093009312
>>>>
>>>> It looks like there is some amount of
>>>> activity before the traceback addresses
>>>> show up.
>>>>
>>>> I've not found a good way to fill in the "in ??()"
>>>> (or analogous) information. The addresses 0x411220??
>>>> do not match up with a normal run of /bin/ls from
>>>> gdb: the addresses cannot be accessed.
>>>>
>>>> It does appear that the code is in /lib/libc.so.7 in the
>>>> test context:
>>>>
>>>> Breakpoint 2, reloc_non_plt (obj=0x41041600, obj_rtld=0x41104b57, flags=4, lockstate=0x0) at /usr/src/libexec/rtld-elf/powerpc/reloc.c:338
>>>> . . .
>>>>
>>> There seems to be an issue with the direct execution mode on ppc.
>>> Even otherwise working ld-elf.so.1 segfaults if I try to use it as
>>> a standalone binary.
>>>
>>> But if I specify patched ld-elf.so.1 as the interpreter for some program,
>>> using 'cc -Wl,-I,<path>/ld-elf.so.1', it works. So I see two bugs there:
>>> one is a regression due to the textsize calculation, which should be
>>> fixed by my patch. The other is the direct exec problem.
>>
>> I've got a little more information about the odd backtrace
>> from the /libexec/ld-elf.so.1 /bin/ls failure that the
>> prior patch allowed getting to, although for a powerpc64
>> example context.
>>
>> The information only identifies where the code was
>> in /bin/ls and /lib/libc.so.7 in the backtrace.
>> For libc.so.7 I found the same code sequences in a gdb of
>> /bin/ls run directly. I matched one address first, used the
>> addresses there vs. those in the /libexec/ld-elf.so.1 /bin/ls
>> process to find the offset for going back and forth, and then
>> used that offset to find the code for the 2nd backtrace address.
>>
>> Overall it suggests to me that (in somewhat
>> symbolic terms):
>>
>> bl <00001322.plt_call.getenv>
>>
>> eventually led to executing the wrong code.
>>
>>
>> The supporting detail is as follows.
>>
>> For the /libexec/ld-elf.so.1 part of the backtrace it was
>> easy to find where the code was:
>>
>> (gdb) run /bin/ls
>> Starting program: /libexec/ld-elf.so.1 /bin/ls
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> 0x000000080118d81c in ?? ()
>> (gdb) bt
>> #0 0x000000080118d81c in ?? ()
>> #1 0x000000080118d920 in ?? ()
>> #2 0x0000000010002558 in ?? ()
>> #3 0x00000000100037b0 in ?? ()
>> #4 0x0000000001018450 in ._rtld_start () at /usr/src/libexec/rtld-elf/powerpc64/rtld_start.S:104
>> Backtrace stopped: frame did not save the PC
>>
>> (gdb)
>> 101         ld      %r7,128(%r1)    /* exit proc */
>> 102         ld      %r8,136(%r1)    /* ps_strings */
>> 103
>> 104         blrl                    /* _start(argc, argv, envp, obj, cleanup, ps_strings) */
>> 105
>> 106         li      %r0,1           /* _exit() */
>> 107         sc
>>
>>
>> For the /bin/ls part of the backtrace it was easy to find
>> where the code was:
>>
>> (gdb) symbol-file /bin/ls
>> Load new symbol table from "/bin/ls"? (y or n) y
>> Reading symbols from /bin/ls...Reading symbols from /usr/lib/debug//bin/ls.debug...done.
>> done.
>> (gdb) bt
>> #0 0x000000080118d81c in ?? ()
>> #1 0x000000080118d920 in ?? ()
>> #2 0x0000000010002558 in main (argc=<optimized out>, argv=0x80134bdb0) at /usr/src/bin/ls/ls.c:268
>> #3 0x00000000100037b0 in _start (argc=<optimized out>, argv=0x3fffffffffffdb70, env=0x3fffffffffffdb88, obj=<optimized out>, cleanup=<optimized out>, ps_strings=<optimized out>)
>>     at /usr/src/lib/csu/powerpc64/crt1.c:96
>> #4 0x0000000001018450 in ?? ()
>> #5 0x0000000000000000 in ?? ()
>>
>> (gdb) fr 3
>> #3 0x00000000100037b0 in _start (argc=<optimized out>, argv=0x3fffffffffffdb70, env=0x3fffffffffffdb88, obj=<optimized out>, cleanup=<optimized out>, ps_strings=<optimized out>)
>>     at /usr/src/lib/csu/powerpc64/crt1.c:96
>> 96          exit(main(argc, argv, env));
>> (gdb) down
>> #2 0x0000000010002558 in main (argc=<optimized out>, argv=0x80134bdb0) at /usr/src/bin/ls/ls.c:268
>> 268         while ((ch = getopt_long(argc, argv,
>>
>>
>>
>> For the messy libc.so.7 part of the backtrace both
>> addresses are in getopt_internal. I show extractions from
>> the gdb /bin/ls output because it has helpful symbolic
>> information displayed. But that means that the addresses
>> are offset from those in the bt for the failing process.
>>
>> For #1 0x000000080118d920 in ?? () I end up with:
>>
>> (gdb) x/32i 0x81019b6c0+0xad0-0x880
>> 0x81019b910 <getopt_internal+592>: stw r9,0(r18)
>> 0x81019b914 <getopt_internal+596>: addis r3,r2,-5
>> 0x81019b918 <getopt_internal+600>: addi r3,r3,30120
>> 0x81019b91c <getopt_internal+604>: bl 0x81018dfe0 <00001322.plt_call.getenv>
>> 0x81019b920 <getopt_internal+608>: ld r2,40(r1)
>>
>> (The machine code around it all matches the code around
>> 0x000000080118d920 in the failing context.)
>>
>> The getenv call in the source is the 2nd line of:
>>
>>         if (posixly_correct == -1 || optreset)
>>                 posixly_correct = (getenv("POSIXLY_CORRECT") != NULL);
>>
>> For #0 0x000000080118d81c in ?? () I end up with:
>>
>> (gdb) x/32i 0x81019b6c0+0xad0-0x880-0x110
>> 0x81019b800 <getopt_internal+320>: bne cr7,0x81019b868 <getopt_internal+424>
>> 0x81019b804 <getopt_internal+324>: lwa r5,0(r29)
>> 0x81019b808 <getopt_internal+328>: stw r17,0(r18)
>> 0x81019b80c <getopt_internal+332>: cmpw cr7,r5,r19
>> 0x81019b810 <getopt_internal+336>: bge cr7,0x81019ba60 <getopt_internal+928>
>> 0x81019b814 <getopt_internal+340>: rldicr r9,r5,3,60
>> 0x81019b818 <getopt_internal+344>: ldx r10,r20,r9
>> 0x81019b81c <getopt_internal+348>: lbz r9,0(r10)
>>
>> with the failure being that r10 is zero in that last
>> line above. Again the surrounding code matches.
>>
>> The source code line is reported to be:
>>
>>                 if (*(place = nargv[optind]) != '-' ||
>>
>> I got the line number information from breakpoints 3 and 4
>> below (from the gdb /bin/ls process):
>>
>> (gdb) info br
>> Num     Type           Disp Enb Address            What
>> 1       breakpoint     keep y   0x0000000010002360 in main at /usr/src/bin/ls/ls.c:231
>>         breakpoint already hit 1 time
>> 3       breakpoint     keep y   0x000000081019b81c in getopt_internal at /usr/src/lib/libc/stdlib/getopt_long.c:411
>> 4       breakpoint     keep y   0x000000081019b91c in getopt_internal at /usr/src/lib/libc/stdlib/getopt_long.c:379
>>
>> Line 379 has the getenv call, matching the machine code showing
>> the call.
>>
>> (I set the breakpoints just as a way of using "info br" to list
>> the information later.)
>>
>> Overall this seems to suggest that:
>>
>> bl <00001322.plt_call.getenv>
>>
>> led to something odd happening and got to the wrong
>> code.
>>
>> That is all the additional information that I have
>> at this point. I hope it is of some use.
>>
>
> I'll note that the normal case's execution does the
> getenv call but does not execute the lbz r9,0(r10)
> related code.
>
> I'll also note that, for the libc.so.7 code, the
> plain /bin/ls code addresses are bigger than the
> /libexec/ld-elf.so.1 /bin/ls addresses by:
>
> 0x81019b920 - 0x80118d920 = 0xF00E000
>
> I use this to go back and forth, checking for matching
> code as I go.
>
> Presenting the normal /bin/ls run in gdb first, up to
> b <__glink_PLTresolve>:
>
> I'd already shown:
>
> 0x81019b91c <getopt_internal+604>: bl 0x81018dfe0 <00001322.plt_call.getenv>
>
> Looking at that stub:
>
> 0x81018dfe0 <00001322.plt_call.getenv>:    std r2,40(r1)
> 0x81018dfe4 <00001322.plt_call.getenv+4>:  ld r12,480(r2)
> 0x81018dfe8 <00001322.plt_call.getenv+8>:  mtctr r12
> 0x81018dfec <00001322.plt_call.getenv+12>: ld r11,496(r2)
> 0x81018dff0 <00001322.plt_call.getenv+16>: ld r2,488(r2)
> 0x81018dff4 <00001322.plt_call.getenv+20>: cmpldi r2,0
> 0x81018dff8 <00001322.plt_call.getenv+24>: bnectr+
> 0x81018dffc <00001322.plt_call.getenv+28>: b 0x81030f3dc <getenv@plt>
>
> As for getenv@plt:
>
> 0x81030f3dc <getenv@plt>:   li r0,925
> 0x81030f3e0 <getenv@plt+4>: b 0x81030d6c8 <__glink_PLTresolve>
>
>
> Note that 0x81018dfe0 - 0xF00E000 = 0x80117ffe0 .
>
> Back in the /libexec/ld-elf.so.1 /bin/ls context:
>
> (gdb) bt
> #0 0x000000080118d81c in ?? ()
> #1 0x000000080118d920 in ?? ()   [Just after the bl <00001322.plt_call.getenv> .]
> #2 0x0000000010002558 in ?? ()
> #3 0x00000000100037b0 in ?? ()
> #4 0x0000000001018450 in ?? ()
> #5 0x0000000000000000 in ?? ()
>
> (gdb) x/i 0x000000080118d920-0x4
> 0x80118d91c: bl 0x80117ffe0
>
> So that matches what was calculated earlier.
>
> (gdb) x/32i 0x81018dfe0-0xf00e000
> 0x80117ffe0: std r2,40(r1)
> 0x80117ffe4: ld r12,480(r2)
> 0x80117ffe8: mtctr r12
> 0x80117ffec: ld r11,496(r2)
> 0x80117fff0: ld r2,488(r2)
> 0x80117fff4: cmpldi r2,0
> 0x80117fff8: bnectr+
> 0x80117fffc: b 0x8013013dc
>
> (gdb) x/2i 0x8013013dc
> 0x8013013dc: li r0,925
> 0x8013013e0: b 0x8012ff6c8
>
> 0x81030d6c8 - 0x8012ff6c8 = 0xF00E000
>
> Still matching up to this point.
>
> So the two contexts seem to match up to
> some point after: b <__glink_PLTresolve> .
>
> I've not looked beyond this.

[Based on normal-case text for better symbolic information . . .]

For the failing context and its use of the code below
(presentation edited):

<00001322.plt_call.getenv>:    std r2,40(r1)
<00001322.plt_call.getenv+4>:  ld r12,480(r2)
<00001322.plt_call.getenv+8>:  mtctr r12
<00001322.plt_call.getenv+12>: ld r11,496(r2)
<00001322.plt_call.getenv+16>: ld r2,488(r2)
<00001322.plt_call.getenv+20>: cmpldi r2,0
<00001322.plt_call.getenv+24>: bnectr+

I've not come up with a way to investigate the potential
indirect jump (bnectr+) and what sets up for it. (The branch
following the bnectr+ seems okay.)

Similarly for the bctr in (edited):

(gdb) disass __glink_PLTresolve
Dump of assembler code for function __glink_PLTresolve:
<+0>:  mflr r12
<+4>:  bcl 20,4*cr7+so, <__glink_PLTresolve+8>
<+8>:  mflr r11
<+12>: ld r2,-16(r11)
<+16>: mtlr r12
<+20>: add r11,r2,r11
<+24>: ld r12,0(r11)
<+28>: ld r2,8(r11)
<+32>: mtctr r12
<+36>: ld r11,16(r11)
<+40>: bctr
End of assembler dump.

Registers such as ctr, r12, and r11 seem to have been replaced
by the time of the crash. (r12 seems to point to strncmp, ctr
has the value 0xf, and r11 is 0x0.)

At this point, it does not look like I'll be much help
for analyzing this failure on the powerpc families.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?554EC215-3BC7-409A-A9C2-FFB15C039266>