Date:      Fri, 27 Sep 1996 14:21:21 +0800
From:      Peter Wemm <peter@spinner.DIALix.COM>
To:        Bruce Evans <bde@zeta.org.au>
Cc:        freebsd-current@FreeBSD.org
Subject:   Re: BLOAT in minimal programs 
Message-ID:  <199609270621.OAA06513@spinner.DIALix.COM>
In-Reply-To: Your message of "Fri, 27 Sep 1996 12:01:44 +1000." <199609270201.MAA05537@godzilla.zeta.org.au>

Bruce Evans wrote:
> >peter@spinner[11:19pm]/tmp-973> cc -c foo.c
> >peter@spinner[11:19pm]/tmp-974> cc -v -static -o foo.exe foo.o
> >gcc version 2.7.2.1
> > /usr/bin/ld -e start -dc -dp -Bstatic -o foo.exe /usr/lib/scrt0.o foo.o /usr/lib/libgcc.a -lc /usr/lib/libgcc.a
> >
> >[add the -M flag to ld to see the map output ]
> >
> >peter@spinner[11:20pm]/tmp-975> /usr/bin/ld -M -e start -dc -dp -Bstatic -o foo.exe /usr/lib/scrt0.o foo.o /usr/lib/libgcc.a -lc /usr/lib/libgcc.a | head
> >/usr/lib/libgcc.a(__main.o) needed due to ___main
> 
> It's easier to add the -M to the cc command:
> 
> 	cc -static -o foo foo.c -Wl,-M
> 	                        ^^^^ pass following options to linker
> 				    ^^ desired linker flag

Ahh, doesn't surprise me.  I was more interested in showing the dependency 
list sequence than the most efficient way of getting it. :-]

> >So, in a nutshell if you want a small static program that doesn't use
> >C++ anywhere:
> >
> >peter@spinner[11:27pm]/tmp-985> cat foo.c
> >__main() { /* dummy stub */ }
> >
> >main()
> >{
> >}
> 
> Except that exit() should be attached to atexit() there if stdio
> is linked.  Currently we use the special method of calling stdio's
> _cleanup() from exit() through the function pointer __cleanup.

This seems to date back to the dark ages as far as I can see.  There's a 
reference to something like _cleanup() or (*_cleanup)() in the fake exit() 
routine in the older libgcc2.c.
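
As a rough illustration of the function-pointer arrangement Bruce describes: exit() only ever looks at a pointer, which stays NULL until stdio registers its cleanup routine, so presumably exit() itself never has to reference stdio directly.  This is a sketch, not the actual libc source, and every name in it is a made-up stand-in:

```c
#include <stddef.h>

/* All names here are hypothetical stand-ins, not the real libc symbols. */
void (*cleanup_hook)(void) = NULL;	/* plays the role of __cleanup */
int flush_count = 0;			/* counts calls, for demonstration only */

/* Stand-in for stdio's _cleanup(): would flush and close open streams. */
void
fake_stdio_cleanup(void)
{
	flush_count++;
}

/* Stand-in for stdio initialisation: registers the cleanup routine. */
void
fake_stdio_init(void)
{
	cleanup_hook = fake_stdio_cleanup;
}

/* Stand-in for the part of exit() that runs before _exit(). */
void
fake_exit_prologue(void)
{
	if (cleanup_hook != NULL)
		(*cleanup_hook)();
}
```

If fake_stdio_init() is never called, fake_exit_prologue() does nothing, which mirrors a program that never touches stdio.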

> >peter@spinner[11:27pm]/tmp-988> size foo.exe
> >text    data    bss     dec     hex
> >4096    4096    0       8192    2000
> >
> >Now, you can't get smaller than that without ELF.
> 
> I got considerably smaller sizes using a.out under Minix, from a
> space-optimized stdio (1200 bytes for putc(), 4528 bytes for printf())
> and a __LDPGSZ of 16.  How does ELF handle paging if its sizes aren't
> multiples of PAGE_SIZE?

We can produce a 272 byte nmagic or omagic file, but we cannot execute 
it. This is probably a weakness in our a.out image activator, but I 
doubt many people care about read/write text, non-demand-paged executables.
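
For what it's worth, the 4096/4096 figures from size and Bruce's __LDPGSZ of 16 both look like the same arithmetic: each segment is padded up to a multiple of the page size.  A minimal sketch of that rounding, assuming a power-of-two page size; round_to_page is a hypothetical helper, not something from the tree:

```c
#include <stddef.h>

/*
 * Round size up to the next multiple of pgsz.
 * Assumes pgsz is a power of two (true of 4096 and of 16).
 */
size_t
round_to_page(size_t size, size_t pgsz)
{
	return ((size + pgsz - 1) & ~(pgsz - 1));
}
```

With an illustrative 300-byte segment, round_to_page(300, 4096) gives 4096, while round_to_page(300, 16) gives 304.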

ELF handles it by double-mapping the pages.  The first page is mapped 
PROT_READ|PROT_EXECUTE and MAP_SHARED, while the same page is mapped again 
PROT_ALL + MAP_PRIVATE.  So, some of the text appears before the 
"official" start of the data segment and so on.
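
The double mapping can be tried from userland with mmap(2).  The sketch below is only an approximation of what an ELF loader does: it uses the mmap(2) spelling PROT_EXEC, gives the private view PROT_READ|PROT_WRITE rather than every protection bit (some systems refuse writable and executable mappings), and the function names and scratch-file path are invented for the example:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map the first len bytes of fd twice: a shared read/execute view
 * ("text") and a private read/write view ("data").  Returns 0 on
 * success, -1 if either mmap() fails.
 */
int
double_map(int fd, size_t len, char **text, char **data)
{
	void *t, *d;

	t = mmap(NULL, len, PROT_READ | PROT_EXEC, MAP_SHARED, fd, 0);
	if (t == MAP_FAILED)
		return (-1);
	d = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
	if (d == MAP_FAILED) {
		munmap(t, len);
		return (-1);
	}
	*text = t;
	*data = d;
	return (0);
}

/*
 * Returns 1 if a store through the private view leaves the shared
 * view untouched, 0 if not, -1 if the environment refused a step.
 */
int
demo_double_map(void)
{
	char path[] = "/tmp/dmapXXXXXX";	/* hypothetical scratch file */
	char *text, *data;
	long pgsz = sysconf(_SC_PAGESIZE);
	size_t len;
	int fd, ok;

	if (pgsz <= 0)
		return (-1);
	len = (size_t)pgsz;
	fd = mkstemp(path);
	if (fd == -1)
		return (-1);
	unlink(path);
	if (ftruncate(fd, (off_t)len) == -1 ||
	    pwrite(fd, "text", 4, 0) != 4 ||
	    double_map(fd, len, &text, &data) == -1) {
		close(fd);
		return (-1);
	}
	data[0] = 'T';		/* copy-on-write: only the private view changes */
	ok = (text[0] == 't' && data[0] == 'T');
	munmap(text, len);
	munmap(data, len);
	close(fd);
	return (ok);
}
```

Both views start out showing the same file page; after the store, only the private copy differs, which is why the data segment can share a file page with the tail of the text.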

However, the elf crt1.o that we currently have pulls in stdio via errx and 
strerror, so I can't quite test the same thing under elf for comparison 
until jdp's elf crt1.o gets the "brutal optimization" treatment.  It only 
has a single compile mode, which has dynamic code support.  The a.out case 
has a special "static only" mode, so it's not yet an apples-vs-apples 
comparison.

Doing it dynamic though:

peter@spinner[1:51pm]/tmp-230> elf-cc -s -o foo.exe foo.c
peter@spinner[1:51pm]/tmp-231> elf-size foo.exe
text    data    bss     dec     hex     filename
556     799     8       1363    553     foo.exe
peter@spinner[1:51pm]/tmp-232> l foo.exe 
3 -rwxr-xr-x  1 peter  bin  2680 Sep 27 13:51 foo.exe*

Versus a.out:

peter@spinner[1:52pm]/tmp-233> cc -s -o foo.exe foo.c
peter@spinner[1:52pm]/tmp-234> size foo.exe 
text    data    bss     dec     hex
4096    4096    0       8192    2000
peter@spinner[1:52pm]/tmp-235> l foo.exe 
 8 -rwxr-xr-x  1 peter  bin  8192 Sep 27 13:52 foo.exe*

Using objdump to look at the headers reveals the virtual address space 
internals:

peter@spinner[2:05pm]/tmp-259> elf-objdump --headers foo.exe

foo.exe:     file format a.out-i386-freebsd

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         00000fe0  00001020  00001020  00000020  2**3
                  CONTENTS, ALLOC, LOAD, CODE
  1 .data         00001000  00002000  00002000  00001000  2**3
                  CONTENTS, ALLOC, LOAD, DATA
  2 .bss          00000000  00003000  00003000  00000000  2**3
                  ALLOC

peter@spinner[2:05pm]/tmp-260> elf-objdump --headers efoo.exe

efoo.exe:     file format elf32-i386

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .interp       00000019  080480d4  080480d4  000000d4  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  1 .hash         00000048  080480f0  080480f0  000000f0  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  2 .dynsym       000000d0  08048138  08048138  00000138  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  3 .dynstr       00000071  08048208  08048208  00000208  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .rel.plt      00000010  0804827c  0804827c  0000027c  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  5 .init         00000006  0804828c  0804828c  0000028c  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  6 .plt          00000030  08048294  08048294  00000294  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  7 .text         000001f0  080482c4  080482c4  000002c4  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  8 .fini         00000006  080484b4  080484b4  000004b4  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  9 .rodata       000000d5  080484ba  080484ba  000004ba  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 10 .data         00000004  08049590  08049590  00000590  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 11 .ctors        00000008  08049594  08049594  00000594  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 12 .dtors        00000008  0804959c  0804959c  0000059c  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 13 .got          00000014  080495a4  080495a4  000005a4  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 14 .dynamic      00000070  080495b8  080495b8  000005b8  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 15 .bss          00000008  08049628  08049628  00000628  2**2
                  ALLOC
 16 .note         00000050  00000000  00000000  00000628  2**0
                  CONTENTS, READONLY
 17 .comment      00000048  00000000  00000000  00000678  2**0
                  CONTENTS, READONLY

As you can see, there's a *LOT* of extra stuff in the ELF headers.  A lot 
of people have looked at this and said "Aargh! It'll never be as fast as 
our a.out dynamic implementation!".  Well, relax, what you see there is 
the detailed information.  The kernel executable loader and dynamic linker 
have custom tables optimised specifically for them:

peter@spinner[2:07pm]/tmp-262> elf-objdump --private-headers efoo.exe 

efoo.exe:     file format elf32-i386

Program Header:
    PHDR off    0x00000034 vaddr 0x08048034 paddr 0x08048034 align 2**2
         filesz 0x000000a0 memsz 0x000000a0 flags r-x
  INTERP off    0x000000d4 vaddr 0x080480d4 paddr 0x080480d4 align 2**0
         filesz 0x00000019 memsz 0x00000019 flags r--
    LOAD off    0x00000000 vaddr 0x08048000 paddr 0x08048000 align 2**12
         filesz 0x0000058f memsz 0x0000058f flags r-x
    LOAD off    0x00000590 vaddr 0x08049590 paddr 0x08049590 align 2**12
         filesz 0x00000098 memsz 0x000000a0 flags rw-
 DYNAMIC off    0x000005b8 vaddr 0x080495b8 paddr 0x080495b8 align 2**2
         filesz 0x00000070 memsz 0x00000070 flags rw-

Dynamic Section:
  NEEDED      libc.so.1
  INIT        0x804828c
  FINI        0x80484b4
  HASH        0x80480f0
  STRTAB      0x8048208
  SYMTAB      0x8048138
  STRSZ       0x71
  SYMENT      0x10
  DEBUG       0x0
  PLTGOT      0x80495a4
  PLTRELSZ    0x10
  PLTREL      0x11
  JMPREL      0x804827c

The "program header" is for the executable loader.  PHDR is for the kernel, 
approximately the equivalent of the a.out entry address, and is not used once 
the executable is launched.  The "INTERP" section is so that the kernel 
loads the ld.so and the executable in one go, rather than the a.out case 
where the kernel loads the executable and the executable's crt0.o mmaps 
ld.so via a heap of syscalls.  The ELF format is more efficient here.  The 
two LOAD sections are the text and combined data+bss sections.  DYNAMIC is 
for the ld.so to find its header quickly.
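
To make the two LOAD lines concrete, here is a sketch of the arithmetic a loader could do with them.  It is not taken from our image activator; the struct and function names are hypothetical, and it assumes the 2**12 alignment shown in the dump:

```c
#include <stdint.h>

#define PAGE_MASK_4K	UINT32_C(0xfff)	/* assumes 2**12 alignment */

/* Hypothetical: fields copied from one LOAD line of the dump. */
struct load_seg {
	uint32_t off, vaddr, filesz, memsz;
};

/* Hypothetical: what a loader would need in order to map the segment. */
struct load_plan {
	uint32_t map_addr;	/* page-aligned address to map at */
	uint32_t map_off;	/* page-aligned file offset */
	uint32_t map_len;	/* bytes to map, in whole pages */
	uint32_t zero_len;	/* memsz - filesz, i.e. bss to zero-fill */
};

struct load_plan
plan_load(struct load_seg s)
{
	struct load_plan p;
	uint32_t end;

	/* Round the end of the file-backed part up to a page boundary. */
	end = (s.vaddr + s.filesz + PAGE_MASK_4K) & ~PAGE_MASK_4K;
	p.map_addr = s.vaddr & ~PAGE_MASK_4K;
	p.map_off = s.off & ~PAGE_MASK_4K;
	p.map_len = end - p.map_addr;
	p.zero_len = s.memsz - s.filesz;
	return (p);
}
```

Fed the second LOAD line (off 0x590, vaddr 0x08049590, filesz 0x98, memsz 0xa0), this gives a one-page mapping at 0x08049000 from file offset 0 with 8 bytes to zero-fill, which matches the .bss size in the section dump.  The first LOAD line also maps file offset 0, at 0x08048000, so the same file page ends up mapped twice.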

SVR4 goes a little further than we would.  They have ld.so built into 
libc.so.1, so the kernel loads the executable, the dynamic linker and 
libc.so all in a single go.  Unfortunately for us, this means we can't do 
versioning or support LD_PRELOAD very well.  Although, thinking about it, 
there would be nothing stopping us doing it and having the major number 
specified (ie: specify libc.so.3), and after starting, we simply compare a 
compiled-in minor version number of the libc.so.3 that we got and make 
sure it's new enough.  This would be effectively the same as what we do 
with a.out, where we load the "latest" minor number, and print a warning 
if it's not new enough.  I guess LD_PRELOAD wouldn't be too hard to 
support if ld.so builds its symbol search paths properly.
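
The minor-number check I have in mind would be about this simple.  This is only a sketch; check_libc_minor and its arguments are hypothetical, and where the two numbers would actually come from is left open:

```c
#include <stdio.h>

/*
 * Hypothetical startup check: wanted_minor is the minor number the
 * program was linked against, loaded_minor is the one compiled into
 * the libc that was actually loaded.  Returns 1 if the loaded library
 * is new enough; otherwise prints a warning and returns 0.
 */
int
check_libc_minor(int wanted_minor, int loaded_minor)
{
	if (loaded_minor >= wanted_minor)
		return (1);
	fprintf(stderr, "warning: libc minor version %d is older than %d\n",
	    loaded_minor, wanted_minor);
	return (0);
}
```

A program linked against minor 2 would pass with minor 2 or 3 loaded, and get the warning with minor 1.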

> Bruce

Cheers,
-Peter




