From owner-freebsd-current Thu Sep 26 23:22:08 1996
Return-Path: owner-current
Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id XAA00802 for current-outgoing; Thu, 26 Sep 1996 23:22:08 -0700 (PDT)
Received: from spinner.DIALix.COM (root@spinner.DIALix.COM [192.203.228.67]) by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id XAA00670 for ; Thu, 26 Sep 1996 23:21:58 -0700 (PDT)
Received: from spinner.DIALix.COM (peter@localhost.DIALix.oz.au [127.0.0.1]) by spinner.DIALix.COM (8.7.6/8.7.3) with ESMTP id OAA06513; Fri, 27 Sep 1996 14:21:21 +0800 (WST)
Message-Id: <199609270621.OAA06513@spinner.DIALix.COM>
X-Mailer: exmh version 1.6.7 5/3/96
To: Bruce Evans
cc: freebsd-current@FreeBSD.org
Subject: Re: BLOAT in minimal programs
In-reply-to: Your message of "Fri, 27 Sep 1996 12:01:44 +1000." <199609270201.MAA05537@godzilla.zeta.org.au>
Date: Fri, 27 Sep 1996 14:21:21 +0800
From: Peter Wemm
Sender: owner-current@FreeBSD.org
X-Loop: FreeBSD.org
Precedence: bulk

Bruce Evans wrote:
> >peter@spinner[11:19pm]/tmp-973> cc -c foo.c
> >peter@spinner[11:19pm]/tmp-974> cc -v -static -o foo.exe foo.o
> >gcc version 2.7.2.1
> > /usr/bin/ld -e start -dc -dp -Bstatic -o foo.exe /usr/lib/scrt0.o foo.o /usr/lib/libgcc.a -lc /usr/lib/libgcc.a
> >
> >[add the -M flag to ld to see the map output ]
> >
> >peter@spinner[11:20pm]/tmp-975> /usr/bin/ld -M -e start -dc -dp -Bstatic -o foo.exe /usr/lib/scrt0.o foo.o /usr/lib/libgcc.a -lc /usr/lib/libgcc.a | head
> >/usr/lib/libgcc.a(__main.o) needed due to ___main
>
> It's easier to add the -M to the cc command:
>
> cc -static -o foo foo.c -Wl,-M
>                         ^^^^ pass following options to linker
>                             ^^ desired linker flag

Ahh, doesn't surprise me.  I was more interested in showing the dependency
list sequence than the most efficient way of getting it.
:-]

> >So, in a nutshell if you want a small static program that doesn't use
> >C++ anywhere:
> >
> >peter@spinner[11:27pm]/tmp-985> cat foo.c
> >__main() { /* dummy stub */ }
> >
> >main()
> >{
> >}
>
> Except that exit() should be attached to atexit() there if stdio
> is linked.  Currently we use the special method of calling stdio's
> _cleanup() from exit() through the function pointer __cleanup.

This seems to date back to the dark ages as far as I can see.  There's a
reference to something like _cleanup() or (*_cleanup)() in the fake exit()
routine in the older libgcc2.c.

> >peter@spinner[11:27pm]/tmp-988> size foo.exe
> >text    data    bss     dec     hex
> >4096    4096    0       8192    2000
> >
> >Now, you can't get smaller than that without ELF.
>
> I got considerably smaller sizes using a.out under Minix, from a
> space-optimized stdio (1200 bytes for putc(), 4528 bytes for printf())
> and a __LDPGSZ of 16.  How does ELF handle paging if its sizes aren't
> multiples of PAGE_SIZE?

We can produce 272-byte nmagic or omagic files, but we cannot execute
them.  This is probably a weakness in our a.out image activator, but I
doubt many people care about read/write text, non-demand-paged
executables.

ELF handles it by double-mapping the pages.  The first page is mapped
PROT_READ|PROT_EXEC and MAP_SHARED, while the same page is mapped again
with all protections (PROT_READ|PROT_WRITE|PROT_EXEC) and MAP_PRIVATE.
So, some of the text appears before the "official" start of the data
segment and so on.

However, the ELF crt1.o that we currently have pulls in stdio via errx
and strerror, so I can't quite test the same thing under ELF for
comparison until jdp's ELF crt1.o gets the "brutal optimization"
treatment.  It only has a single compile mode, which has dynamic code
support.  The a.out case has a special "static only" mode, so it's not
yet an apples-vs-apples comparison.
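As an illustration only (this is not the kernel's image activator), the
double mapping can be imitated from user space with two mmap() calls on
the same file page.  The sketch assumes a POSIX system.  The function name
double_map_demo and the scratch file under /tmp are made up for the
example, and the execute bit is left out of both views so that it also
runs where /tmp is mounted noexec.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map the first page of a scratch file twice: once read-only and shared
 * (standing in for the text view), once writable and private (standing
 * in for the data view).
 *
 * Returns 1 if a write through the private view left the shared view
 * untouched, 0 if the write leaked through, -1 on any setup error.
 */
int double_map_demo(void)
{
    char path[] = "/tmp/dblmapXXXXXX";   /* hypothetical scratch file */
    long pagesz = sysconf(_SC_PAGESIZE);
    int result = -1;
    int fd;

    if (pagesz <= 0)
        return -1;
    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                        /* file vanishes once fd is closed */

    if (ftruncate(fd, (off_t)pagesz) == 0 && pwrite(fd, "text", 4, 0) == 4) {
        char *ro = mmap(NULL, (size_t)pagesz, PROT_READ,
                        MAP_SHARED, fd, 0);
        char *rw = mmap(NULL, (size_t)pagesz, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE, fd, 0);

        if (ro != MAP_FAILED && rw != MAP_FAILED) {
            rw[0] = 'd';                 /* copy-on-write: private view only */
            result = (ro[0] == 't' && rw[0] == 'd');
        }
        if (ro != MAP_FAILED)
            munmap(ro, (size_t)pagesz);
        if (rw != MAP_FAILED)
            munmap(rw, (size_t)pagesz);
    }
    close(fd);
    return result;
}
```

Calling double_map_demo() from a small main() and checking for a return
value of 1 is enough to see that the shared view keeps the file's
contents while the private view takes the modification.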
Doing it dynamic though:

peter@spinner[1:51pm]/tmp-230> elf-cc -s -o foo.exe foo.c
peter@spinner[1:51pm]/tmp-231> elf-size foo.exe
text    data    bss     dec     hex     filename
556     799     8       1363    553     foo.exe
peter@spinner[1:51pm]/tmp-232> l foo.exe
   3 -rwxr-xr-x  1 peter  bin  2680 Sep 27 13:51 foo.exe*

Versus a.out:

peter@spinner[1:52pm]/tmp-233> cc -s -o foo.exe foo.c
peter@spinner[1:52pm]/tmp-234> size foo.exe
text    data    bss     dec     hex
4096    4096    0       8192    2000
peter@spinner[1:52pm]/tmp-235> l foo.exe
   8 -rwxr-xr-x  1 peter  bin  8192 Sep 27 13:52 foo.exe*

Using objdump to look at the headers reveals the virtual address space
internals:

peter@spinner[2:05pm]/tmp-259> elf-objdump --headers foo.exe

foo.exe:     file format a.out-i386-freebsd

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         00000fe0  00001020  00001020  00000020  2**3
                  CONTENTS, ALLOC, LOAD, CODE
  1 .data         00001000  00002000  00002000  00001000  2**3
                  CONTENTS, ALLOC, LOAD, DATA
  2 .bss          00000000  00003000  00003000  00000000  2**3
                  ALLOC

peter@spinner[2:05pm]/tmp-260> elf-objdump --headers efoo.exe

efoo.exe:     file format elf32-i386

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .interp       00000019  080480d4  080480d4  000000d4  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  1 .hash         00000048  080480f0  080480f0  000000f0  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  2 .dynsym       000000d0  08048138  08048138  00000138  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  3 .dynstr       00000071  08048208  08048208  00000208  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .rel.plt      00000010  0804827c  0804827c  0000027c  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  5 .init         00000006  0804828c  0804828c  0000028c  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  6 .plt          00000030  08048294  08048294  00000294  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  7 .text         000001f0  080482c4  080482c4  000002c4  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  8 .fini         00000006  080484b4  080484b4  000004b4  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  9 .rodata       000000d5  080484ba  080484ba  000004ba  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 10 .data         00000004  08049590  08049590  00000590  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 11 .ctors        00000008  08049594  08049594  00000594  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 12 .dtors        00000008  0804959c  0804959c  0000059c  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 13 .got          00000014  080495a4  080495a4  000005a4  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 14 .dynamic      00000070  080495b8  080495b8  000005b8  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 15 .bss          00000008  08049628  08049628  00000628  2**2
                  ALLOC
 16 .note         00000050  00000000  00000000  00000628  2**0
                  CONTENTS, READONLY
 17 .comment      00000048  00000000  00000000  00000678  2**0
                  CONTENTS, READONLY

As you can see, there's a *LOT* of extra stuff in the ELF headers.  A lot
of people have looked at this and said "Aargh!  It'll never be as fast as
our a.out dynamic implementation!".  Well, relax, what you see there is
the detailed information.  The kernel executable loader and dynamic
linker have custom tables optimised specifically for them:

peter@spinner[2:07pm]/tmp-262> elf-objdump --private-headers efoo.exe

efoo.exe:     file format elf32-i386

Program Header:
    PHDR off    0x00000034 vaddr 0x08048034 paddr 0x08048034 align 2**2
         filesz 0x000000a0 memsz 0x000000a0 flags r-x
  INTERP off    0x000000d4 vaddr 0x080480d4 paddr 0x080480d4 align 2**0
         filesz 0x00000019 memsz 0x00000019 flags r--
    LOAD off    0x00000000 vaddr 0x08048000 paddr 0x08048000 align 2**12
         filesz 0x0000058f memsz 0x0000058f flags r-x
    LOAD off    0x00000590 vaddr 0x08049590 paddr 0x08049590 align 2**12
         filesz 0x00000098 memsz 0x000000a0 flags rw-
 DYNAMIC off    0x000005b8 vaddr 0x080495b8 paddr 0x080495b8 align 2**2
         filesz 0x00000070 memsz 0x00000070 flags rw-

Dynamic Section:
  NEEDED      libc.so.1
  INIT        0x804828c
  FINI        0x80484b4
  HASH        0x80480f0
  STRTAB      0x8048208
  SYMTAB      0x8048138
  STRSZ       0x71
  SYMENT      0x10
  DEBUG       0x0
  PLTGOT      0x80495a4
  PLTRELSZ    0x10
  PLTREL      0x11
  JMPREL      0x804827c

The "program header" is for the executable loader.  PHDR is for the
kernel; it is approximately the equivalent of the a.out entry address
and is not used once the executable is launched.
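The two LOAD entries can be checked against the double mapping described
earlier with a little page arithmetic.  What follows is a rough sketch
only: the offsets and sizes are copied from the objdump output above, a
4096-byte page is assumed (the 2**12 alignment shown), and the struct and
helper names are made up for illustration.

```c
#include <stdint.h>

#define PAGE_SIZE_ASSUMED 0x1000u   /* 2**12, as in the "align" column */

/* One LOAD entry, reduced to the fields objdump printed above. */
struct load_seg {
    uint32_t off;      /* file offset */
    uint32_t vaddr;    /* virtual address */
    uint32_t filesz;   /* bytes present in the file */
    uint32_t memsz;    /* bytes occupied in memory */
};

/* Round down to the start of the containing page. */
uint32_t page_floor(uint32_t x)
{
    return x & ~(PAGE_SIZE_ASSUMED - 1);
}

/* Round up to the next page boundary. */
uint32_t page_ceil(uint32_t x)
{
    return page_floor(x + PAGE_SIZE_ASSUMED - 1);
}

/* Nonzero when both segments are backed by at least one common file page. */
int share_file_page(const struct load_seg *a, const struct load_seg *b)
{
    uint32_t a_start = page_floor(a->off);
    uint32_t a_end   = page_ceil(a->off + a->filesz);
    uint32_t b_start = page_floor(b->off);
    uint32_t b_end   = page_ceil(b->off + b->filesz);

    return a_start < b_end && b_start < a_end;
}

/* Values copied from the --private-headers output for efoo.exe. */
const struct load_seg text_seg = { 0x00000000, 0x08048000, 0x0000058f, 0x0000058f };
const struct load_seg data_seg = { 0x00000590, 0x08049590, 0x00000098, 0x000000a0 };
```

Both segments start inside file page 0 (offsets 0x0 and 0x590), so
share_file_page(&text_seg, &data_seg) is nonzero: that page is mapped at
0x08048000 for the text and again at page_floor(data_seg.vaddr), which is
0x08049000, for the data.  memsz minus filesz for the second entry gives
the 8 bytes of .bss.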
The "INTERP" section is so that the kernel loads the ld.so and the
executable in one go, rather than the a.out case where the kernel loads
the executable and the executable's crt0.o mmap's ld.so via a heap of
syscalls.  The ELF format is more efficient here.  The two LOAD sections
are the text and combined data+bss sections.  DYNAMIC is for the ld.so to
find its header quickly.

SVR4 goes a little further than we would.  They have ld.so built into
libc.so.1, so the kernel loads the executable, the dynamic linker and
libc.so all in a single go.  Unfortunately for us, this means we can't do
versioning or support LD_PRELOAD very well.  Although, thinking about it,
there would be nothing stopping us doing it and having the major number
specified (i.e. specify libc.so.3), and after starting, we simply compare
a compiled-in minor version number of the libc.so.3 that we got and make
sure it's new enough.  This would be effectively the same as what we do
with a.out, where we load the "latest" minor number, and print a warning
if it's not new enough.  I guess LD_PRELOAD wouldn't be too hard to
support if ld.so builds its symbol search paths properly.

> Bruce

Cheers,
-Peter