From owner-freebsd-current Thu Jul 31 21:10:17 1997
Return-Path:
Received: (from root@localhost)
          by hub.freebsd.org (8.8.5/8.8.5) id VAA24223
          for current-outgoing; Thu, 31 Jul 1997 21:10:17 -0700 (PDT)
Received: from austin.polstra.com (austin.polstra.com [206.213.73.10])
          by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id VAA24218
          for ; Thu, 31 Jul 1997 21:10:13 -0700 (PDT)
Received: from austin.polstra.com (jdp@localhost)
          by austin.polstra.com (8.8.6/8.8.5) with ESMTP id VAA03660
          for ; Thu, 31 Jul 1997 21:10:12 -0700 (PDT)
Message-Id: <199708010410.VAA03660@austin.polstra.com>
To: current@freebsd.org
Subject: Re: core group topics
In-Reply-To: <199707311142.EAA28471@implode.root.com>
References: <199707311142.EAA28471@implode.root.com>
Organization: Polstra & Co., Seattle, WA
Date: Thu, 31 Jul 1997 21:10:12 -0700
From: John Polstra
Sender: owner-freebsd-current@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

In article <199707311142.EAA28471@implode.root.com>,
David Greenman wrote:

> The only issue I have against ELF is that I'm concerned that the overhead
> for processing the much more sophisticated header at exec time might have a
> serious impact on exec performance (something I'm particularly sensitive to
> since I wrote the a.out exec code for FreeBSD).

This is a common concern, but I'm convinced it's not a problem. I
think ELF can be loaded as fast as a.out, within the limits of
measurability. Well, almost as fast, anyway.

People look at the ELF spec and they think, "Gads, all those different
kinds of sections! It must be hard to load one of these things."
But to load an ELF program or shared library, you don't even look at
the section table. Instead, you use the Program Header, which is
specially constructed just for this purpose. The Program Header
describes text, data, and bss segments, just like a.out does. It's
not much different from a.out in complexity.
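As a rough sketch of that correspondence (not taken from the FreeBSD
sources), the function below folds a program header table into
a.out-style text, data, and bss sizes. The names Phdr32, SegSizes, and
summarize_segments are made up for illustration; Phdr32 mirrors the
field order of Elf32_Phdr so the block compiles without a system ELF
header. Treating writable PT_LOAD segments as data and all others as
text is a simplifying assumption that matches the usual two-segment
layout but is not guaranteed by the ELF spec.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the standard ELF32 constants; real code would take
 * these from the system's ELF header instead. */
#define PT_LOAD 1u
#define PF_X    0x1u
#define PF_W    0x2u
#define PF_R    0x4u

/* Hypothetical type with the same field order as Elf32_Phdr. */
typedef struct {
    uint32_t p_type;
    uint32_t p_offset;
    uint32_t p_vaddr;
    uint32_t p_paddr;
    uint32_t p_filesz;
    uint32_t p_memsz;
    uint32_t p_flags;
    uint32_t p_align;
} Phdr32;

/* a.out-style summary of what a loader has to map. */
typedef struct {
    uint32_t text;  /* bytes read from file into non-writable segments */
    uint32_t data;  /* bytes read from file into writable segments */
    uint32_t bss;   /* bytes to zero-fill: p_memsz - p_filesz */
} SegSizes;

SegSizes summarize_segments(const Phdr32 *ph, size_t n)
{
    SegSizes s = {0, 0, 0};

    assert(ph != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (ph[i].p_type != PT_LOAD)
            continue;               /* non-loadable entries are skipped */
        if (ph[i].p_flags & PF_W) { /* assumption: writable means data */
            s.data += ph[i].p_filesz;
            if (ph[i].p_memsz > ph[i].p_filesz)
                s.bss += ph[i].p_memsz - ph[i].p_filesz;
        } else {                    /* assumption: read-only means text */
            s.text += ph[i].p_filesz;
        }
    }
    return s;
}
```

The section table never appears here: only the program header entries
are consulted, and only the PT_LOAD ones contribute anything.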

To illustrate, the loop that does the loading in the ELF bootloader
looks like this:

    printf("segments:");
    for (i = 0; i < head.e_phnum; i++) {
        ph = (Elf32_Phdr*)(phbuf + head.e_phentsize * i);
        if (ph->p_type == PT_LOAD) {
            ph->p_vaddr &= ADDR_MASK;
            printf(" 0x%x-0x%x", ph->p_vaddr, ph->p_vaddr + ph->p_memsz);
            if (ph->p_filesz > 0) {
                poff = ph->p_offset;
                xread((void *)ph->p_vaddr, ph->p_filesz);
            }
            if (ph->p_filesz < ph->p_memsz) {
                pbzero((void *)(ph->p_vaddr + ph->p_filesz),
                    ph->p_memsz - ph->p_filesz);
            }
        }
    }
    printf(" \n");

The loop typically iterates over 6-8 items. Of those, only 2 satisfy
the "(ph->p_type == PT_LOAD)" condition -- the rest do nothing. I
think it's virtually as fast as loading an a.out file, and nobody has
tried to optimize it yet.

The kernel exec code in "imgact_elf.c" looks more complicated. But
that's only because it contains a whole bunch of debugging cruft.
There's a lot of room for optimization in it.

John
--
   John Polstra                                       jdp@polstra.com
   John D. Polstra & Co., Inc.                Seattle, Washington USA
   "Self-knowledge is always bad news."                 -- John Barth