Date: Sat, 3 Feb 2024 12:12:35 +0100 From: Mateusz Guzik <mjguzik@gmail.com> To: David Chisnall <theraven@freebsd.org> Cc: Brooks Davis <brooks@freebsd.org>, current@freebsd.org Subject: Re: libc/libsys split coming soon Message-ID: <CAGudoHEAxoqjmKZ91zwVa2BFjrkZriqZW_pLtNnQ5207yVr2Cg@mail.gmail.com> In-Reply-To: <7E8133B7-4BD5-42AB-8B16-A10F59295F28@FreeBSD.org> References: <Zb1tTz5LXuVQ5Caj@spindle.one-eyed-alien.net> <CAGudoHGoSW4Zwa9PT_O7AeQX%2B8Q9PKmoQBm9HrmdKD5PNfE0uw@mail.gmail.com> <7E8133B7-4BD5-42AB-8B16-A10F59295F28@FreeBSD.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On 2/3/24, David Chisnall <theraven@freebsd.org> wrote: > On 3 Feb 2024, at 09:15, Mateusz Guzik <mjguzik@gmail.com> wrote: >> >> Binary startup is very slow, for example execve of a hello world >> binary in a Linux-based chroot on FreeBSD is faster by a factor of 2 >> compared to a native one. As such perf-wise this looks like a step in >> the wrong direction. > > Have you profiled this? Is the Linux version using BIND_NOW (which comes > with a load of problems, but it often the default for Linux systems and > reduces the number of slow-path entries into rtld)? Do they trigger the > same number of CoW faults? Is there a path in rtld that=E2=80=99s slower= than the > equivalent ld-linux.so path? > I only profiled FreeBSD, it was 4 years ago. I have neither time nor interest in working on this. Relevant excerpts from profiling an fexecve loop: Sampling what syscalls was being executed when in kernel mode (or trap): syscalls: pread 108 fstat 162 issetugid 250 sigprocmask 303 read 310 mprotect 341 open 380 close 1547 mmap 2787 trap 5421 [snip] In userspace most of the time is spent here: ld-elf.so.1`memset 406 ld-elf.so.1`matched_symbol 431 ld-elf.so.1`strcmp 1078 ld-elf.so.1`reloc_non_plt 1102 ld-elf.so.1`symlook_obj 1102 ld-elf.so.1`find_symdef 1439 find_symdef iterates a linked list, which I presume induces strcmp calls due to unwanted entries. [snip] Full profile user: libc.so.7`__je_extent_heap_new 71 libc.so.7`__vdso_clock_gettime 73 libc.so.7`memset 75 ld-elf.so.1`_rtld 83 ld-elf.so.1`getenv 85 libc.so.7`__je_malloc_mutex_boot 132 ld-elf.so.1`reloc_plt 148 ld-elf.so.1`__crt_malloc 163 ld-elf.so.1`symlook_default 166 ld-elf.so.1`digest_dynamic1 184 libc.so.7`__je_malloc_mutex_init 205 ld-elf.so.1`symlook_global 281 ld-elf.so.1`memset 406 ld-elf.so.1`matched_symbol 431 ld-elf.so.1`strcmp 1078 ld-elf.so.1`reloc_non_plt 1102 ld-elf.so.1`symlook_obj 1102 ld-elf.so.1`find_symdef 1439 kernel: kernel`vm_reserv_alloc_page 89 kernel`amd64_syscall 95 kernel`0xffffffff80 102 kernel`vm_page_alloc_domain_after 114 kernel`vm_object_deallocate 117 kernel`vm_map_pmap_enter 122 kernel`pmap_enter_object 140 kernel`uma_zalloc_arg 148 kernel`vm_map_lookup_entry 148 kernel`pmap_try_insert_pv_entry 156 kernel`vm_fault_dirty 168 kernel`pagecopy 177 kernel`vm_fault 260 kernel`get_pv_entry 265 kernel`pagezero_erms 367 kernel`pmap_enter_quick_locked 380 kernel`pmap_enter 432 kernel`0xffffffff80 1126 kernel`0xffffffff80 2017 kernel`trap 2097 syscalls: pread 108 fstat 162 issetugid 250 sigprocmask 303 read 310 mprotect 341 open 380 close 1547 mmap 2787 trap 5421 Counting fexecve: dtrace -n 'fbt::sys_fexecve:entry { @[count] =3D stack(); } tick-30s { exit= (0); }' dtrace script, can be run as: dtrace -w -x aggsize=3D128M -s script.d assumes binary name is a.out syscall::fexecve:entry { self->inexec =3D 1; } syscall::fexecve:return { self->inexec =3D 0; } fbt::trap:entry { self->trap =3D 1; } fbt::trap:return { self->trap =3D 0; } profile:::profile-4999 /execname =3D=3D "a.out" && arg1 && self->inexec =3D=3D 0/ { @user[usym(arg1)] =3D count(); } profile:::profile-4999 /execname =3D=3D "a.out" && arg0 && self->inexec =3D=3D 0 && self->trap =3D= =3D 0/ { @kernel[sym(arg0)] =3D count(); @kernel_syscall[stringof(`syscallnames[curthread->td_sa.code])] =3D count(); } profile:::profile-4999 /execname =3D=3D "a.out" && arg0 && self->inexec =3D=3D 0 && self->trap =3D= =3D 1/ { @kernel[sym(arg0)] =3D count(); @kernel_syscall["trap"] =3D count(); } tick-5s { system("clear"); trunc(@user, 20); trunc(@kernel, 20); printa("%40A %@16d\n", @user); printa("%40a %@16d\n", @kernel); clear(@user); clear(@kernel); trunc(@kernel_syscall, 10); printf("\t\t\tsyscalls:\n"); printa("%40s %@16d\n", @kernel_syscall); clear(@kernel_syscall); } fexecve-loop.c: #include <sys/types.h> #include <sys/time.h> #include <err.h> #include <fcntl.h> #include <stdlib.h> #include <stdio.h> #include <unistd.h> int main(int argc, char **argv) { char buf[8]; char *cargv[3]; int fd; switch (argc) { case 1: fd =3D open(argv[0], O_EXEC); if (fd =3D=3D -1) err(1, "open"); cargv[0] =3D argv[0]; cargv[1] =3D buf; sprintf(cargv[1], "%d", fd); cargv[2] =3D NULL; break; case 2: fd =3D atoi(argv[1]); cargv[0] =3D argv[0]; cargv[1] =3D argv[1]; cargv[2] =3D NULL; break; default: printf("invalid argc %d\n", argc); exit(1); } fexecve(fd, cargv, NULL); err(1, "fexecve"); --=20 Mateusz Guzik <mjguzik gmail.com>
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAGudoHEAxoqjmKZ91zwVa2BFjrkZriqZW_pLtNnQ5207yVr2Cg>