Date: Thu, 27 Nov 2003 11:48:47 -0800 (PST)
From: Matthew Dillon <dillon@apollo.backplane.com>
To: Peter Wemm <peter@wemm.org>
Cc: current@freebsd.org
Subject: Re: fork speed vs /bin/sh
Message-ID: <200311271948.hARJml0E096645@apollo.backplane.com>
References: <20031127190413.6E8152A8FC@canning.wemm.org>

:What this shows is that vfork() is 3 times faster than fork() on static
:binaries, and 9 times faster on dynamic binaries.  If people are
:worried about a 40% slowdown, then perhaps they'd like to investigate
:a speedup that works no matter whether its static or dynamic?  There is
:a reason that popen(3) uses vfork().  /bin/sh should too, regardless of
:whether its dynamic or static.  csh/tcsh already uses vfork() for the
:same reason.
:
:NetBSD have already taken advantage of this speedup and their /bin/sh uses
:vfork().  Some enterprising individual who cares about /bin/sh speed should
:check out that.  Start looking near #ifdef DO_SHAREDVFORK.

    That isn't really a fair comparison because your vfork is hitting a
    degenerate case and isn't actually doing anything significant.  You
    really need to exec() something.  I've included a program below that
    [v]fork/exec's "./sh -c exit 0" 5000 times.

    Dell2550, 2xCPU (MP build), DFly

    0.000u 4.095s 0:02.53 161.6%   154+107k 0+0io 0pf+0w   VFORK/EXEC STATIC SH
    0.000u 6.681s 0:04.04 165.3%   94+97k 0+0io 0pf+0w     FORK/EXEC STATIC SH
    0.500u 16.844s 0:16.34 106.1%  53+84k 0+0io 0pf+0w     VFORK/EXEC DYNAMIC SH
    0.093u 18.303s 0:23.86 77.0%   42+79k 0+0io 0pf+0w     FORK/EXEC DYNAMIC SH

    Athlon64, 2xCPU (UP), DFly

    0.078u 0.687s 0:00.74 101.3%   399+226k 0+0io 0pf+0w   VFORK/EXEC STATIC SH
    0.117u 0.968s 0:01.07 100.0%   273+208k 0+0io 0pf+0w   FORK/EXEC STATIC SH
    2.218u 2.484s 0:04.71 99.5%    121+180k 0+0io 1pf+0w   VFORK/EXEC DYNAMIC SH
    2.281u 2.773s 0:04.98 101.4%   113+179k 0+0io 0pf+0w   FORK/EXEC DYNAMIC SH
    1.304u 2.289s 0:03.60 99.4%    121+180k 0+0io 0pf+0w   VFORK/EXEC DYNAMIC SH WITH PREBINDING.
    1.296u 2.648s 0:03.90 100.7%   112+180k 0+0io 1pf+0w   FORK/EXEC DYNAMIC SH WITH PREBINDING.

    These results were rather unexpected, actually.  I'm not sure why the
    numbers on the DELL box are so bad with a dynamic 'sh' but I suspect
    that the dynamic linking is blowing out the L1 cache.

    In any case, taking the Athlon64 system, the difference between static
    and dynamic is around 4 seconds while the difference between vfork
    and fork is only around 0.25 seconds, so while moving to vfork() helps,
    it doesn't help all that much.

    Unless you happen to be hitting a boundary condition on the L1 cache,
    that is.  If that is the case on the Dell box, as I presume (it only
    has a 16K L1 cache whereas the AMD64 has a 64K L1 cache), then the
    difference is around 14 seconds between vfork static and vfork dynamic
    versus an additional 8 seconds going from vfork to fork.  Vfork would
    probably be a significant improvement on the DELL box.

    Prebinding yields around a 20% improvement for the dynamic 'sh' on the
    Athlon64, but on the Dell2550 prebinding actually made things go slower
    (not shown above), from 23.8 seconds to 26 seconds.  I think there is
    an edge case due to prebinding having a greater L1 cache impact.  For
    larger, more complex programs prebinding shows definite, if small,
    improvements.

                                        -Matt

/*
 * CD into the directory containing the ./sh executable before running
 */
#include <sys/types.h>
#include <sys/wait.h>           /* waitpid() */
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
    int i;
    pid_t pid;

    for (i = 0; i < 5000; ++i) {
        if ((pid = vfork()) == 0) {     /* <<<<< CHANGE THIS FORK/VFORK */
            /* child: exec the shell; "0" becomes $0 of the -c command */
            execl("./sh", "./sh", "-c", "exit", "0", (char *)NULL);
            /* only reached if the exec failed */
            write(2, "problem\n", 8);
            _exit(1);
        }
        if (pid > 0)
            waitpid(pid, NULL, 0);      /* reap the child before the next pass */
    }
    return(0);
}
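
The program above has to be edited and recompiled to switch between fork() and vfork(), and it ignores the child's exit status. A possible variant, offered only as a sketch, picks the call at run time and counts clean exits. The names spawn_loop and timed_spawn_loop, the shell path parameter, and the use of clock_gettime() for wall-clock timing are assumptions made for illustration; none of them appear in the original message, whose numbers come from the shell's time output.

```c
#define _DEFAULT_SOURCE         /* assumption: lets glibc declare vfork(); harmless elsewhere */
#include <sys/types.h>
#include <sys/wait.h>
#include <assert.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original post): spawn `count` children
 * that each exec "<shell> -c exit 0", using vfork() when use_vfork is
 * nonzero and fork() otherwise.  Returns the number of children that
 * exited with status 0, or -1 if fork/vfork/waitpid failed.
 */
static int
spawn_loop(int use_vfork, const char *shell, int count)
{
    int i, ok = 0, status;
    pid_t pid;

    assert(shell != NULL && count >= 0);
    for (i = 0; i < count; ++i) {
        pid = use_vfork ? vfork() : fork();
        if (pid < 0)
            return (-1);
        if (pid == 0) {
            /* Child: exec or _exit only, which is all vfork() permits. */
            execl(shell, shell, "-c", "exit", "0", (char *)NULL);
            _exit(127);         /* exec failed */
        }
        if (waitpid(pid, &status, 0) != pid)
            return (-1);
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ++ok;
    }
    return (ok);
}

/*
 * Hypothetical wrapper: runs spawn_loop() and returns elapsed wall-clock
 * seconds; the count of clean exits is stored in *ok.
 */
static double
timed_spawn_loop(int use_vfork, const char *shell, int count, int *ok)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    *ok = spawn_loop(use_vfork, shell, count);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return ((double)(t1.tv_sec - t0.tv_sec) +
        (double)(t1.tv_nsec - t0.tv_nsec) / 1e9);
}
```

A caller could run timed_spawn_loop(1, "./sh", 5000, &ok) and timed_spawn_loop(0, "./sh", 5000, &ok) back to back, once against a static sh and once against a dynamic one, to collect the same four combinations as the tables above. This measures elapsed time only, so it would not reproduce the user/system split that the shell's time output reports.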