Date: Tue, 25 Apr 1995 14:40:21 +1000
From: Bruce Evans <bde@zeta.org.au>
To: terry@cs.weber.edu, toor@jsdinc.root.com
Cc: geli.com!rcarter@implode.root.com, hackers@FreeBSD.org, jkh@violet.berkeley.edu
Subject: Re: benchmark hell..
Message-ID: <199504250440.OAA15562@godzilla.zeta.org.au>
>The correct way to run comparative benchmarks is to boot a DOS disk
>and fdisk/mbr the same machine and install on the same machine over
>and over with the different OS's. Not "identical hardware", the same
>machine.

What is ``DOS''? :-> The correct way is to boot a boot disk for the OS
being tested and erase all traces of the previous OS...

>The first is context switch. There are several significant differences
>in the way context switch takes place in BSD and Linux. The BSD model
>for the actual switch itself is very close to the UnixWare/Solaris model,
>but is missing delayed storage of the FPU registers on a switch. This is
>because BSD really doesn't have its act together regarding the FPU, and
>can't really be corrected until it does. On hardware that does proper

Actually, this is because FreeBSD doesn't waste a whole 108 bytes in the
proc table for the FPU state and no one wants to handle the complications
and probable slowness of updating paged-out FPU contexts after delayed
FPU context switches.

>exception handling (like the Pentiums tested), the FPU context can be
>thrown out to the process it belongs to after being delayed over several
>context switches previous on the basis of "uses FPU" being set in the
>process or not, and a soft interrupt of the FPU as if trapping to an
>emulator to tag the first reference in each process. Pretty much all
>the UNIX implementations and Linux do this, but BSD does not.

>It should be pretty obvious that for a benchmark, when there is a single
>program doing FPU crap, that the FPU delayed switchout means no switch
>actually occurs during the running of the benchmark. You can think of
>this as a benchmark cheat, since it is a large locality of reference
>hack, in effect.

It takes a fairly special benchmark to demonstrate the speed advantages
of delayed context switches. If there are multiple processes all using
the FPU then non-delayed switching is slightly faster.
If there are many more processes not using the FPU than there are
processes using it, then most context switches don't switch the FPU.
For real processes, those using the FPU a lot are likely to be CPU hogs
that get context switched very rarely, so the extra cost for immediately
switching the FPU context is insignificant.

FreeBSD's low level context switching is faster than Linux's because
hardware tasking is not used. Perhaps there is a lot more bloat in
other layers of the context switching. (Yes, there is. E.g., calling
microtime() for each context switch is very expensive (except on
Pentiums). microtime() has to be called so that FreeBSD can do better
timing statistics and scheduling than Linux.) However, for real
processes, context switching is relatively rare, so small differences
(less than a factor of 2-10) in the speed of context switching don't
matter.

>The system call overhead in BSD is typically larger. This is because
>of address range checking for copyin/copyout operations. Linux has

Actually, this is because FreeBSD has more layers.

>split this up into a separate check call and copy operations, which is
>more prone to programmer error leaving security holes than an integral
>copy/check, but they have an advantage when it comes to multiple use
>memory regions because of this (areas that are copied from several times
>or which are copied both in and out).

Actually, copyin/copyout are faster in FreeBSD, except on 386's. For
copyin, the check consists of setting up a fault handler, checking that
the addresses are covered by the user segment registers, and letting
the h/w check for page faults. For copyout, the page tables have to be
checked directly only for 386's. I think the Linux advantage for
syscalls is that copyin is usually not used at all. The args are in
registers.

>Linux, as part of this, has no copyinstr. Instead, they use a routine
>called "getpathname".
>This not only allows them to special case the
>code, it also allows them greater flexibility than traditional copyinstr
>implementations when it comes to internationalization. Since the only

copyinstr() is poorly implemented in FreeBSD. However, I've never seen
it showing up in profiling output.

>Finally, the pipe overhead is traceable to system call overhead, the pipe
>implementation itself, and the file system stack coalescing being a
>little less than desirable.

This seems likely. The BYTE benchmark article didn't mention the exact
syscalls used so it's not clear if the pipe benchmark is valid. lmbench
has a "syscall" overhead benchmark that actually tests i/o of one byte
to a file. Linux is much faster because there are fewer vfs layers, not
because syscalls are faster. Pipe benchmarks involving small amounts of
data (as would be best if pipes are being used for process
synchronization) are likely to have the same problem. Pipe benchmarks
involving a large amount of data should reduce to benchmarking bcopy()
(at least if the implementation is naive enough to always actually do
the copy).

Bruce
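
For illustration only, here is a rough sketch of the two small-transfer
measurements discussed above: one-byte writes, and a one-byte ping-pong
over a pair of pipes. It is not the code used by BYTE or lmbench. The
function names, the iteration counts, and the use of /dev/null as the
target file are all assumptions made for this example. Python's
interpreter overhead inflates the absolute numbers, so only comparisons
between systems running the same script mean much.

```python
import os
import time


def one_byte_write_cost(iterations=20000):
    """Average seconds per one-byte write(); /dev/null is an assumed target."""
    fd = os.open(os.devnull, os.O_WRONLY)
    try:
        start = time.perf_counter()
        for _ in range(iterations):
            os.write(fd, b"x")
        return (time.perf_counter() - start) / iterations
    finally:
        os.close(fd)


def pipe_round_trip_cost(iterations=5000):
    """Average seconds per one-byte ping-pong between parent and child."""
    to_child_r, to_child_w = os.pipe()
    to_parent_r, to_parent_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: echo each byte back until the parent closes its write end.
        try:
            os.close(to_child_w)
            os.close(to_parent_r)
            while True:
                data = os.read(to_child_r, 1)
                if not data:
                    break
                os.write(to_parent_w, data)
        finally:
            os._exit(0)
    # Parent: drop the ends that belong to the child.
    os.close(to_child_r)
    os.close(to_parent_w)
    start = time.perf_counter()
    for _ in range(iterations):
        os.write(to_child_w, b"x")   # wake the child
        os.read(to_parent_r, 1)      # block until it answers
    elapsed = time.perf_counter() - start
    os.close(to_child_w)             # EOF tells the child to exit
    os.close(to_parent_r)
    os.waitpid(pid, 0)
    return elapsed / iterations


if __name__ == "__main__":
    print("one-byte write: %.2f us" % (one_byte_write_cost() * 1e6))
    print("pipe round trip: %.2f us" % (pipe_round_trip_cost() * 1e6))
```

Each round trip costs two writes, two reads and a pair of context
switches, so it measures syscall and vfs layering plus the pipe code
rather than data copying. A large-transfer variant would mostly measure
the copy instead.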