Date: Tue, 15 Aug 2006 07:10:47 +0800
From: "Intron" <mag@intron.ac>
To: freebsd-hackers@freebsd.org
Subject: The optimization of malloc(3): FreeBSD vs GNU libc
Message-ID: <courier.44E102F7.00004C34@intron.ac>
One day, a friend told me that his program was 3 times slower under FreeBSD 6.1 than under GNU/Linux (from Red Hat 7.2 to Fedora Core 5). I was astonished by the real, repeatable performance difference on an AMD Athlon XP 2500+ (1.8GHz, 512KB L2 cache). After some hacking, I found that the problem lies in malloc(3) of FreeBSD libc.

Download the test program:

    http://ftp.intron.ac/tmp/fdtd.tar.bz2

First, compile the program WITHOUT the macro "MY_MALLOC" defined (in the Makefile) so that it uses the malloc(3) provided by FreeBSD 6.1. Then time the binary (on the Athlon XP 2500+):

    #/usr/bin/time ./fdtd.FreeBSD 500 500 1000
    ...
    165.24 real 164.19 user 0.02 sys

Next, recompile the program (remember to "make clean") WITH the macro "MY_MALLOC" defined (in the Makefile) so that it uses my own simple implementation of malloc(3) (i.e. my_malloc() in cal.c), and time it again:

    #/usr/bin/time ./fdtd.FreeBSD 500 500 1000
    ...
    50.41 real 49.95 user 0.04 sys

You may repeat this test again and again.

I guess this performance difference comes from the following:

1. His program uses malloc(3) to obtain a large number of small memory blocks.

2. In this case, FreeBSD malloc(3) obtains small memory blocks from the kernel and passes them to the application, whereas malloc(3) of GNU libc obtains large memory blocks from the kernel, then splits them and hands them out to the application as small blocks. You may verify my judgement with truss(1).

3. The FreeBSD malloc(3) approach makes the VM page mapping too chaotic, which reduces the efficiency of the CPU L2 cache. In contrast, my my_malloc() partially simulates the behavior of GNU libc malloc(3) and avoids this excessive chaos.

Callgrind is broken under FreeBSD; otherwise I would verify my guess with it.

I have also tested the program on an Intel Pentium 4 511 (2.8GHz, 1MB L2 cache, running FreeBSD 6.1 i386, although this CPU supports EM64T):

    >/usr/bin/time ./fdtd.FreeBSD 500 500 1000
    ...
    185.30 real 184.28 user 0.02 sys

    >/usr/bin/time ./fdtd.FreeBSD 500 500 1000
    ...
    36.31 real 35.94 user 0.03 sys

NOTE: you probably cannot see the performance difference on a CPU with a small L2 cache, such as an Intel Celeron 1.7GHz with 128KB L2 cache.

------------------------------------------------------------------------
From Beijing, China
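For readers who do not want to unpack the tarball, here is a minimal sketch of the "obtain a large block, then split it into small blocks" idea described in point 2. It is NOT the my_malloc() from cal.c. The name chunk_alloc(), the 1 MB chunk size and the 16-byte alignment are all assumptions made for illustration. Blocks are only ever handed out, never returned individually, which is roughly what a program that allocates many small blocks up front needs.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sketch; not the my_malloc() shipped in cal.c. */
#define CHUNK_SIZE (1024 * 1024) /* assumed size of each large block */
#define ALIGNMENT  16            /* assumed alignment of every small block */

static char  *chunk_cur  = NULL; /* next free byte in the current chunk */
static size_t chunk_left = 0;    /* bytes still free in the current chunk */

/*
 * Carve a small block out of a large chunk obtained from malloc(3).
 * Small blocks are never freed one by one (simple bump allocation).
 */
void *chunk_alloc(size_t size)
{
    void *p;

    if (size == 0)
        size = 1;
    /* Refuse requests that would overflow when rounded up. */
    if (size > (size_t)-1 - (ALIGNMENT - 1))
        return NULL;
    /* Round the request up to a multiple of ALIGNMENT. */
    size = (size + ALIGNMENT - 1) & ~(size_t)(ALIGNMENT - 1);

    /* Requests larger than a chunk go straight to malloc(3). */
    if (size > CHUNK_SIZE)
        return malloc(size);

    if (size > chunk_left) {
        /* Current chunk is exhausted; its unused tail is simply wasted. */
        chunk_cur = malloc(CHUNK_SIZE);
        if (chunk_cur == NULL) {
            chunk_left = 0;
            return NULL;
        }
        chunk_left = CHUNK_SIZE;
    }

    /* Hand out the next slice, so consecutive blocks are adjacent. */
    p = chunk_cur;
    chunk_cur  += size;
    chunk_left -= size;
    return p;
}
```

With a function like this, consecutive small allocations sit next to each other in memory, which is the property I believe helps the L2 cache. A real replacement would also need a free() counterpart; this sketch leaves that out on purpose.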