Date:      Tue, 15 Aug 2006 16:04:00 +0800
From:      "Intron" <mag@intron.ac>
To:        Brooks Davis <brooks@one-eyed-alien.net>
Cc:        freebsd-hackers@freebsd.org
Subject:   Re: The optimization of malloc(3): FreeBSD vs GNU libc
Message-ID:  <courier.44E17FF0.00006B85@intron.ac>
In-Reply-To: <20060814231504.GB69362@lor.one-eyed-alien.net>
References:  <courier.44E102F7.00004C34@intron.ac> <20060814231504.GB69362@lor.one-eyed-alien.net>

Brooks Davis wrote:

> On Tue, Aug 15, 2006 at 07:10:47AM +0800, Intron wrote:
>> One day, a friend told me that his program was 3 times slower under
>> FreeBSD 6.1 than under GNU/Linux (from Redhat 7.2 to Fedora Core 5).
>> I was astonished by the real repeatable performance difference on
>> AMD Athlon XP 2500+ (1.8GHz, 512KB L2 Cache).
>> 
>> After some hacking, I found that the problem lies in the malloc(3) of
>> FreeBSD libc.
>> 
>> Download the testing program: http://ftp.intron.ac/tmp/fdtd.tar.bz2
>> 
>> You may try to compile the program WITHOUT the macro "MY_MALLOC"
>> defined (in Makefile) to use malloc(3) provided by FreeBSD 6.1.
>> Then, time the running of the binary (on Athlon XP 2500+):
>> 
>> #/usr/bin/time ./fdtd.FreeBSD 500 500 1000
>> ...
>>        165.24 real       164.19 user         0.02 sys
>> 
>> Please try to recompile the program (Remember to "make clean")
>> WITH the macro "MY_MALLOC" defined (in Makefile) to use my own
>> simple implementation of malloc(3) (i.e. my_malloc() in cal.c).
>> And time the running again:
>> 
>> #/usr/bin/time ./fdtd.FreeBSD 500 500 1000
>> ...
>>        50.41 real        49.95 user         0.04 sys
>> 
>> You may repeat this testing again and again.
>> 
>> I guess this kind of performance difference comes from:
>> 
>> 1. His program uses malloc(3) to obtain a great many small memory blocks.
>> 
>> 2. In this case, FreeBSD malloc(3) obtains small memory blocks from
>>    the kernel and passes them to the application.
>> 
>>    But malloc(3) of GNU libc obtains large memory blocks from the kernel,
>>    splits them, and hands them out to the application as small blocks.
>> 
>>    You may verify my judgement with truss(1).
>> 
>> 3. The approach of FreeBSD malloc(3) scatters the VM page mappings, which
>>    reduces the efficiency of the CPU L2 cache. In contrast, my my_malloc()
>>    partially simulates the behavior of GNU libc malloc(3) and avoids
>>    this scattering.
>> 
>> Callgrind is broken under FreeBSD; otherwise I would verify my guess with it.
>> 
>> I have also run the program on an Intel Pentium 4 511 (2.8GHz, 1MB
>> L2 cache, running FreeBSD 6.1 i386 though this CPU supports EM64T),
>> first without MY_MALLOC and then with it:
>> 
>> >/usr/bin/time ./fdtd.FreeBSD 500 500 1000
>> ...
>>       185.30 real       184.28 user         0.02 sys
>> 
>> >/usr/bin/time ./fdtd.FreeBSD 500 500 1000
>> ...
>>        36.31 real        35.94 user         0.03 sys
>> 
>> NOTE: you probably cannot see the performance difference on a CPU with
>>    a small L2 cache, such as an Intel Celeron 1.7GHz with 128KB L2 cache.
> 
> In CURRENT we've replaced phkmalloc with jemalloc.  It would be useful
> to see how this benchmark performs with that.  I believe it does similar
> things.
> 
> -- Brooks

You're right.

Now, with truss(1), I can see that malloc(3) on 7.0-CURRENT (as of 4 days
ago) calls brk(2) to obtain 2MB at a time. I will continue my testing.
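
To illustrate the "obtain a large block, then split it into small ones"
strategy discussed above, here is a minimal sketch in C. It is NOT
my_malloc() from cal.c; the names bump_alloc, CHUNK_SIZE, ALIGNMENT and
chunks_obtained are hypothetical, the 2MB chunk size is simply borrowed
from the truss(1) observation, and the chunks are taken from malloc(3)
rather than directly from the kernel. Blocks are never freed individually,
which is only acceptable for a program that keeps its data until exit.

```c
#include <stddef.h>
#include <stdlib.h>

/* Assumed values for illustration only. */
#define CHUNK_SIZE ((size_t)2 * 1024 * 1024)  /* one large 2MB request   */
#define ALIGNMENT  ((size_t)16)               /* block sizes rounded to 16 */

static unsigned char *chunk_cur = NULL;   /* next free byte in current chunk */
static size_t chunk_left = 0;             /* bytes remaining in that chunk   */
static unsigned long chunks_obtained = 0; /* how many large requests so far  */

/*
 * Hand out a small block carved from the current chunk. A new chunk is
 * requested only when the current one cannot satisfy the request, so
 * consecutive small allocations end up adjacent in memory.
 * Returns NULL for a zero size, an oversized request, or on failure.
 */
static void *bump_alloc(size_t size)
{
    void *p;

    if (size == 0 || size > CHUNK_SIZE - ALIGNMENT)
        return NULL;

    /* Round the size up to a multiple of ALIGNMENT. */
    size = (size + ALIGNMENT - 1) & ~(ALIGNMENT - 1);

    if (size > chunk_left) {
        /* The tail of the old chunk is abandoned; this sketch never frees. */
        chunk_cur = malloc(CHUNK_SIZE);
        if (chunk_cur == NULL) {
            chunk_left = 0;
            return NULL;
        }
        chunk_left = CHUNK_SIZE;
        chunks_obtained++;
    }

    p = chunk_cur;
    chunk_cur += size;
    chunk_left -= size;
    return p;
}
```

With these assumed constants, 100000 requests of 32 bytes each are served
from just two large chunks, since one chunk holds 65536 such blocks.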

------------------------------------------------------------------------
                                                From Beijing, China