From: Vladimir Kushnir
Date: Tue, 15 Aug 2006 02:38:06 +0300 (EEST)
To: Brooks Davis
Cc: freebsd-hackers@freebsd.org, Intron
Subject: Re: The optimization of malloc(3): FreeBSD vs GNU libc
Message-ID: <20060815023505.N1988@kushnir1.kiev.ua>
In-Reply-To: <20060814231504.GB69362@lor.one-eyed-alien.net>

Sorry for the intrusion.

On Mon, 14 Aug 2006, Brooks Davis wrote:

> On Tue, Aug 15, 2006 at 07:10:47AM +0800, Intron wrote:
>> One day, a friend told me that his program was 3 times slower under
>> FreeBSD 6.1 than under GNU/Linux (from Redhat 7.2 to Fedora Core 5).
>> I was astonished by the real repeatable performance difference on
>> AMD Athlon XP 2500+ (1.8GHz, 512KB L2 Cache).
>>
>> After hacking, I found that the problem is nested in malloc(3) of
>> FreeBSD libc.
>>
>> Download the testing program: http://ftp.intron.ac/tmp/fdtd.tar.bz2
>>
>> You may try to compile the program WITHOUT the macro "MY_MALLOC"
>> defined (in Makefile) to use malloc(3) provided by FreeBSD 6.1.
>> Then, time the running of the binary (on Athlon XP 2500+):
>>
>> #/usr/bin/time ./fdtd.FreeBSD 500 500 1000
>> ...
>>       165.24 real       164.19 user         0.02 sys
>>
>> Please try to recompile the program (Remember to "make clean")
>> WITH the macro "MY_MALLOC" defined (in Makefile) to use my own
>> simple implementation of malloc(3) (i.e. my_malloc() in cal.c).
>> And time the running again:
>>
>> #/usr/bin/time ./fdtd.FreeBSD 500 500 1000
>> ...
>>        50.41 real        49.95 user         0.04 sys
>>
>> You may repeat this testing again and again.
>>
>> I guess this kind of performance difference comes from:
>>
>> 1. His program uses malloc(3) to obtain so many small memory blocks.
>>
>> 2. In this case, FreeBSD malloc(3) obtains small memory blocks from
>>    kernel and pass them to application.
>>
>>    But malloc(3) of GNU libc obtains large memory blocks from kernel
>>    and splits & reallocates them in small blocks to application.
>>
>>    You may verify my judgement with truss(1).
>>
>> 3. The way of FreeBSD malloc(3) makes VM page mapping too chaotic, which
>>    reduces the efficiency of CPU L2 Cache. In contrast, my my_malloc()
>>    simulates the behavior of GNU libc malloc(3) partially and avoids
>>    the over-chaos.
>>
>> Callgrind is broken under FreeBSD, or I will verify my guess with it.
>>
>> I have also verified the program on Intel Pentium 4 511 (2.8GHz, 1MB
>> L2 cache, running FreeBSD 6.1 i386 though this CPU supports EM64T)
>>
>>> /usr/bin/time ./fdtd.FreeBSD 500 500 1000
>> ...
>>       185.30 real       184.28 user         0.02 sys
>>
>>> /usr/bin/time ./fdtd.FreeBSD 500 500 1000
>> ...
>>        36.31 real        35.94 user         0.03 sys
>>
>> NOTE: you probably cannot see the performance difference on CPU with
>> small L2 cache such as Intel Celeron 1.7GHz with 128 KB L2 Cache.
>
> In CURRENT we've replaced phkmalloc with jemalloc.  It would be useful
> to see how this benchmark performs with that.  I believe it does similar
> things.
>
> -- Brooks
>

On -CURRENT amd64 (Athlon64 3000+, 512k L2 cache):

With jemalloc (without MY_MALLOC):
~/fdtd> /usr/bin/time ./fdtd.FreeBSD 500 500 1000
...
      116.34 real       113.69 user         0.00 sys

With MY_MALLOC:
~/fdtd> /usr/bin/time ./fdtd.FreeBSD 500 500 1000
...
       45.30 real        44.29 user         0.00 sys

Regards,
Vladimir
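
P.S. For anyone reading without the tarball at hand: the quoted message describes my_malloc() only as taking large blocks and splitting them into small ones. A minimal sketch of that general idea is below. It is an assumption about the technique, not the code from cal.c; the names arena_alloc, struct arena, ARENA_CHUNK and ARENA_ALIGN are made up for illustration, and the sketch never frees individual blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names throughout; this is NOT my_malloc() from cal.c. */
#define ARENA_CHUNK ((size_t)1024 * 1024) /* size of each large block taken from malloc(3) */
#define ARENA_ALIGN ((size_t)16)          /* request sizes are rounded up to a multiple of this */

struct arena {
    char  *base; /* start of the current large block, NULL before first use */
    size_t used; /* bytes already handed out from the current block */
    size_t cap;  /* total size of the current block */
};

/*
 * Hand out `size` bytes by bumping an offset inside one large block, so
 * consecutive small allocations sit next to each other in memory.
 * A new large block is requested only when the current one cannot satisfy
 * the request; the old block is simply left behind (no per-block free).
 * Returns NULL if the underlying malloc(3) fails or the size overflows.
 */
static void *arena_alloc(struct arena *a, size_t size)
{
    assert((ARENA_ALIGN & (ARENA_ALIGN - 1)) == 0); /* must be a power of two */

    /* Round the request up to a multiple of ARENA_ALIGN. */
    size_t need = (size + ARENA_ALIGN - 1) & ~(ARENA_ALIGN - 1);
    if (need < size)
        return NULL; /* rounding overflowed */

    if (a->base == NULL || a->cap - a->used < need) {
        /* Oversized requests get a block of their own size. */
        size_t cap = need > ARENA_CHUNK ? need : ARENA_CHUNK;
        char *p = malloc(cap);
        if (p == NULL)
            return NULL;
        a->base = p;
        a->used = 0;
        a->cap = cap;
    }

    void *out = a->base + a->used;
    a->used += need;
    return out;
}
```

A caller would start from a zeroed struct arena (struct arena a = {0};) and call arena_alloc(&a, n) wherever the program would otherwise call malloc(n) for its many small blocks. Whether this layout explains the timings above is still the guess from the quoted message, not something this sketch demonstrates.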