From: Jason Evans
Date: Tue, 04 Mar 2008 11:14:15 -0800
To: gnn@freebsd.org
Cc: current@freebsd.org
Subject: Re: Differences in malloc between 6 and 7?
Message-ID: <47CD9F87.4000509@freebsd.org>
In-Reply-To: <7ifxv7pnei.wl%gnn@neville-neil.com>

gnn@freebsd.org wrote:
> One of the folks I'm working with found this. The following code,
> which yes, is just an example, is 1/2 as fast on 7.0-RELEASE as on
> 6.3. Where should I look to find out why?

There is a definite performance problem in arena_run_alloc(), but I'm happy to report that it was fixed in -current a while back. I plan to MFC malloc to RELENG_7 within the next few weeks.

In a nutshell, the arena_run_alloc() performance problem is due to using a linear search to find sufficiently large runs of mapped (but currently unused) pages. There are caching mechanisms that speed up the searches to some degree, but there are still some linear aspects to the algorithm, so as memory usage increases, the searches take progressively longer. In -current, this problem is solved by maintaining red-black trees, so that arena_run_alloc() does an O(lg n) tree search, rather than an O(n) iterative search.

It's worth mentioning that the benchmark is of marginal use, due to a simple (but common) flaw. At a minimum, a malloc benchmark should touch all allocated memory at least once. Otherwise, the benchmark is IMO too far removed from reality to measure anything of value, since its memory access patterns look nothing like those of an actual application that dynamically allocates memory.

Both phkmalloc and jemalloc use data structures that are mostly disjoint from the allocations (no headers), so the benchmark never even faults most pages in. This is especially true for phkmalloc, so jemalloc is unjustly penalized. If we were to include, say, dlmalloc in this comparison, it would be even more heavily penalized due to touching the pages while modifying allocation headers.

Jason