From: Robert Watson
Date: Fri, 1 Feb 2008 19:07:21 +0000 (GMT)
To: Alexander Motin
Cc: freebsd-hackers@freebsd.org, freebsd-performance@freebsd.org, Julian Elischer
Subject: Re: Memory allocation performance

On Fri, 1 Feb 2008, Alexander Motin wrote:

> That was actually my second question. As there is only 512 items by default
> and they are small in size I can easily preallocate them all on boot. But is
> it a good way? Why UMA can't do just the same when I have created zone with
> specified element size and maximum number of objects? What is the principal
> difference?

Alexander,

I think we should drill down in the analysis a bit and see if we can figure out what's going on with UMA.

What UMA essentially does is ask the VM for pages, and then pack objects into pages. It maintains some meta-data, and depending on the relative sizes of objects and pages, it may store that meta-data in the page itself or elsewhere. Either way, the result looks very much like an array of struct object. It has a few extra layers of wrapping in order to maintain stats, per-CPU caches, object life cycle, etc. When INVARIANTS is turned off, allocation from the per-CPU cache consists of pulling objects in and out of one of two per-CPU queues.

So I guess the question is: where are the cycles going? Are we suffering excessive cache misses in managing the slabs? Are you effectively "cycling through" objects rather than using a smaller set that fits better in the cache? Is some bit of debugging enabled that shouldn't be, perhaps due to a failure of ifdefs?

BTW, UMA does let you set the size of buckets, so you can try tuning the bucket size. For a start, try setting the zone flag UMA_ZONE_MAXBUCKET.

It would be very helpful if you could try doing some analysis with hwpmc. "High resolution profiling" is of increasingly limited utility with modern CPUs, where even a high frequency timer won't run very often. It's also quite subject to cycle events that align with other timers in the system.

Robert N M Watson
Computer Laboratory
University of Cambridge
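
As an illustration of the per-CPU path described above, here is a minimal userland sketch of a cache with two queues ("buckets") in front of a slab that is treated as an array of struct object. This is not UMA's code: the names cpu_cache, bucket, cache_alloc, cache_free and slab_alloc, and the sizes BUCKET_SIZE and SLAB_ITEMS, are all hypothetical, there is a single cache rather than one per CPU, and locking, statistics and constructors are left out.

```c
#include <assert.h>
#include <stddef.h>

#define BUCKET_SIZE 4   /* hypothetical; a real allocator tunes this */
#define SLAB_ITEMS  16  /* hypothetical slab capacity */

struct object { int payload; };

/* A bucket is a small LIFO stack of object pointers. */
struct bucket {
    int count;
    struct object *items[BUCKET_SIZE];
};

/* One of these would exist per CPU; the sketch uses a single instance. */
struct cpu_cache {
    struct bucket alloc_bucket;  /* allocations pop from here */
    struct bucket free_bucket;   /* frees push onto here */
};

/* Stand-in for a slab: backing memory viewed as an array of struct object. */
static struct object slab[SLAB_ITEMS];
static int slab_next;
static int slow_path_hits;       /* counts trips past the cache */

/* Slow path: carve the next unused object out of the slab. */
struct object *slab_alloc(void)
{
    slow_path_hits++;
    return slab_next < SLAB_ITEMS ? &slab[slab_next++] : NULL;
}

static void swap_buckets(struct cpu_cache *c)
{
    struct bucket tmp = c->alloc_bucket;
    c->alloc_bucket = c->free_bucket;
    c->free_bucket = tmp;
}

/* Fast path: pop from the alloc bucket, swapping in the free bucket
 * when the alloc bucket is empty; fall back to the slab otherwise. */
struct object *cache_alloc(struct cpu_cache *c)
{
    if (c->alloc_bucket.count == 0 && c->free_bucket.count > 0)
        swap_buckets(c);
    if (c->alloc_bucket.count > 0)
        return c->alloc_bucket.items[--c->alloc_bucket.count];
    return slab_alloc();
}

/* Push onto the free bucket, swapping if it is full and the alloc
 * bucket has room. Returns 1 if the object was cached, 0 if both
 * buckets are full (a real allocator would hand a bucket back to
 * the zone at that point). */
int cache_free(struct cpu_cache *c, struct object *o)
{
    assert(o != NULL);
    if (c->free_bucket.count == BUCKET_SIZE &&
        c->alloc_bucket.count < BUCKET_SIZE)
        swap_buckets(c);
    if (c->free_bucket.count < BUCKET_SIZE) {
        c->free_bucket.items[c->free_bucket.count++] = o;
        return 1;
    }
    return 0;
}
```

In this sketch, an allocation that follows a free hands back the object that was just freed without touching the slab, which is the "smaller set that fits better in the cache" behaviour asked about above. A workload that allocates many objects before freeing any would instead keep falling through to slab_alloc.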