Date: Fri, 1 Feb 2008 19:07:21 +0000 (GMT)
From: Robert Watson <rwatson@FreeBSD.org>
To: Alexander Motin <mav@FreeBSD.org>
Cc: freebsd-hackers@freebsd.org, freebsd-performance@freebsd.org, Julian Elischer <julian@elischer.org>
Subject: Re: Memory allocation performance
Message-ID: <20080201185435.X88034@fledge.watson.org>
In-Reply-To: <47A2C2A2.5040109@FreeBSD.org>
References: <47A25412.3010301@FreeBSD.org> <47A25A0D.2080508@elischer.org> <47A2C2A2.5040109@FreeBSD.org>
On Fri, 1 Feb 2008, Alexander Motin wrote:

> That was actually my second question. As there is only 512 items by default
> and they are small in size I can easily preallocate them all on boot. But is
> it a good way? Why UMA can't do just the same when I have created zone with
> specified element size and maximum number of objects? What is the principal
> difference?

Alexander,

I think we should drill down in the analysis a bit and see if we can figure out what's going on with UMA. What UMA essentially does is ask the VM for pages, and then pack objects into those pages. It maintains some metadata, and depending on the relative sizes of objects and pages, it may store that metadata in the page itself or elsewhere. Either way, the result looks very much like an array of struct object. It has a few extra layers of wrapping in order to maintain stats, per-CPU caches, object life cycle, etc. When INVARIANTS is turned off, allocation from the per-CPU cache consists of pulling objects in and out of one of two per-CPU queues.

So I guess the question is: where are the cycles going? Are we suffering excessive cache misses in managing the slabs? Are you effectively "cycling through" objects rather than using a smaller set that fits better in the cache? Is some bit of debugging enabled that shouldn't be, perhaps due to a failure of ifdefs?

BTW, UMA does let you set the size of buckets, so you can try tuning the bucket size. For a start, try setting the zone flag UMA_ZONE_MAXBUCKET.

It would be very helpful if you could try doing some analysis with hwpmc -- "high resolution profiling" is of increasingly limited utility with modern CPUs, where even a high-frequency timer won't run very often. It's also quite subject to skew from periodic events that align with other timers in the system.

Robert N M Watson
Computer Laboratory
University of Cambridge
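To make the fast path described in this message concrete, here is a small userland model: objects are packed into pages obtained from a backing store, and a single simulated CPU keeps two buckets that allocation pops from and free pushes to, swapping them before falling back to the slab layer. This is a hypothetical sketch under simplifying assumptions (one CPU, no locking, no constructors, pages only returned on destroy); it is not UMA's code, and every name in it (toy_zone, toy_alloc, toy_free, TOY_BUCKET_SIZE, and so on) is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model only: all names and sizes here are hypothetical, not UMA's. */
#define TOY_PAGE_SIZE   4096
#define TOY_BUCKET_SIZE 8   /* larger buckets mean fewer slow-path trips */

struct toy_bucket {
    int count;
    void *items[TOY_BUCKET_SIZE];
};

struct toy_zone {
    size_t item_size;
    void *slab_freelist;          /* free objects, linked through word 0 */
    char **pages;                 /* pages obtained from the "VM" */
    size_t npages;
    struct toy_bucket buckets[2]; /* storage for the two cache buckets */
    struct toy_bucket *alloc_bucket, *free_bucket;
    unsigned long slab_trips;     /* times an operation left the fast path */
};

static void toy_zone_init(struct toy_zone *z, size_t item_size) {
    size_t align = sizeof(void *); /* each free object stores a link */
    assert(item_size > 0);
    z->item_size = (item_size + align - 1) / align * align;
    assert(z->item_size <= TOY_PAGE_SIZE);
    z->slab_freelist = NULL;
    z->pages = NULL;
    z->npages = 0;
    z->buckets[0].count = z->buckets[1].count = 0;
    z->alloc_bucket = &z->buckets[0];
    z->free_bucket = &z->buckets[1];
    z->slab_trips = 0;
}

/* Get a page from the "VM" and pack it with as many objects as fit. */
static void slab_grow(struct toy_zone *z) {
    char *page = malloc(TOY_PAGE_SIZE);
    char **pages = realloc(z->pages, (z->npages + 1) * sizeof(*pages));
    if (page == NULL || pages == NULL)
        abort();
    z->pages = pages;
    z->pages[z->npages++] = page;
    for (size_t off = 0; off + z->item_size <= TOY_PAGE_SIZE; off += z->item_size) {
        void **obj = (void **)(page + off);
        *obj = z->slab_freelist;
        z->slab_freelist = obj;
    }
}

static void toy_swap_buckets(struct toy_zone *z) {
    struct toy_bucket *tmp = z->alloc_bucket;
    z->alloc_bucket = z->free_bucket;
    z->free_bucket = tmp;
}

static void *toy_alloc(struct toy_zone *z) {
    if (z->alloc_bucket->count == 0 && z->free_bucket->count > 0)
        toy_swap_buckets(z);      /* reuse recently freed objects */
    if (z->alloc_bucket->count == 0) {
        z->slab_trips++;          /* slow path: refill from the slab layer */
        while (z->alloc_bucket->count < TOY_BUCKET_SIZE) {
            if (z->slab_freelist == NULL)
                slab_grow(z);
            void **obj = z->slab_freelist;
            z->slab_freelist = *obj;
            z->alloc_bucket->items[z->alloc_bucket->count++] = obj;
        }
    }
    /* Fast path: pop from the per-CPU alloc bucket. */
    return z->alloc_bucket->items[--z->alloc_bucket->count];
}

static void toy_free(struct toy_zone *z, void *item) {
    if (z->free_bucket->count == TOY_BUCKET_SIZE &&
        z->alloc_bucket->count < TOY_BUCKET_SIZE)
        toy_swap_buckets(z);      /* free bucket full, alloc bucket has room */
    if (z->free_bucket->count == TOY_BUCKET_SIZE) {
        z->slab_trips++;          /* slow path: drain a bucket to the slab layer */
        while (z->free_bucket->count > 0) {
            void **obj = z->free_bucket->items[--z->free_bucket->count];
            *obj = z->slab_freelist;
            z->slab_freelist = obj;
        }
    }
    /* Fast path: push onto the per-CPU free bucket. */
    z->free_bucket->items[z->free_bucket->count++] = item;
}

static void toy_zone_destroy(struct toy_zone *z) {
    for (size_t i = 0; i < z->npages; i++)
        free(z->pages[i]);
    free(z->pages);
    z->pages = NULL;
    z->npages = 0;
}
```

In this model, a loop that allocates and frees a working set small enough to fit in the two buckets touches the slab layer only once during warm-up, while a working set larger than both buckets combined forces repeated slow-path trips. That is one way to picture the "cycling through objects" question above, and why experimenting with larger buckets (the UMA_ZONE_MAXBUCKET suggestion) may be worth measuring.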