Date: Wed, 7 Jan 2004 11:49:16 +1100 From: Peter Jeremy <peterjeremy@optushome.com.au> To: Alexander Leidinger <Alexander@Leidinger.net> Cc: current@freebsd.org Subject: Re: perl malloc slow? Message-ID: <20040107004916.GB25474@server.vk2pj.dyndns.org> In-Reply-To: <20040106171740.201f6581@Magellan.Leidinger.net> References: <200401061536.i06FadY90540@alogis.com> <20040106171740.201f6581@Magellan.Leidinger.net>
next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, Jan 06, 2004 at 05:17:40PM +0100, Alexander Leidinger wrote: >On Tue, 06 Jan 2004 15:36:34 +0000 >Holger Kipp <Holger.Kipp@alogis.com> wrote: > >> Kris Kennaway wrote: "For any algorithm one can come up with a workload that >> makes it perform badly. The trick is making it perform well for >> common workloads, which FreeBSD's malloc does." >> (http://lists.freebsd.org/pipermail/freebsd-hackers/2003-September/003059.html) >> >> Even so, if Perl malloc and Linux malloc are O(n) with a small constant and >> FreeBSD malloc behaves like O(n*n) or worse with a large constant in this >> easy example (of a real-world applikation), this behaviour should imho be >> investigated.... > >You need to make a context switch for every malloc call. This isn't true. There are no circumstances under which phkmalloc requires a context switch. Since Unix uses a pre-emptive scheduler, a context switch may occur at any time, though it is preferentially performed during system calls. If the free memory pool managed by phkmalloc has insufficient space to fulfil the request, or is excessively large following a free() then it will use brk(2) to allocate/return additional memory. The kernel may choose to schedule an alternative process during the brk() call. > If the Zlib module allocates alot of small memory >regions with the same size one after another and frees alot of them >after a short period of time, this results in a huge amount of context >switches in the non-perl-malloc case and in nearly no context switch in >the with-perl-malloc case. Phkmalloc makes it trivial to trace malloc() family calls. Actually tracing calls to malloc et al whilst running Compress::Zlib::memGunzip on 4.9-STABLE shows that there are relatively few calls - about 500 calls whilst uncompressing 284K to 1484K in my test. The reason for the poor performance is that most (~72%) of the calls are of the form "realloc(ptr, size+=4096)", where size starts at 4097. This probably triggers the worst-case performance for any general purpose malloc - O(n^2). This is due to the requirement to continually copy the buffer contents to a new buffer. Similar behaviour is demonstrated by perl -e '$x=350; $y = ""; print ".\n"; while ($x-- > 0) { $y .= "." x 4096;} print length($y), "\n";' It's not clear why the builtin perl malloc is so much faster in this case. A quick check of the perl malloc code suggests that it uses a geometric-progression bucket arrangement (whereas phkmalloc appears to use page-sized buckets for large allocations) - this would significantly reduce the number of realloc() copies. >The right thing to do in such a case is to optimize the application to >use e.g. a ring buffer. A static number of malloc() calls at startup, >use them for a long time, free() them. Or to manage a malloc resource >pool on it's own. This is what perl does in the with-perl-malloc case. In this particular case, the right thing to do is to use a larger- than-default buffer size in Compress::Zlib::memGunzip. Patching the code to use a buffer size matching the input string size resulted in an order of magnitude speedup in my test case (13.9 seconds to 1.16 seconds). I similar problem presumably exists with Compress::Zlib::memGzip, though I haven't checked this. I have sent a suggested patch to the Compress::Zlib author. Peter
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20040107004916.GB25474>