Date: Sun, 11 Dec 2005 23:30:23 -0500
From: Kris Kennaway <kris@obsecurity.org>
To: Kris Kennaway <kris@obsecurity.org>
Cc: Julian Elischer <julian@elischer.org>, Jason Evans <jasone@canonware.com>, Claus Guttesen <kometen@gmail.com>, David Xu <davidxu@freebsd.org>, current@freebsd.org
Subject: Re: New libc malloc patch
Message-ID: <20051212043023.GA16678@xor.obsecurity.org>
In-Reply-To: <20051212012907.GA13640@xor.obsecurity.org>
References: <B6653214-2181-4342-854D-323979D23EE8@canonware.com> <Pine.LNX.4.53.0511291121360.27754@regurgitate.ugcs.caltech.edu> <0B746373-8C29-4ADF-9218-311AE08F3834@canonware.com> <b41c75520512031245q48521143m@mail.gmail.com> <7318D807-9086-4817-A40B-50D6960880FB@canonware.com> <b41c75520512040451t360eb01u@mail.gmail.com> <12CA5E15-D006-441D-A24C-1BCD1A69D740@canonware.com> <439CC5DA.3080103@elischer.org> <439CC939.5080703@freebsd.org> <20051212012907.GA13640@xor.obsecurity.org>

On Sun, Dec 11, 2005 at 08:29:07PM -0500, Kris Kennaway wrote:
> I'll try to test this on a 4 CPU amd64 machine next.

phkmalloc:

# ./malloc-test 1024 10000000 1
Starting test with 1 thread...
Thread 5298176 adjusted timing: 4.173052 seconds for 10000000 requests of 1024 bytes.

# ./malloc-test 1024 10000000 2
Starting test with 2 threads...
Thread 5299200 adjusted timing: 325.108643 seconds for 10000000 requests of 1024 bytes.
Thread 5298176 adjusted timing: 325.202485 seconds for 10000000 requests of 1024 bytes.

# ./malloc-test 1024 10000000 3
Starting test with 3 threads...
Thread 5414912 adjusted timing: 1133.238459 seconds for 10000000 requests of 1024 bytes.
Thread 5299200 adjusted timing: 1134.525255 seconds for 10000000 requests of 1024 bytes.
Thread 5298176 adjusted timing: 1134.539555 seconds for 10000000 requests of 1024 bytes.

jemalloc:

# ./malloc-test 1024 10000000 1
Starting test with 1 thread...
Thread 1073760528 adjusted timing: 3.777175 seconds for 10000000 requests of 1024 bytes.

# ./malloc-test 1024 10000000 2
Starting test with 2 threads...
Thread 1073760560 adjusted timing: 3.851702 seconds for 10000000 requests of 1024 bytes.
Thread 1073761584 adjusted timing: 3.887943 seconds for 10000000 requests of 1024 bytes.

# ./malloc-test 1024 10000000 3
Starting test with 3 threads...
Thread 1073760528 adjusted timing: 3.866206 seconds for 10000000 requests of 1024 bytes.
Thread 1073761552 adjusted timing: 13.382795 seconds for 10000000 requests of 1024 bytes.
Thread 1073762688 adjusted timing: 14.407229 seconds for 10000000 requests of 1024 bytes.

# ./malloc-test 1024 10000000 4
Starting test with 4 threads...
Thread 1073760528 adjusted timing: 3.782923 seconds for 10000000 requests of 1024 bytes.
Thread 1073763792 adjusted timing: 6.668655 seconds for 10000000 requests of 1024 bytes.
Thread 1073762688 adjusted timing: 14.346569 seconds for 10000000 requests of 1024 bytes.
Thread 1073761584 adjusted timing: 14.680211 seconds for 10000000 requests of 1024 bytes.

# ./malloc-test 1024 10000000 5
Starting test with 5 threads...
Thread 1073760560 adjusted timing: 4.748248 seconds for 10000000 requests of 1024 bytes.
Thread 1073761584 adjusted timing: 9.898153 seconds for 10000000 requests of 1024 bytes.
Thread 1073764896 adjusted timing: 13.019884 seconds for 10000000 requests of 1024 bytes.
Thread 1073762688 adjusted timing: 15.326908 seconds for 10000000 requests of 1024 bytes.
Thread 1073763792 adjusted timing: 15.442164 seconds for 10000000 requests of 1024 bytes.

So jemalloc is 1.1 times faster single-threaded, and 107 times faster with 3 threads.

With libthr instead of libpthread:

phkmalloc:

# ./malloc-test 1024 10000000 1
Starting test with 1 thread...
Thread 5255680 adjusted timing: 2.357247 seconds for 10000000 requests of 1024 bytes.

# ./malloc-test 1024 10000000 2
Starting test with 2 threads...
Thread 5256192 adjusted timing: 10.964918 seconds for 10000000 requests of 1024 bytes.
Thread 5255680 adjusted timing: 11.001288 seconds for 10000000 requests of 1024 bytes.

# ./malloc-test 1024 10000000 3
Starting test with 3 threads...
Thread 5255680 adjusted timing: 17.467754 seconds for 10000000 requests of 1024 bytes.
Thread 5256704 adjusted timing: 17.724583 seconds for 10000000 requests of 1024 bytes.
Thread 5256192 adjusted timing: 17.913381 seconds for 10000000 requests of 1024 bytes.

# ./malloc-test 1024 10000000 4
Starting test with 4 threads...
Thread 5255680 adjusted timing: 42.715420 seconds for 10000000 requests of 1024 bytes.
Thread 5256192 adjusted timing: 43.481252 seconds for 10000000 requests of 1024 bytes.
Thread 5256704 adjusted timing: 43.871452 seconds for 10000000 requests of 1024 bytes.
Thread 5257216 adjusted timing: 43.887820 seconds for 10000000 requests of 1024 bytes.

# ./malloc-test 1024 10000000 5
Starting test with 5 threads...
Thread 5255680 adjusted timing: 139.316332 seconds for 10000000 requests of 1024 bytes.
Thread 5257216 adjusted timing: 140.117720 seconds for 10000000 requests of 1024 bytes.
Thread 5256192 adjusted timing: 140.134057 seconds for 10000000 requests of 1024 bytes.
Thread 5256704 adjusted timing: 140.855289 seconds for 10000000 requests of 1024 bytes.
Thread 5257728 adjusted timing: 140.865934 seconds for 10000000 requests of 1024 bytes.

jemalloc:

# ./malloc-test 1024 10000000 1
Starting test with 1 thread...
Thread 1073742416 adjusted timing: 1.366353 seconds for 10000000 requests of 1024 bytes.

# ./malloc-test 1024 10000000 2
Starting test with 2 threads...
Thread 1073742416 adjusted timing: 1.429485 seconds for 10000000 requests of 1024 bytes.
Thread 1073742896 adjusted timing: 1.530733 seconds for 10000000 requests of 1024 bytes.

# ./malloc-test 1024 10000000 3
Starting test with 3 threads...
Thread 1073742416 adjusted timing: 1.419813 seconds for 10000000 requests of 1024 bytes.
Thread 1073743376 adjusted timing: 1.432790 seconds for 10000000 requests of 1024 bytes.
Thread 1073742896 adjusted timing: 1.490218 seconds for 10000000 requests of 1024 bytes.

# ./malloc-test 1024 10000000 4
Starting test with 4 threads...
Thread 1073743376 adjusted timing: 1.447554 seconds for 10000000 requests of 1024 bytes.
Thread 1073742416 adjusted timing: 1.503659 seconds for 10000000 requests of 1024 bytes.
Thread 1073743856 adjusted timing: 1.503937 seconds for 10000000 requests of 1024 bytes.
Thread 1073742896 adjusted timing: 1.504926 seconds for 10000000 requests of 1024 bytes.

# ./malloc-test 1024 10000000 5
Starting test with 5 threads...
Thread 1073743376 adjusted timing: 1.595239 seconds for 10000000 requests of 1024 bytes.
Thread 1073742896 adjusted timing: 1.689753 seconds for 10000000 requests of 1024 bytes.
Thread 1073742416 adjusted timing: 1.750115 seconds for 10000000 requests of 1024 bytes.
Thread 1073744336 adjusted timing: 1.744271 seconds for 10000000 requests of 1024 bytes.
Thread 1073743856 adjusted timing: 1.890269 seconds for 10000000 requests of 1024 bytes.

# ./malloc-test 1024 10000000 6
Starting test with 6 threads...
Thread 1073743856 adjusted timing: 1.847653 seconds for 10000000 requests of 1024 bytes.
Thread 1073742416 adjusted timing: 2.018481 seconds for 10000000 requests of 1024 bytes.
Thread 1073743376 adjusted timing: 2.059817 seconds for 10000000 requests of 1024 bytes.
Thread 1073742896 adjusted timing: 2.129204 seconds for 10000000 requests of 1024 bytes.
Thread 1073744336 adjusted timing: 2.223751 seconds for 10000000 requests of 1024 bytes.
Thread 1073744816 adjusted timing: 2.293809 seconds for 10000000 requests of 1024 bytes.

# ./malloc-test 1024 10000000 20
Starting test with 20 threads...
Thread 1073744816 adjusted timing: 5.113769 seconds for 10000000 requests of 1024 bytes.
Thread 1073751136 adjusted timing: 4.973369 seconds for 10000000 requests of 1024 bytes.
Thread 1073750176 adjusted timing: 5.295912 seconds for 10000000 requests of 1024 bytes.
Thread 1073745296 adjusted timing: 5.502331 seconds for 10000000 requests of 1024 bytes.
Thread 1073743856 adjusted timing: 5.614890 seconds for 10000000 requests of 1024 bytes.
Thread 1073744336 adjusted timing: 5.608690 seconds for 10000000 requests of 1024 bytes.
Thread 1073752096 adjusted timing: 5.555465 seconds for 10000000 requests of 1024 bytes.
Thread 1073748736 adjusted timing: 5.650922 seconds for 10000000 requests of 1024 bytes.
Thread 1073748256 adjusted timing: 6.608054 seconds for 10000000 requests of 1024 bytes.
Thread 1073750656 adjusted timing: 7.144998 seconds for 10000000 requests of 1024 bytes.
Thread 1073742896 adjusted timing: 7.390905 seconds for 10000000 requests of 1024 bytes.
Thread 1073746256 adjusted timing: 7.364728 seconds for 10000000 requests of 1024 bytes.
Thread 1073742416 adjusted timing: 7.556064 seconds for 10000000 requests of 1024 bytes.
Thread 1073749216 adjusted timing: 7.357179 seconds for 10000000 requests of 1024 bytes.
Thread 1073752576 adjusted timing: 7.349483 seconds for 10000000 requests of 1024 bytes.
Thread 1073747776 adjusted timing: 7.375179 seconds for 10000000 requests of 1024 bytes.
Thread 1073751616 adjusted timing: 7.557854 seconds for 10000000 requests of 1024 bytes.
Thread 1073743376 adjusted timing: 7.915978 seconds for 10000000 requests of 1024 bytes.
Thread 1073749696 adjusted timing: 7.795219 seconds for 10000000 requests of 1024 bytes.
Thread 1073745776 adjusted timing: 8.007392 seconds for 10000000 requests of 1024 bytes.

So libthr is *much* faster than libpthread with both malloc implementations, but jemalloc is still 1.7 times faster than phkmalloc with 1 thread and 80 times faster with 5 threads.

Kris

P.S. Holy crap :)
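
The speedup figures in the two summaries can be checked against the per-thread timings. The message does not say how the per-thread times were combined, so the sketch below assumes the ratio of mean per-thread times (phkmalloc over jemalloc); under that assumption it reproduces the quoted 1.1x/107x for libpthread and 1.7x/80x for libthr. The `LIBPTHREAD`/`LIBTHR` tables and the `speedup` helper are hypothetical names used only for illustration.

```python
from statistics import mean

# Per-thread "adjusted timing" values in seconds, copied from the runs above
# (10000000 requests of 1024 bytes per thread). Only the runs cited in the
# summaries are included.
LIBPTHREAD = {
    "phkmalloc": {1: [4.173052], 3: [1133.238459, 1134.525255, 1134.539555]},
    "jemalloc": {1: [3.777175], 3: [3.866206, 13.382795, 14.407229]},
}
LIBTHR = {
    "phkmalloc": {
        1: [2.357247],
        5: [139.316332, 140.117720, 140.134057, 140.855289, 140.865934],
    },
    "jemalloc": {
        1: [1.366353],
        5: [1.595239, 1.689753, 1.750115, 1.744271, 1.890269],
    },
}


def speedup(results, nthreads):
    """Mean per-thread phkmalloc time divided by mean per-thread jemalloc time.

    Assumption: the original message does not state how per-thread times were
    combined; the mean is used here because it reproduces the quoted ratios.
    """
    return mean(results["phkmalloc"][nthreads]) / mean(results["jemalloc"][nthreads])


if __name__ == "__main__":
    for label, results, n in [("libpthread", LIBPTHREAD, 3), ("libthr", LIBTHR, 5)]:
        print(f"{label}: 1 thread {speedup(results, 1):.2f}x, "
              f"{n} threads {speedup(results, n):.1f}x")
    # libpthread: 1 thread 1.10x, 3 threads 107.5x
    # libthr: 1 thread 1.73x, 5 threads 80.9x
```

The choice of aggregate matters mostly for the 3-thread jemalloc run under libpthread, whose per-thread times range from about 3.9 to 14.4 seconds: comparing the slowest threads instead of the means gives roughly 79x rather than 107x.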