Date: Sun, 20 Jan 2013 20:26:19 -0700
From: Ian Lepore <ian@FreeBSD.org>
To: Adrian Chadd <adrian@FreeBSD.org>
Cc: Jason Evans <jasone@FreeBSD.org>, freebsd-arch@FreeBSD.org
Subject: Re: [rfc] enabling MALLOC_PRODUCTION on -HEAD for now, until jemalloc has been taught to have some run time selectable debug options
Message-ID: <1358738779.32417.380.camel@revolution.hippie.lan>
In-Reply-To: <CAJ-VmomY_jy5s_pgjpjDXZpN54HpKykD-5tWjU6TG6Z7eR=eOQ@mail.gmail.com>
References: <CAJ-VmomY_jy5s_pgjpjDXZpN54HpKykD-5tWjU6TG6Z7eR=eOQ@mail.gmail.com>
On Sat, 2013-01-19 at 22:26 -0800, Adrian Chadd wrote:
> Hi,
>
> I'd like to enable MALLOC_PRODUCTION on -HEAD.
>
> I'm currently recompiling my libc on this g4 powerbook because the
> -HEAD snapshots don't have it enabled by default; just to get some
> damned decent performance out of this thing.
>
> I'll work with Jason and others (eg Ian) who have a vested interest in
> trying to get it to run better out of the box, but still have the
> debug options available for people who wish to debug things.
>

I've been investigating this today and have some information.

With MALLOC_PRODUCTION defined there is no problem, even on small
embedded systems. Without MALLOC_PRODUCTION we've basically got two
problems:

 * Every program has a minimum resident size of about 8MiB, and that's
   fatal on a small-memory embedded system.

 * Performance is bad. This is at least in part due to the expense of
   faulting in 8MiB of zeroed pages, and that's especially noticeable in
   utilities that should be small and fast. There could be other causes
   as well.

I think I've tracked the cause of the 8MiB resident size to a
particular sanity check, which validates whether memory that was
supposed to have been zeroed actually was. I think this check makes
sense in some cases, and not in others. It almost certainly doesn't
make sense if the memory was freshly obtained from mmap().

I want to talk to Jason about a proper robust fix, but to help learn
more about the performance problem, I'm attaching a little test patch
that disables the suspect validity check. It would be good if a few
folks running -current could apply it, build without MALLOC_PRODUCTION
defined, and see whether the system feels more usable than it does
without the patch. It's likely to make the most difference on a slower
or older system. It's possible that this patch helps with the memory
usage but doesn't help enough with performance.
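The check in question is the word-by-word scan that the attached patch removes from contrib/jemalloc/src/chunk.c. To illustrate the idea that the check "makes sense in some cases, and not in others", here is a minimal C sketch that keeps the same scan but skips it when a chunk came straight from mmap(). The helper names and the `fresh_from_mmap` flag are hypothetical; this is not jemalloc's code or an agreed fix, and it assumes the caller knows where each chunk came from.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: the same word-by-word scan the patch removes
 * from chunk.c, returning the result instead of asserting per word.
 */
static bool
chunk_words_are_zero(const void *chunk, size_t size)
{
	const size_t *p = (const size_t *)(uintptr_t)chunk;
	size_t i;

	for (i = 0; i < size / sizeof(size_t); i++) {
		if (p[i] != 0)
			return (false);
	}
	return (true);
}

/*
 * Hypothetical policy: scan only chunks that were recycled from earlier
 * use. A chunk that came straight from an anonymous mmap() is assumed to
 * be zero-filled by the kernel, so the scan is skipped rather than
 * touching every page just to read zeros.
 * Returns true if the scan ran, false if it was skipped.
 */
static bool
chunk_debug_check_zeroed(const void *chunk, size_t size, bool zero,
    bool fresh_from_mmap)
{
	bool ok;

	if (!zero || chunk == NULL || fresh_from_mmap)
		return (false);
	ok = chunk_words_are_zero(chunk, size);
	assert(ok);
	(void)ok;	/* avoid an unused-variable warning under NDEBUG */
	return (true);
}
```

The design choice is to move the decision from "always scan in debug builds" to "scan only when the memory's zeroed state is not already guaranteed", which keeps the debug value of the check for recycled chunks while avoiding the per-process cost described above for freshly mapped ones.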
I'm not in a good position to do real-world performance testing myself
right now. In terms of non-real-world testing, I was using a trivial
little app that was basically:

    #include <stdlib.h>

    int main(void) { malloc(64); return 0; }

and a little shell script to time running 100 iterations of that in a
loop. Without the patch it took 24 seconds; with the patch, 2 seconds,
on a medium-wimpy embedded ARM system. It's probably too much to hope
that a 12:1 improvement will scale up to non-trivial apps.

-- Ian

[Attachment: jemalloc_test.diff]

Index: contrib/jemalloc/src/chunk.c
===================================================================
--- contrib/jemalloc/src/chunk.c	(revision 245695)
+++ contrib/jemalloc/src/chunk.c	(working copy)
@@ -195,12 +203,7 @@
 			prof_gdump();
 	}
 	if (config_debug && *zero && ret != NULL) {
-		size_t i;
-		size_t *p = (size_t *)(uintptr_t)ret;
 
-		VALGRIND_MAKE_MEM_DEFINED(ret, size);
-		for (i = 0; i < size / sizeof(size_t); i++)
-			assert(p[i] == 0);
 	}
 	assert(CHUNK_ADDR2BASE(ret) == ret);
 	return (ret);
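The timing described in the message used a shell loop around the malloc(64) program. The following is a C stand-in for that loop, a sketch assuming a POSIX system with posix_spawn() and clock_gettime(CLOCK_MONOTONIC); the function name `time_runs` is hypothetical. It runs the program the requested number of times, one after another, and reports the total wall-clock time.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>

extern char **environ;

/*
 * Hypothetical helper: run argv[0] (with arguments argv) `iterations`
 * times sequentially and return the total elapsed time in seconds.
 * Returns -1.0 if any run cannot be spawned or exits unsuccessfully.
 */
static double
time_runs(char *const argv[], int iterations)
{
	struct timespec start, end;
	int i;

	assert(argv != NULL && argv[0] != NULL);
	if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
		return (-1.0);
	for (i = 0; i < iterations; i++) {
		pid_t pid;
		int status;

		/* posix_spawn() takes a path; it does not search PATH. */
		if (posix_spawn(&pid, argv[0], NULL, NULL, argv, environ) != 0)
			return (-1.0);
		/* Wait for this run to finish before starting the next. */
		while (waitpid(pid, &status, 0) == -1) {
			if (errno != EINTR)
				return (-1.0);
		}
		if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
			return (-1.0);
	}
	if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
		return (-1.0);
	return ((double)(end.tv_sec - start.tv_sec) +
	    (double)(end.tv_nsec - start.tv_nsec) / 1e9);
}
```

For example, with the test program built under a hypothetical name such as ./malloc64, calling `time_runs` with `{"./malloc64", NULL}` and 100 iterations mirrors the measurement above. The reported figures of 24 seconds and 2 seconds per 100 runs work out to about 240 ms and 20 ms per run.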
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?1358738779.32417.380.camel>