Date: Sat, 26 Jul 2008 20:18:05 -0700
From: Doug Hardie <bc979@lafn.org>
To: Giorgos Keramidas <keramida@ceid.upatras.gr>
Cc: Kris Kennaway <kris@freebsd.org>, freebsd-questions@freebsd.org
Subject: Re: malloc options
Message-ID: <E3C0FC23-5548-475A-8A7E-6ED00009CAF8@lafn.org>
In-Reply-To: <87y73ohylt.fsf@kobe.laptop>
References: <EE1CF633-524E-4AE3-8224-685D71652F36@lafn.org> <488BBCFD.1090309@FreeBSD.org> <B20991AB-6D2F-4E9E-BC68-2073EFE598AF@lafn.org> <87y73ohylt.fsf@kobe.laptop>
On Jul 26, 2008, at 19:03, Giorgos Keramidas wrote:

> While that's understandable, the current malloc() has undergone quite
> extensive testing by Jason Evans and a lot of people who use it in
> FreeBSD 7.X or later.  Its ability to expose bugs in this way was deemed
> important enough that it is now used by other projects too.

While in general I like the new approach, this problem has been a
killer.  I did find a number of errors in my own code where I was not
allocating enough space for some things.  Those showed up instantly
with 7.0 and were easy to fix.

> What Kris wrote in:
>
>     Finally, there is no way to revert to the "old approach"
>     because the new allocator is completely new; it allocates
>     memory based on its own strategy.  None of the malloc options
>     affect the behaviour of correct programs (but some of them
>     can help to improve performance, or to debug incorrect
>     programs).
>
> is a bit important.  Even if you tweak enough options the new malloc()
> may *not* work similarly enough for the program to keep working.  If you
> are losing money _right_ _now_ because of problems in the program, it may
> be worth going back to 6-STABLE and the old malloc() until the bugs of
> the program have been fixed by the developers.

Unfortunately that is not possible.  We upgraded the hardware and some
of the components were not supported very well under 6.x.  Despite
several weeks of testing of the new hardware and 7.0, the problem did
not arise till several weeks after going into production.  It takes
about a week of real time before the problem tends to become visible.
By compressing the workload I have been able to set up a test machine
such that it takes 2-4 days before it occurs.

>> Not surprising but I seem to recall that when it was first introduced
>> into stable that there was some discussion on how to make it look more
>> like the old malloc.  I couldn't find that via a search though.
>
> If all else fails, you can try forward-porting phkmalloc to 7.X but it's
> not necessarily easier than going temporarily back to 6.X and fixing the
> program to work correctly on 7.X.
>
> It basically all boils down to ``How much time do you want to spend with
> a possibly crashing service?''
>
> There's definitely a bug somewhere and you ultimately need it resolved.
> It is highly unlikely that it is in malloc() itself, but you can
> probably use its debugging features to help you find out if it is a bug
> in malloc() (see the preprocessor define MALLOC_PRODUCTION in
> libc/stdlib/malloc.c), or if it is a bug in the program using malloc() and
> _where_ it may be.
>
> The new malloc() also includes an option that can dump 'utrace' debug
> output of all the malloc(), calloc(), realloc(), posix_memalign() and
> free() calls of malloc.c.  If you haven't tried it already, it may be
> another useful tool to help you track down where the bug is.
>
> Tracing a program's malloc usage with the 'U' option is relatively easy
> to do if you spawn just *this* program with MALLOC_OPTIONS='U':
>
>     # ktrace env MALLOC_OPTIONS='U' your-program-here
>
> Then you can dump the 'utrace' entries logged by ktrace, with:
>
>     # kdump [optionally, more kdump options] -f ktrace.out
>
> You should see something like this:
>
>     $ kdump -T -t u -f ktrace.out | head -40
>      26674 ls       1217123351.156040 USER  malloc_init()
>      26674 ls       1217123351.156369 USER  0x8101000 = malloc(4096)
>      26674 ls       1217123351.156515 USER  0x8102000 = malloc(2560)
>      26674 ls       1217123351.156611 USER  0x8103800 = malloc(2048)
>      26674 ls       1217123351.156702 USER  0x810b020 = malloc(20)
>      26674 ls       1217123351.156881 USER  free(0x8101000)
>      26674 ls       1217123351.157074 USER  0x8101000 = malloc(3191)
>      26674 ls       1217123351.157191 USER  0x810c000 = malloc(4096)
>      26674 ls       1217123351.157369 USER  0x810d000 = malloc(3219)
>      26674 ls       1217123351.157431 USER  free(0x8101000)
>      26674 ls       1217123351.157538 USER  free(0x810c000)
>      26674 ls       1217123351.157743 USER  0x810e400 = malloc(524)
>      26674 ls       1217123351.157865 USER  0x8104000 = malloc(1280)
>      26674 ls       1217123351.157922 USER  0x8101040 = malloc(89)
>      26674 ls       1217123351.157975 USER  0x81010a0 = malloc(90)
>      26674 ls       1217123351.158065 USER  0x8101100 = malloc(89)
>      26674 ls       1217123351.158170 USER  free(0x8101100)
>     [...]
>
> If your bug is a double-free bug, then a bit of post-processing of this
> will quickly reveal if there *is* a double free bug when a duplicate
> free() call is found.  Then you can dump more ktrace records, in an
> effort to pinpoint the exact place where the original allocation
> happens, and you can keep going from there.
>
> If you see data changing 'under your feet' it's quite likely that you
> are trying to use data after it has been freed.  A nice option that you
> can _enable_ to catch that in action is 'J'.  By dumping the unexpected
> data and using the info from malloc.conf(5)'s description of 'J' you may
> find useful bits of information to track the bug down:
>
>     J    Each byte of new memory allocated by malloc(), realloc()
>          or reallocf() will be initialized to 0xa5.  All memory
>          returned by free(), realloc() or reallocf() will be
>          initialized to 0x5a.  This is intended for debugging and
>          will impact performance negatively.
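
The "bit of post-processing" suggested for the utrace output could look
roughly like the sketch below.  It is only an illustration, not a tool
from this thread: the function name find_double_frees is made up, and it
assumes every record ends in "ADDR = malloc(SIZE)", "free(ADDR)" or
(an assumed format, not shown in the excerpt above)
"ADDR = realloc(OLD, SIZE)".

```python
import re

# Record shapes taken from the kdump excerpt quoted above.
ALLOC_RE = re.compile(r"(0x[0-9a-fA-F]+) = malloc\(")
FREE_RE = re.compile(r"\bfree\((0x[0-9a-fA-F]+)\)")
# Assumption: realloc records look like "NEW = realloc(OLD, SIZE)".
REALLOC_RE = re.compile(r"(0x[0-9a-fA-F]+) = realloc\((0x[0-9a-fA-F]+),")


def find_double_frees(lines):
    """Return (line_number, address) for each free() of an address that
    was already freed and not handed out again in between."""
    freed = set()      # addresses released and not yet reused
    suspects = []
    for lineno, line in enumerate(lines, 1):
        m = REALLOC_RE.search(line)
        if m:
            new, old = m.group(1), m.group(2)
            freed.add(old)        # the old block is released ...
            freed.discard(new)    # ... and the new one is live (may be the same)
            continue
        m = ALLOC_RE.search(line)
        if m:
            freed.discard(m.group(1))   # address is live again
            continue
        m = FREE_RE.search(line)
        if m:
            addr = m.group(1)
            if int(addr, 16) == 0:
                continue                # ignore a null pointer
            if addr in freed:
                suspects.append((lineno, addr))
            else:
                freed.add(addr)
    return suspects
```

Fed the lines of `kdump -T -t u -f ktrace.out`, it returns the line
numbers worth a closer look.  In the excerpt above, 0x8101000 is freed
twice, but it is handed out again by malloc(3191) in between, so it
would not be reported.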
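
For the 'J' option, the two fill bytes from the malloc.conf(5) text
quoted above can be checked mechanically once the unexpected data has
been dumped.  This is a small sketch under assumptions: the name
classify_junk and the min_run threshold are invented for illustration,
and a run of junk bytes is only a hint, not proof.

```python
ALLOC_JUNK = 0xA5   # fill byte for newly allocated memory under 'J'
FREE_JUNK = 0x5A    # fill byte for memory given back under 'J'


def classify_junk(data, min_run=4):
    """Guess what a dumped buffer holds, based on the 'J' fill bytes.

    Returns "freed" if it contains at least min_run consecutive 0x5a
    bytes, "uninitialized" for a similar run of 0xa5, else "unknown".
    min_run is an arbitrary threshold chosen for this sketch.
    """
    if bytes([FREE_JUNK]) * min_run in data:
        return "freed"            # likely use after free
    if bytes([ALLOC_JUNK]) * min_run in data:
        return "uninitialized"    # likely read before being written
    return "unknown"
```

Note that 0x5a is also ASCII 'Z', so text that really contains "ZZZZ"
would be misreported as freed; a longer min_run reduces that risk.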