FreeBSD Mail Archives

Date:      Sat, 26 Jul 2008 20:18:05 -0700
From:      Doug Hardie <bc979@lafn.org>
To:        Giorgos Keramidas <keramida@ceid.upatras.gr>
Cc:        Kris Kennaway <kris@freebsd.org>, freebsd-questions@freebsd.org
Subject:   Re: malloc options
Message-ID:  <E3C0FC23-5548-475A-8A7E-6ED00009CAF8@lafn.org>
In-Reply-To: <87y73ohylt.fsf@kobe.laptop>
References:  <EE1CF633-524E-4AE3-8224-685D71652F36@lafn.org> <488BBCFD.1090309@FreeBSD.org> <B20991AB-6D2F-4E9E-BC68-2073EFE598AF@lafn.org> <87y73ohylt.fsf@kobe.laptop>


On Jul 26, 2008, at 19:03, Giorgos Keramidas wrote:

> While that's understandable, the current malloc() has undergone quite
> extensive testing by Jason Evans and a lot of people who use it in
> FreeBSD 7.X or later.  Its ability to expose bugs in this way was deemed
> important enough that it is now used by other projects too.

While in general I like the new approach, this problem has been a
killer.  I did find a number of errors in my own code where I was not
allocating enough space for some things.  Those showed up instantly
with 7.0 and were easy to fix.

>
>
> What Kris wrote in:
>
>    Finally, there is no way to revert to the "old approach"
>    because the new allocator is completely new; it allocates
>    memory based on its own strategy.  None of the malloc options
>    affect the behaviour of correct programs (but some of them
>    can help to improve performance, or to debug incorrect
>    programs).
>
> is a bit important.  Even if you tweak enough options the new malloc()
> may *not* work similarly enough for the program to keep working.  If you
> are losing money _right_ _now_ because of problems in the program, it may
> be worth going back to 6-STABLE and the old malloc() until the bugs of
> the program have been fixed by the developers.

Unfortunately that is not possible.  We upgraded the hardware and some
of the components were not supported very well under 6.x.  Despite
several weeks of testing of the new hardware and 7.0, the problem did
not arise until several weeks after going into production.  It takes
about a week of real time before the problem tends to become visible.
By compressing the workload I have been able to set up a test machine
such that it takes 2-4 days before it occurs.

>
>
>> Not surprising but I seem to recall that when it was first introduced
>> into stable that there was some discussion on how to make it look more
>> like the old malloc.  I couldn't find that via a search though.
>
> If all else fails, you can try forward-porting phkmalloc to 7.X but it's
> not necessarily easier than going temporarily back to 6.X and fixing the
> program to work correctly on 7.X.
>
> It basically all boils down to ``How much time do you want to spend with
> a possibly crashing service?''
>
> There's definitely a bug somewhere and you ultimately need it resolved.
> It is highly unlikely that it is in malloc() itself, but you can
> probably use its debugging features to help you find out if it is a bug
> in malloc() (see the preprocessor define MALLOC_PRODUCTION in
> libc/stdlib/malloc.c), or if it is a bug in the program using malloc()
> and _where_ it may be.
>
> The new malloc() also includes an option that can dump 'utrace' debug
> output of all the malloc(), calloc(), realloc(), posix_memalign() and
> free() calls of malloc.c.  If you haven't tried it already, it may be
> another useful tool to help you track down where the bug is.
>
> Tracing a program's malloc usage with the 'U' option is relatively easy
> to do if you spawn just *this* program with MALLOC_OPTIONS='U':
>
>    # ktrace env MALLOC_OPTIONS='U' your-program-here
>
> Then you can dump the 'utrace' entries logged by ktrace, with:
>
>    # kdump [optionally, more kdump options] -f ktrace.out
>
> You should see something like this:
>
>    $ kdump -T -t u -f ktrace.out | head -40
>     26674 ls       1217123351.156040 USER  malloc_init()
>     26674 ls       1217123351.156369 USER  0x8101000 = malloc(4096)
>     26674 ls       1217123351.156515 USER  0x8102000 = malloc(2560)
>     26674 ls       1217123351.156611 USER  0x8103800 = malloc(2048)
>     26674 ls       1217123351.156702 USER  0x810b020 = malloc(20)
>     26674 ls       1217123351.156881 USER  free(0x8101000)
>     26674 ls       1217123351.157074 USER  0x8101000 = malloc(3191)
>     26674 ls       1217123351.157191 USER  0x810c000 = malloc(4096)
>     26674 ls       1217123351.157369 USER  0x810d000 = malloc(3219)
>     26674 ls       1217123351.157431 USER  free(0x8101000)
>     26674 ls       1217123351.157538 USER  free(0x810c000)
>     26674 ls       1217123351.157743 USER  0x810e400 = malloc(524)
>     26674 ls       1217123351.157865 USER  0x8104000 = malloc(1280)
>     26674 ls       1217123351.157922 USER  0x8101040 = malloc(89)
>     26674 ls       1217123351.157975 USER  0x81010a0 = malloc(90)
>     26674 ls       1217123351.158065 USER  0x8101100 = malloc(89)
>     26674 ls       1217123351.158170 USER  free(0x8101100)
>     [...]
>
> If your bug is a double-free bug, then a bit of post-processing of this
> will quickly reveal if there *is* a double free bug when a duplicate
> free() call is found.  Then you can dump more ktrace records, in an
> effort to pinpoint the exact place where the original allocation
> happens, and you can keep going from there.
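
The post-processing described above could be as simple as the rough
Python sketch below.  It assumes the kdump line format shown in the
sample output (pid in the first column, then "ADDR = malloc(N)" or
"free(ADDR)"); the "ADDR = realloc(OLD, N)" form and the function name
find_suspect_frees are assumptions made for illustration, not
something taken from kdump documentation.  It flags any free() of an
address that has no live allocation in the trace, which is what a
double free would look like.

```python
import re

# Patterns assumed from the sample kdump output; the realloc form is a guess.
ALLOC = re.compile(r"(0x[0-9a-f]+) = malloc\((\d+)\)")
REALLOC = re.compile(r"(0x[0-9a-f]+) = realloc\((0x[0-9a-f]+), (\d+)\)")
FREE = re.compile(r"free\((0x[0-9a-f]+)\)")


def find_suspect_frees(lines):
    """Return (line number, pid, address) for each free() of a non-live address."""
    live = set()        # (pid, address) pairs currently allocated
    suspects = []
    for lineno, line in enumerate(lines, 1):
        fields = line.split()
        pid = fields[0] if fields else "?"
        m = REALLOC.search(line)
        if m:
            # realloc releases the old address and hands back a new one
            live.discard((pid, m.group(2)))
            live.add((pid, m.group(1)))
            continue
        m = ALLOC.search(line)
        if m:
            live.add((pid, m.group(1)))
            continue
        m = FREE.search(line)
        if m:
            key = (pid, m.group(1))
            if key in live:
                live.remove(key)
            else:
                # freed twice, or never seen as allocated in this trace
                suspects.append((lineno, pid, m.group(1)))
    return suspects
```

To use it, save the output of "kdump -T -t u -f ktrace.out" to a file
and pass the open file to find_suspect_frees().  A hit only says that
the trace contains a free() with no matching allocation; the trace has
to start at process startup for that to be meaningful.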
>
> If you see data changing 'under your feet' it's quite likely that you
> are trying to use data after it has been freed.  A nice option that you
> can _enable_ to catch that in action is 'J'.  By dumping the unexpected
> data and using the info from malloc.conf(5)'s description of 'J' you may
> find useful bits of information to track the bug down:
>
>     J   Each byte of new memory allocated by malloc(), realloc()
>         or reallocf() will be initialized to 0xa5.  All memory
>         returned by free(), realloc() or reallocf() will be
>         initialized to 0x5a.  This is intended for debugging and
>         will impact performance negatively.
>
>
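
When looking at a dump of the unexpected data with 'J' enabled, the
thing to look for is runs of the two fill bytes quoted above.  A small
Python sketch of that check follows; the function name junk_runs and
the default minimum run length of 4 are arbitrary choices for
illustration.  Note that 0x5a is also ASCII 'Z', so short runs in text
buffers can be false positives.

```python
JUNK_FREED = 0x5A   # fill byte for memory returned by free(), per the 'J' text
JUNK_FRESH = 0xA5   # fill byte for newly allocated memory, per the 'J' text


def junk_runs(data, min_len=4):
    """Return (offset, length, kind) for each run of a fill byte >= min_len."""
    runs = []
    i = 0
    n = len(data)
    while i < n:
        b = data[i]
        if b in (JUNK_FREED, JUNK_FRESH):
            j = i
            while j < n and data[j] == b:
                j += 1
            if j - i >= min_len:
                kind = "freed" if b == JUNK_FREED else "uninitialized"
                runs.append((i, j - i, kind))
            i = j
        else:
            i += 1
    return runs
```

Passing it the bytes of a suspicious structure gives the offsets where
freed or never-initialized memory shows through, which can then be
matched against the structure layout.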

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?E3C0FC23-5548-475A-8A7E-6ED00009CAF8>
