Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 26 Apr 2006 13:22:11 -0700
From:      Jason Evans <jasone@FreeBSD.org>
To:        Paul Allen <nospam@ugcs.caltech.edu>
Cc:        Attila Nagy <bra@fsn.hu>, freebsd-current@freebsd.org
Subject:   Re: malloc problems with MySQL?
Message-ID:  <444FD673.3070004@FreeBSD.org>
In-Reply-To: <20060426182837.GA29737@barf.ugcs.caltech.edu>
References:  <444F71F3.6030901@fsn.hu> <444F83AD.9040207@FreeBSD.org> <20060426182837.GA29737@barf.ugcs.caltech.edu>

next in thread | previous in thread | raw e-mail | index | archive | help
Paul Allen wrote:
>>From Jason Evans <jasone@freebsd.org>, Wed, Apr 26, 2006 at 07:29:01AM -0700:
> 
>>Allocations that are larger than the chunk size (2 MB by default) are 
>>allocated using mmap(2), rather than sbrk(2).  Most likely, your 
>>problems will go away if you reduce the heap size, so that mmap has more 
>>address space to work with.
> 
> Which raises the question of what this code is checking for...
> 1167 #ifdef USE_BRK
> 1168         else if ((uintptr_t)ret >= (uintptr_t)brk_base
> 1169             && (uintptr_t)ret < (uintptr_t)brk_max) {
> 1170                 /*
> 1171                  * We succeeded in mapping memory, but at a location that could
> 1172                  * be confused with brk.  Leave the mapping intact so that this
> 1173                  * won't ever happen again, then try again.
> 1174                  */
> 1175                 assert(addr == NULL);
> 1176                 goto AGAIN;
> 1177         }
> 1178 #endif

This code was based on my misunderstanding of MAXDSIZ.  I thought it was 
an upper bound on how large the data segment could be, but it's actually 
only the default value.  I will remove this code, and modify the 
semantics of brk_max to deal with a variable data segment size limit.

> 1212                 /*
> 1213                  * Check for address ranges that were previously chunks and try
> 1214                  * to use them.
> 1215                  */
> 1216 
> 1217                 tchunk = RB_MIN(chunk_tree_s, &old_chunks);
> 1218                 while (tchunk != NULL) {
> 1219                         /* Found an address range.  Try to recycle it. */
> 1220 
> 1221                         chunk = tchunk->chunk;
> 1222                         delchunk = tchunk;
> 1223                         tchunk = RB_NEXT(chunk_tree_s, &old_chunks, delchunk);
> 1224 
> 1225                         /* Remove delchunk from the tree. */
> 1226                         RB_REMOVE(chunk_tree_s, &old_chunks, delchunk);
> 1227                         base_chunk_node_dealloc(delchunk);
> 1228 
> 1229 #ifdef USE_BRK
> 1230                         if ((uintptr_t)chunk >= (uintptr_t)brk_base
> 1231                             && (uintptr_t)chunk < (uintptr_t)brk_max) {
> 1232                                 /* Re-use a previously freed brk chunk. */
> 1233                                 ret = chunk;
> 1234                                 goto RETURN;
> 1235                         }
> 1236 #endif
> 1237                         if ((ret = pages_map(chunk, size)) != NULL) {
> 1238                                 /* Success. */
> 1239                                 goto RETURN;
> 1240                         }
> 1241                 }
> 1242 
> There is something rather scary going on in this part of the code.
> Perhaps it just needs better commenting... You are scanning the tree
> headed by old_chunks looking for an address range.  If this is in the
> mmap region you attempt to mmap it again... to "reuse address space"
> 
> To which I have to say: huh? In chunk_dealloc you explicitly call munmap 
> on the address.  Therefore, what is the point of intentionally "reusing" 
> it by insisting the kernel return an address you had before.  Perhaps
> this would make sense if you had some perchunk structures allocated that
> you wanted to reuse but if this is the case then I question whether you
> really mean to just RB_REMOVE the chunk.  Surely some other cleanup would
> be appropriate... 

This code behaves as intended.  Note that in order to get chunk-aligned 
memory via mmap(), it is necessary to over-allocate, then trim.  By 
keeping a cache of unmapped chunks, we can avoid the extra system calls 
most of the time, and we are also able to reduce memory fragmentation in 
many cases. (Over-allocation and trimming fails to utilize chunk-sized 
holes in the memory map.)

> For that matter the brk allocation code should be changed to permit an
> allocation greater than chunk_size.  i.e., by assigning incr from size
> and pulling it out of the if block.  If you do this, of course you can
> only be strictly better off than you are now because a program that
> mainly allocs without freeing will get the benefit of the larger address
> space.  Whereas given that you allocate chunk_size from brk now,
> when you go to reuse brk space you can decide to only do so to fullfil
> chunk_size requests without losing anything relative to the current 
> implementation.

There is one issue with this change that you don't mention: the current 
implementation never returns sbrk()ed memory.  I did this in order to 
avoid races in multi-threaded programs that use brk() or sbrk(). 
However, in hindsight, such obscure (not to mention poorly designed) 
programs aren't worth supporting at the expense of all other programs. 
As such, I don't have a problem with making this change (along with a 
change that attempts to shrink the data segment during chunk deallocation).

> That you do not permit kegs larger than chunk_size though suggests a
> defect in your implementation.  one solution to this would be to adjust
> chunk_size before giving up on the allocation...

I don't understand what you are saying here.  Perhaps you were being 
confused by the chunk_size variable masking in huge_malloc() that you 
mention below.

> Also it is quite confusing that you have a global variable "chunk_size"
> that you mask with a local variable in huge_malloc.

Indeed, it is.  I'll fix this.

Thanks,
Jason



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?444FD673.3070004>