From owner-freebsd-current@FreeBSD.ORG Wed Apr 26 20:22:16 2006 Return-Path: X-Original-To: freebsd-current@freebsd.org Delivered-To: freebsd-current@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 56D9716A400 for ; Wed, 26 Apr 2006 20:22:16 +0000 (UTC) (envelope-from jasone@FreeBSD.org) Received: from lh.synack.net (lh.synack.net [204.152.188.37]) by mx1.FreeBSD.org (Postfix) with ESMTP id DD22C43D45 for ; Wed, 26 Apr 2006 20:22:15 +0000 (GMT) (envelope-from jasone@FreeBSD.org) Received: by lh.synack.net (Postfix, from userid 100) id 2FFEF5E48DA; Wed, 26 Apr 2006 13:22:15 -0700 (PDT) Received: from [192.168.168.201] (moscow-cuda-gen2-68-64-60-20.losaca.adelphia.net [68.64.60.20]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by lh.synack.net (Postfix) with ESMTP id 790875E48DA; Wed, 26 Apr 2006 13:22:14 -0700 (PDT) Message-ID: <444FD673.3070004@FreeBSD.org> Date: Wed, 26 Apr 2006 13:22:11 -0700 From: Jason Evans User-Agent: Mozilla Thunderbird 1.0.8-1.4.1 (X11/20060420) X-Accept-Language: en-us, en MIME-Version: 1.0 To: Paul Allen References: <444F71F3.6030901@fsn.hu> <444F83AD.9040207@FreeBSD.org> <20060426182837.GA29737@barf.ugcs.caltech.edu> In-Reply-To: <20060426182837.GA29737@barf.ugcs.caltech.edu> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Checker-Version: SpamAssassin 3.0.5 (2005-11-28) on lh.synack.net X-Spam-Level: * X-Spam-Status: No, score=1.8 required=5.0 tests=RCVD_IN_NJABL_DUL, RCVD_IN_SORBS_DUL autolearn=no version=3.0.5 Cc: Attila Nagy , freebsd-current@freebsd.org Subject: Re: malloc problems with MySQL? X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 26 Apr 2006 20:22:16 -0000 Paul Allen wrote: >>From Jason Evans , Wed, Apr 26, 2006 at 07:29:01AM -0700: > >>Allocations that are larger than the chunk size (2 MB by default) are >>allocated using mmap(2), rather than sbrk(2). Most likely, your >>problems will go away if you reduce the heap size, so that mmap has more >>address space to work with. > > Which raises the question of what this code is checking for... > 1167 #ifdef USE_BRK > 1168 else if ((uintptr_t)ret >= (uintptr_t)brk_base > 1169 && (uintptr_t)ret < (uintptr_t)brk_max) { > 1170 /* > 1171 * We succeeded in mapping memory, but at a location that could > 1172 * be confused with brk. Leave the mapping intact so that this > 1173 * won't ever happen again, then try again. > 1174 */ > 1175 assert(addr == NULL); > 1176 goto AGAIN; > 1177 } > 1178 #endif This code was based on my misunderstanding of MAXDSIZ. I thought it was an upper bound on how large the data segment could be, but it's actually only the default value. I will remove this code, and modify the semantics of brk_max to deal with a variable data segment size limit. > 1212 /* > 1213 * Check for address ranges that were previously chunks and try > 1214 * to use them. > 1215 */ > 1216 > 1217 tchunk = RB_MIN(chunk_tree_s, &old_chunks); > 1218 while (tchunk != NULL) { > 1219 /* Found an address range. Try to recycle it. */ > 1220 > 1221 chunk = tchunk->chunk; > 1222 delchunk = tchunk; > 1223 tchunk = RB_NEXT(chunk_tree_s, &old_chunks, delchunk); > 1224 > 1225 /* Remove delchunk from the tree. */ > 1226 RB_REMOVE(chunk_tree_s, &old_chunks, delchunk); > 1227 base_chunk_node_dealloc(delchunk); > 1228 > 1229 #ifdef USE_BRK > 1230 if ((uintptr_t)chunk >= (uintptr_t)brk_base > 1231 && (uintptr_t)chunk < (uintptr_t)brk_max) { > 1232 /* Re-use a previously freed brk chunk. */ > 1233 ret = chunk; > 1234 goto RETURN; > 1235 } > 1236 #endif > 1237 if ((ret = pages_map(chunk, size)) != NULL) { > 1238 /* Success. */ > 1239 goto RETURN; > 1240 } > 1241 } > 1242 > There is something rather scary going on in this part of the code. > Perhaps it just needs better commenting... You are scanning the tree > headed by old_chunks looking for an address range. If this is in the > mmap region you attempt to mmap it again... to "reuse address space" > > To which I have to say: huh? In chunk_dealloc you explicitly call munmap > on the address. Therefore, what is the point of intentionally "reusing" > it by insisting the kernel return an address you had before. Perhaps > this would make sense if you had some perchunk structures allocated that > you wanted to reuse but if this is the case then I question whether you > really mean to just RB_REMOVE the chunk. Surely some other cleanup would > be appropriate... This code behaves as intended. Note that in order to get chunk-aligned memory via mmap(), it is necessary to over-allocate, then trim. By keeping a cache of unmapped chunks, we can avoid the extra system calls most of the time, and we are also able to reduce memory fragmentation in many cases. (Over-allocation and trimming fails to utilize chunk-sized holes in the memory map.) > For that matter the brk allocation code should be changed to permit an > allocation greater than chunk_size. i.e., by assigning incr from size > and pulling it out of the if block. If you do this, of course you can > only be strictly better off than you are now because a program that > mainly allocs without freeing will get the benefit of the larger address > space. Whereas given that you allocate chunk_size from brk now, > when you go to reuse brk space you can decide to only do so to fullfil > chunk_size requests without losing anything relative to the current > implementation. There is one issue with this change that you don't mention: the current implementation never returns sbrk()ed memory. I did this in order to avoid races in multi-threaded programs that use brk() or sbrk(). However, in hindsight, such obscure (not to mention poorly designed) programs aren't worth supporting at the expense of all other programs. As such, I don't have a problem with making this change (along with a change that attempts to shrink the data segment during chunk deallocation). > That you do not permit kegs larger than chunk_size though suggests a > defect in your implementation. one solution to this would be to adjust > chunk_size before giving up on the allocation... I don't understand what you are saying here. Perhaps you were being confused by the chunk_size variable masking in huge_malloc() that you mention below. > Also it is quite confusing that you have a global variable "chunk_size" > that you mask with a local variable in huge_malloc. Indeed, it is. I'll fix this. Thanks, Jason