From owner-freebsd-hackers Tue Aug 3 12:11:21 1999 Delivered-To: freebsd-hackers@freebsd.org Received: from kronos.alcnet.com (kronos.alcnet.com [63.69.28.22]) by hub.freebsd.org (Postfix) with ESMTP id 12DD515348 for ; Tue, 3 Aug 1999 12:11:12 -0700 (PDT) (envelope-from kbyanc@alcnet.com) X-Provider: ALC Communications, Inc. http://www.alcnet.com/ Received: from kbyanc (ws-41.alcnet.com [63.69.28.41]) by kronos.alcnet.com (8.9.3/8.9.3/antispam) with SMTP id PAA39815 for ; Tue, 3 Aug 1999 15:10:15 -0400 (EDT) From: "Kelly Yancey" To: Subject: Results of investigating optimizing calloc()... Date: Tue, 3 Aug 1999 15:10:15 -0400 Message-ID: <001e01bedde3$d1af64c0$291c453f@kbyanc.alcnet.com> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0 X-MimeOLE: Produced By Microsoft MimeOLE V4.72.3110.3 Importance: Normal Sender: owner-freebsd-hackers@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Sorry for posting this out-of-the-blue, I meant to post it last week while the thread was still fresh, but I got distracted by work and other projects, so here it is a tad late. For those of you who hadn't been following the previous calloc() thread: I had theorized that we could see an improvement in the performance of calloc() by optimizing the case where there were no free memory was on the heap and sbrk() was called to extend the process's address space. Since sbrk() ensures that the newly available virtual address space is zeroed, it seemed logical that calloc() could just return memory from that space. Currently calloc() just calls malloc() which either returns free memory already on the heap or extends the heap (using sbrk() ). Calloc() then calls bzero() to clear the memory before returning it. The idea was to make calloc() as smart as malloc() so it didn't waste its time bzero()ing memory that was already cleared (courtesy of the kernel and sbrk() ). However, after spending several hours wrapping my head around the malloc() implementation (btw, thanks need to go to Poul-Henning Kamp for the great malloc we have...someone should by him a beer ;) ), I realized as Mattew Dillon and Dag-Erling Smorgrav predicted that this was fruitless effort. The reason: the best case I could hope for calloc() would be the worst case for malloc(). This is to say, that with all the work to make calloc() smarter, the best it could hope for would be to need to sbrk() more memory (relatively rare I discovered through some diagnostic printf()s); by comparison that is the worst thing malloc() ever has to do (often just pulling memory out of a free list on the heap to serve the request). But what about paging, that was the other concern. calloc() currently touches all of the pages of memory in order to zero them. Well, this is the only place where making calloc() smarter would help. The results below illustrate the point: All tests were performed on a 486/66 with 16M of RAM running FreeBSD-3.2: malloc calloc malloc+wrt calloc+wrt real time 0.31 77.33 75.76 100.67 user time 0.11 15.99 16.45 32.00 sys time 0.18 37.47 37.37 37.91 maximum resident set size 308 1324 1252 1284 average shared memory size 4 4 4 4 average unshared data size 477 744 744 744 average unshared stack size 37 129 128 128 page reclaims 0 124799 124788 124780 page faults 0 41 0 0 swaps 0 0 0 0 voluntary context switches 0 0 0 0 involuntary context switches 3 604 593 761 As you can see, calloc() takes drastically longer to allocate the same amount of memory as malloc() does. The difference being that calloc() touches every page (look at the huge number of page reclaims). But as the latter 2 columns show, if you actually ever use the memory, then you are pretty much even. We expect that writing to all the malloc()'ed memory should be the same as calloc() since that is how calloc() is implemented... and sure enough it is. And we see that writing to calloc()'ed memory is only as expensive as the write itself (no change in sys time nor page reclaims). If you consider the vast amount of memory that was written (1000! * 1024 bytes = 4.12e+2570 bytes) in the tests, then you can see how little overhead the write itself actually is for real-world programs. So what does it all mean: it means I was wrong. We really don't stand to gain anything from making calloc() smarter. At best I could have hoped to optimize the rare case of needing to sbrk() more memory to fulfill a calloc() request, and even then we see that the optimization isn't that great. If anyone is interested, following are the actually test programs. Kelly ~kbyanc@posi.net~ FreeBSD - The Power To Serve - http://www.freebsd.org/ Join Team FreeBSD - http://www.posi.net/freebsd/Team-FreeBSD/ /* malloctest.c */ #include void main() { int i; void *loc; for(i=0; i<1000; i++) { loc=malloc(i * 1024); free(loc); } } /* calloctest.c */ #include void main() { int i; void *loc; for(i=0; i<1000; i++) { loc=calloc(i, 1024); free(loc); } } /* malloctest2.c */ #include #include void main() { int i; void *loc; for(i=0; i<1000; i++) { loc=malloc(i * 1024); memset(loc, 'A', i * 1024); free(loc); } } /* calloctest2.c */ #include #include void main() { int i; void *loc; for(i=0; i<1000; i++) { loc=calloc(i, 1024); memset(loc, 'A', i * 1024); free(loc); } } To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-hackers" in the body of the message