Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 3 Aug 1999 15:10:15 -0400
From:      "Kelly Yancey" <kbyanc@alcnet.com>
To:        <freebsd-hackers@freebsd.org>
Subject:   Results of investigating optimizing calloc()...
Message-ID:  <001e01bedde3$d1af64c0$291c453f@kbyanc.alcnet.com>

next in thread | raw e-mail | index | archive | help

  Sorry for posting this out-of-the-blue, I meant to post it last week
while the thread was still fresh, but I got distracted by work and other
projects, so here it is a tad late.

  For those of you who hadn't been following the previous calloc() thread:
  I had theorized that we could see an improvement in the performance of
calloc() by optimizing the case where there were no free memory was on the
heap and sbrk() was called to extend the process's address space. Since
sbrk() ensures that the newly available virtual address space is zeroed, it
seemed logical that calloc() could just return memory from that space.
  Currently calloc() just calls malloc() which either returns free memory
already on the heap or extends the heap (using sbrk() ). Calloc() then calls
bzero() to clear the memory before returning it. The idea was to make
calloc() as smart as malloc() so it didn't waste its time bzero()ing memory
that was already cleared (courtesy of the kernel and sbrk() ).

  However, after spending several hours wrapping my head around the malloc()
implementation (btw, thanks need to go to Poul-Henning Kamp for the great
malloc we have...someone should by him a beer ;) ), I realized as
Mattew Dillon and Dag-Erling Smorgrav predicted that this was fruitless
effort. The reason: the best case I could hope for calloc() would be the
worst case for malloc(). This is to say, that with all the work to make
calloc() smarter, the best it could hope for would be to need to sbrk()
more memory (relatively rare I discovered through some diagnostic
printf()s);
by comparison that is the worst thing malloc() ever has to do (often just
pulling memory out of a free list on the heap to serve the request).

  But what about paging, that was the other concern. calloc() currently
touches all of the pages of memory in order to zero them. Well, this is the
only place where making calloc() smarter would help. The results below
illustrate the point:

All tests were performed on a 486/66 with 16M of RAM running FreeBSD-3.2:

                              malloc  calloc malloc+wrt calloc+wrt
                   real time    0.31   77.33    75.76     100.67
                   user time    0.11   15.99    16.45      32.00
                    sys time    0.18   37.47    37.37      37.91
   maximum resident set size     308    1324     1252       1284
  average shared memory size       4       4        4          4
  average unshared data size     477     744      744        744
 average unshared stack size      37     129      128        128
               page reclaims       0  124799   124788     124780
                 page faults       0      41        0          0
                       swaps       0       0        0          0
  voluntary context switches       0       0        0          0
involuntary context switches       3     604      593        761


  As you can see, calloc() takes drastically longer to allocate the same
amount of memory as malloc() does. The difference being that calloc()
touches every page (look at the huge number of page reclaims). But as the
latter 2 columns show, if you actually ever use the memory, then you are
pretty much even. We expect that writing to all the malloc()'ed memory
should be the same as calloc() since that is how calloc() is implemented...
and sure enough it is. And we see that writing to calloc()'ed memory is
only as expensive as the write itself (no change in sys time nor page
reclaims). If you consider the vast amount of memory that was written
(1000! * 1024 bytes = 4.12e+2570 bytes) in the tests, then you can see
how little overhead the write itself actually is for real-world programs.

  So what does it all mean: it means I was wrong. We really don't stand
to gain anything from making calloc() smarter. At best I could have hoped
to optimize the rare case of needing to sbrk() more memory to fulfill a
calloc() request, and even then we see that the optimization isn't that
great. If anyone is interested, following are the actually test programs.

  Kelly
 ~kbyanc@posi.net~
  FreeBSD - The Power To Serve - http://www.freebsd.org/
  Join Team FreeBSD - http://www.posi.net/freebsd/Team-FreeBSD/


/* malloctest.c */
#include <stdlib.h>

void main() {
  int i;
  void *loc;

  for(i=0; i<1000; i++) {
    loc=malloc(i * 1024);
    free(loc);
  }
}

/* calloctest.c */
#include <stdlib.h>

void main() {
  int i;
  void *loc;

  for(i=0; i<1000; i++) {
    loc=calloc(i, 1024);
    free(loc);
  }
}

/* malloctest2.c */
#include <stdlib.h>
#include <string.h>

void main() {
  int i;
  void *loc;

  for(i=0; i<1000; i++) {
    loc=malloc(i * 1024);
    memset(loc, 'A', i * 1024);
    free(loc);
  }
}

/* calloctest2.c */
#include <stdlib.h>
#include <string.h>

void main() {
  int i;
  void *loc;

  for(i=0; i<1000; i++) {
    loc=calloc(i, 1024);
    memset(loc, 'A', i * 1024);
    free(loc);
  }
}



To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?001e01bedde3$d1af64c0$291c453f>