Date:      Wed, 6 Nov 2002 14:34:07 -0800 (PST)
From:      Nate Lawson <nate@root.org>
To:        Jeff Roberson <jroberson@chesapeake.net>
Cc:        arch@FreeBSD.ORG
Subject:   Re: malloc(9) performance
Message-ID:  <Pine.BSF.4.21.0211061429500.16185-100000@root.org>
In-Reply-To: <20021106163703.X1374-100000@mail.chesapeake.net>

On Wed, 6 Nov 2002, Jeff Roberson wrote:
> On Wed, 6 Nov 2002, Nate Lawson wrote:
> 
> > On Wed, 6 Nov 2002, Terry Lambert wrote:
> > > Nate Lawson wrote:
> > > > (*) For me, fast enough means sustaining 100000 small (400 byte) and big
> > > > (64k) allocations/frees per second (one every 10 us).  Maximum memory in
> > > > use at any point in time would be a few MB.  Latency is my main concern
> > > > and memory fragmentation much less so.
> > >
> > > For me, the number is closer to 600,000: theoretically, this should
> > > be the combined mbuf and tcpcb connection struct allocation rate,
> > > given the maximum possible connections-per-second rate at Gigabit
> > > Ethernet's packets-per-second throughput.
> >
> > I'm going to get really flamed for this, but I did a short test of the
> > userland malloc and found that I get 33000 mallocs/sec when grabbing 64k
> > (and then touching one byte on each page) compared to 1800/s when I add in
> > the small mallocs (one 608-byte, one 400-byte).  So I may stick with
> > a private freelist for the small allocations and use malloc for the large
> > buffer.  Of course, profiling and real tests would be much better, but I just
> > need to make a quick decision for now.
> >
> > -Nate
>
> I do not currently have performance numbers for malloc(9).  They will not
> have anything at all to do with the performance of the library malloc.  It

I understand this.  I only posted because I was surprised by the 18x
difference between one large malloc(3) and that plus two small malloc(3) calls.

> has to make semi-expensive system calls to do its job.

I don't see how the syscall would result in an 18x degradation, especially
since once the brk() happens, the allocation should just be a list
access.  But I am not familiar with the internals of either malloc(3) or
malloc(9).

> I only did system
> level benchmarks against the old malloc and it did very well.  The fast
> path for malloc is only the overhead of a lock, popping an item off a
> stack, and an unlock.  It should be quite fast.

Agreed.

> It would be trivial to write a module that would time malloc and print out
> some results.  Could you try this?  If not I can put it on my to do list.

I'll take care of this, and then you can point out any flaws in my
approach.

-Nate


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message



