Date: Wed, 6 Nov 2002 14:34:07 -0800 (PST)
From: Nate Lawson <nate@root.org>
To: Jeff Roberson <jroberson@chesapeake.net>
Cc: arch@FreeBSD.ORG
Subject: Re: malloc(9) performance
Message-ID: <Pine.BSF.4.21.0211061429500.16185-100000@root.org>
In-Reply-To: <20021106163703.X1374-100000@mail.chesapeake.net>

On Wed, 6 Nov 2002, Jeff Roberson wrote:
> On Wed, 6 Nov 2002, Nate Lawson wrote:
>
> > On Wed, 6 Nov 2002, Terry Lambert wrote:
> > > Nate Lawson wrote:
> > > > (*) For me, fast enough means sustaining 100000 small (400 byte) and big
> > > > (64k) allocations/frees per second (one every 10 us).  Maximum memory in
> > > > use at any point in time would be a few MB.  Latency is my main concern
> > > > and memory fragmentation much less so.
> > >
> > > For me, the number is closer to 600,000: theoretically, this should
> > > be the combined mbuf and tcpcb connection struct allocation rate,
> > > given the maximum possible connection per second rate on a Gigabit
> > > ethernet's packets-per-second throughput.
> >
> > I'm going to get really flamed for this but I did a short test of the
> > userland malloc and found that I get 33000 mallocs/sec when grabbing 64k
> > (and then touching one byte on each page) compared to 1800/s when I add in
> > the small mallocs (one 608 byte, one 400 byte).  So I may be sticking with
> > a private freelist for the small allocations and use malloc for the large
> > buf.  Of course, profiling and real tests would be much better but I just
> > need to make a quick decision for now.
> >
> > -Nate
>
> I do not currently have performance numbers for malloc(9).  They will not
> have anything at all to do with the performance of the library malloc.  It

I understand this.  I only posted because I was surprised by the 18x
difference between 1 large malloc(3) and that plus two small malloc(3)'s.

> has to do semi expensive system calls to do its job.

I don't see how the syscall would result in 18x degradation, especially
since once the brk() happens, the allocation should just be a list access.
But I am not familiar with the internals of either malloc(3) or (9).

> I only did system
> level benchmarks against the old malloc and it did very well.  The fast
> path for malloc is only the overhead of a lock, popping an item off of a
> stack, and an unlock.  It should be quite fast.

Agreed.

> It would be trivial to write a module that would time malloc and print out
> some results.  Could you try this?  If not I can put it on my to do list.

I'll take care of this and then you can point out any degradation in my
approach.

-Nate
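
For illustration only, here is a minimal userland sketch of the "private
freelist" idea mentioned above, using the same fast path described in the
thread: take a lock, pop an item off a stack, drop the lock.  It is not
the malloc(9) code and nobody in the thread posted it; every name in it
(struct freelist, fl_init, fl_alloc, fl_free) is hypothetical, and a
pthread mutex stands in for whatever lock the kernel would use.  When the
list is empty it simply falls back to malloc(3).

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names throughout; this is NOT the malloc(9) implementation. */

struct fl_item {
	struct fl_item *next;	/* link stored inside the free object itself */
};

struct freelist {
	pthread_mutex_t lock;	/* protects head and nfree */
	struct fl_item *head;	/* top of the stack of free objects */
	size_t item_size;	/* size of every object on this list */
	size_t nfree;		/* number of objects currently cached */
};

void
fl_init(struct freelist *fl, size_t item_size)
{
	pthread_mutex_init(&fl->lock, NULL);
	fl->head = NULL;
	/* A free object must be large enough to hold the link pointer. */
	fl->item_size = item_size < sizeof(struct fl_item) ?
	    sizeof(struct fl_item) : item_size;
	fl->nfree = 0;
}

/* Fast path: lock, pop, unlock.  Slow path: fall back to malloc(3). */
void *
fl_alloc(struct freelist *fl)
{
	struct fl_item *it;

	pthread_mutex_lock(&fl->lock);
	it = fl->head;
	if (it != NULL) {
		fl->head = it->next;
		fl->nfree--;
	}
	pthread_mutex_unlock(&fl->lock);

	return (it != NULL ? (void *)it : malloc(fl->item_size));
}

/* Freeing pushes the object back on the stack; nothing goes back to malloc(3). */
void
fl_free(struct freelist *fl, void *p)
{
	struct fl_item *it = p;

	assert(p != NULL);
	pthread_mutex_lock(&fl->lock);
	it->next = fl->head;
	fl->head = it;
	fl->nfree++;
	pthread_mutex_unlock(&fl->lock);
}
```

One list per object size (say one for the 400 byte and one for the 608
byte allocations) keeps the small objects away from malloc(3) after
warm-up, while the 64k buffer can still come from malloc(3) directly.
The sketch never trims the cache, so memory use only grows to the high
water mark; that seems acceptable when the peak is a few MB, but it is a
design choice, not a measured result.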