Date:      Sat, 23 Jul 2011 04:07:19 +0200
From:      Davide Italiano <davide.italiano@gmail.com>
To:        freebsd-hackers@freebsd.org
Subject:   UMA large allocations issues
Message-ID:  <CACYV=-G+Uzw=q8WXqO_D8GLbrEGqKC0H_HL1U_wQCFPH9CeypQ@mail.gmail.com>

Hi.
I'm a student, and some time ago I started investigating the
performance/fragmentation issues of large allocations within the
UMA allocator.
Benchmarks showed that these performance problems are mainly
due to the fact that every call to uma_large_malloc() results in a
call to kmem_malloc(), which is really inefficient.

I started doing some work. Here's something:
As a first step, I tried to define larger zones and let UMA do it
all.
UMA can allocate slabs of more than one page, so I tried to define
zones of 1, 2, 4 and 8 pages, moving KMEM_ZMAX up.
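
To make the zone idea concrete, here is a minimal userland sketch of
how a request size could be mapped to one of the page-multiple zones
(1, 2, 4, 8 pages). It is not the kernel code: the 4096-byte page
size, the constants and the sketch_* names are all assumptions made
for illustration, and it only models the page-multiple zones, not the
smaller malloc zones.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size, for illustration */
#define SKETCH_NZONES    4      /* zones of 1, 2, 4 and 8 pages */

/*
 * Return the index of the smallest zone able to hold `size` bytes,
 * or -1 if the request is empty or larger than the biggest zone
 * (in which case it would fall through to the large-allocation path).
 */
int
sketch_zone_index(size_t size)
{
	size_t zone_size = SKETCH_PAGE_SIZE;
	int i;

	if (size == 0)
		return (-1);
	for (i = 0; i < SKETCH_NZONES; i++) {
		if (size <= zone_size)
			return (i);
		zone_size *= 2;
	}
	return (-1);
}

/* Return the item size, in bytes, of the zone with index `idx`. */
size_t
sketch_zone_size(int idx)
{
	assert(idx >= 0 && idx < SKETCH_NZONES);
	return ((size_t)SKETCH_PAGE_SIZE << idx);
}
```

With these assumptions a 5000-byte request lands in the 2-page zone,
while anything above 8 pages gets -1 and still needs the slow path.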
I tested this solution with raidtest. Here are some numbers.

Here's the workload characterization:


set mediasize=`diskinfo /dev/zvol/tank/vol | awk '{print $3}'`
set sectorsize=`diskinfo /dev/zvol/tank/vol | awk '{print $2}'`
raidtest genfile -s $mediasize -S $sectorsize -n 50000

# $mediasize = 10737418240
# $sectorsize = 512

Number of READ requests: 24924
Number of WRITE requests: 25076
Numbers of bytes to transmit: 3305292800


raidtest test -d /dev/zvol/tank/vol -n 4
## tested using 4 cores, 1.5 GB RAM

Results:
Number of processes: 4
Bytes per second: 10146896
Requests per second: 153

Results: (4* PAGE_SIZE)
Number of processes: 4
Bytes per second: 14793969
Requests per second: 223

Results: (8* PAGE_SIZE)
Number of processes: 4
Bytes per second: 6855779
Requests per second: 103


These tests show that defining larger zones helps as long as the
zones are not too big (223 requests per second at 4 * PAGE_SIZE
versus 103 at 8 * PAGE_SIZE). Beyond a certain size, performance
decreases significantly.
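
To put the figures above in relative terms, here is a small helper
that computes the percentage change of one run against another. Taking
the first, unlabeled run as the reference is my assumption; the
function name is made up for illustration.

```c
#include <assert.h>

/*
 * Percentage change of `value` relative to `baseline`.
 * Positive means faster than the reference run, negative means slower.
 */
double
sketch_pct_change(double baseline, double value)
{
	assert(baseline > 0.0);
	return ((value - baseline) / baseline * 100.0);
}
```

Under that assumption, sketch_pct_change(10146896, 14793969) is about
+45.8 (the 4 * PAGE_SIZE run), and sketch_pct_change(10146896, 6855779)
is about -32.4 (the 8 * PAGE_SIZE run).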

As a second step, alc@ proposed creating a new layer that sits between
UMA and the VM subsystem. This layer would manage a pool of chunks
that can be used to satisfy requests from uma_large_malloc(), thus
avoiding the overhead of the kmem_malloc() calls.
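
As a rough illustration of the pool idea (this is not the patch
itself), here is a userland sketch of a cache of fixed-size chunks
kept on a free list. Here malloc()/free() stand in for
kmem_malloc()/kmem_free(); the chunk size, the cap on cached chunks
and all sketch_* names are assumptions. There is no locking, which
real kernel code would need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_CHUNK_SIZE (64u * 1024u) /* assumed fixed chunk size */
#define SKETCH_POOL_MAX   8             /* assumed cap on cached chunks */

/* A free chunk stores the free-list link in its own first bytes. */
struct sketch_chunk {
	struct sketch_chunk *next;
};

struct sketch_pool {
	struct sketch_chunk *free_list; /* cached, currently unused chunks */
	unsigned cached;                /* number of chunks on free_list */
	unsigned backend_calls;         /* how often the backend was hit */
};

/* Hand out a cached chunk if there is one, else ask the backend. */
void *
sketch_pool_alloc(struct sketch_pool *p)
{
	struct sketch_chunk *c = p->free_list;

	if (c != NULL) {
		p->free_list = c->next;
		p->cached--;
		return (c);
	}
	p->backend_calls++;
	return (malloc(SKETCH_CHUNK_SIZE)); /* stand-in for kmem_malloc() */
}

/* Keep the chunk for reuse unless the pool is already full. */
void
sketch_pool_free(struct sketch_pool *p, void *mem)
{
	struct sketch_chunk *c = mem;

	assert(mem != NULL);
	if (p->cached >= SKETCH_POOL_MAX) {
		free(mem); /* stand-in for kmem_free() */
		return;
	}
	c->next = p->free_list;
	p->free_list = c;
	p->cached++;
}
```

A pool is used by zero-initializing a struct sketch_pool and calling
the two functions; an alloc/free/alloc sequence touches the backend
only once. Fragmentation is ignored here on purpose.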

I've recently started developing a patch (not yet fully working) that
implements this layer. For now I'd like to concentrate on the
performance problem rather than the fragmentation one, so the patch
I have started to write doesn't address fragmentation.

http://davit.altervista.org/uma_large_allocations.patch

There are some questions I wasn't able to answer (for example, when
it's safe to call kmem_malloc() to request memory).

So, at the end of the day, I'm asking for your opinion on this issue,
and I'm looking for a "mentor" (some kind of guidance) to continue
this project. If someone is interested in helping, it would be very
much appreciated.

Best

Davide Italiano


