Date:      Wed, 12 Aug 2020 08:24:12 +0200
From:      Julian Grajkowski <julian.grajkowski@gmail.com>
To:        freebsd-drivers@freebsd.org
Subject:   contigmalloc and contigfree issue
Message-ID:  <CAGQdsJh%2BOxU-MDmVX5rPQuRf_YAM7YjBdoOpdHszQes-spZZxA@mail.gmail.com>

Hi,


I am currently developing a driver that allocates fairly large (2 MB or
more) physically contiguous memory chunks via contigmalloc. I have noticed
that both contigmalloc and contigfree take a long time for chunks of this
size, and that the number of cycles contigfree needs to free the memory
differs significantly between FreeBSD 11.3 and 12.1.


I measured this on the same machine under both 11.3-p11 and 12.1-p7 by
calling contigmalloc (with and without M_NOWAIT) and contigfree in a loop
40 times. The average cycle counts per call are as follows:


Release    Flags      contigmalloc   contigfree
11.3-p11   M_NOWAIT   179112         211302
11.3-p11   No flag    176541         199462
12.1-p7    M_NOWAIT   171769         127562
12.1-p7    No flag    171974         127448


The following code was used for the measurements:


[code]

/* M_FOO is the module's malloc type, declared elsewhere in the module. */
static void
runTest(void)
{
        const unsigned long pageSize = 4096;
        const unsigned long size = 2 * 1024 * 1024;  /* 2 MB per allocation */
        const int probes = 40;
        unsigned long long t1, t2;
        unsigned long long total_malloc = 0, total_free = 0;
        void *ptr;
        int i;

        for (i = 0; i < probes; i++) {
                /* Time one contigmalloc() call in TSC cycles. */
                t1 = rdtsc();
                ptr = contigmalloc(size, M_FOO, M_NOWAIT, 0, ~(vm_paddr_t)0,
                    pageSize, 0);
                t2 = rdtsc();
                total_malloc += t2 - t1;
                printf("contigmalloc cycles: %llu\n", t2 - t1);

                /* Give up on the first failed allocation. */
                if (ptr == NULL)
                        return;

                /* Time the matching contigfree() call. */
                t1 = rdtsc();
                contigfree(ptr, size, M_FOO);
                t2 = rdtsc();
                total_free += t2 - t1;
                printf("contigfree cycles: %llu\n\n", t2 - t1);

                uprintf("Loop complete\n");
        }

        /* Report the per-call averages over all probes. */
        printf("contigmalloc %llu contigfree %llu\n",
            total_malloc / probes, total_free / probes);
}

[/code]


The measurements were taken on the following setup:


CPU: Intel(R) Atom(TM) CPU C3958 @ 2.00GHz (2000.06-MHz K8-class CPU)

FreeBSD/SMP: Multiprocessor System Detected: 16 CPUs


8GB RAM (2x4GB):

        Type: DDR4

        Type Detail: Synchronous Unbuffered (Unregistered)

        Speed: 2400 MT/s
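
For a sense of scale, the cycle counts above can be converted to wall-clock
time. The following is only a minimal userland sketch in C. It assumes that
the TSC ticks at the 2000.06 MHz reported for the CPU above, which has not
been verified on this machine. The helper name cycles_to_ns and the TSC_HZ
constant are illustrative and are not part of the attached module.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the TSC ticks at the CPU frequency reported at boot. */
#define TSC_HZ 2000060000ULL    /* 2000.06 MHz */

/*
 * Convert a TSC cycle count to nanoseconds (rounded down) for a given
 * tick rate in Hz. The count is split into whole seconds and a remainder
 * so that the multiplication by 1e9 does not overflow for large counts.
 */
static uint64_t
cycles_to_ns(uint64_t cycles, uint64_t tsc_hz)
{
        uint64_t secs, rem;

        assert(tsc_hz > 0);
        secs = cycles / tsc_hz;
        rem = cycles % tsc_hz;
        return (secs * 1000000000ULL + rem * 1000000000ULL / tsc_hz);
}
```

Under that assumption, cycles_to_ns(179112, TSC_HZ) puts the 11.3-p11
M_NOWAIT contigmalloc average at roughly 89.5 microseconds per call.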


Could someone please explain the performance of contigmalloc and contigfree
described above? Also, what could account for the faster contigfree on
12.1? I have not yet been able to identify the cause. Is the performance of
contigmalloc and contigfree a known issue? I have not found anything about
it on the forum.


Please find the complete source of the module used for the measurements in
the attachment.


Thank you in advance for any help.


Kind regards,

Julian


