Date:      Mon, 5 Jul 2021 19:54:00 +0300
From:      Vitaliy Gusev <gusev.vitaliy@gmail.com>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        freebsd-hackers@freebsd.org, Mark Johnston <markj@freebsd.org>
Subject:   Re: madvise(MADV_FREE) doesn't work in some cases?
Message-ID:  <57BCE463-6200-4F83-A321-2F0444E7F063@gmail.com>
In-Reply-To: <YOBLn/XHpmEBfAdw@kib.kiev.ua>
References:  <0A95973D-254A-4574-8DC7-9F515F60B873@gmail.com> <YOBLn/XHpmEBfAdw@kib.kiev.ua>


Hi,


> On 3 Jul 2021, at 14:35, Konstantin Belousov <kostikbel@gmail.com> wrote:
>
> On Sat, Jul 03, 2021 at 02:32:01AM +0300, Vitaliy Gusev wrote:
>> ...
>> Does it mean madvise() doesn't work well in FreeBSD or test does something wrong?
>
> Your program does not do exactly what you described above.  There is a generic
> race to consume memory, and some specific details about madvise(2) on FreeBSD.
>
> From the code, you do:
> - mmap anonymous private region
> - fork
> - both child and parent start touching the mmaped region.

Their execution should be serialised by sleeps. Yes, it is not fully fair, but it is enough for testing purposes.
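
Sleeps only make the intended ordering likely. As a minimal sketch of a stricter hand-off, assuming a pipe is acceptable in the test: the parent blocks in read() until the child has finished its phase and written one byte. The names run_child_then_parent(), phase_fn, child_says() and parent_says() are hypothetical and not part of the test program.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical callback type for one phase of the test. */
typedef void (*phase_fn)(void);

/* Write a short message to stdout, ignoring errors (demo only). */
static void say(const char *s, size_t n)
{
        ssize_t r = write(STDOUT_FILENO, s, n);
        (void)r;
}

/* Example phases used only for demonstration. */
void child_says(void)  { say("child\n", 6); }
void parent_says(void) { say("parent\n", 7); }

/*
 * Run child_phase() in a forked child, then parent_phase() in the
 * parent, strictly in that order.  Returns 0 on success, -1 on failure.
 */
int run_child_then_parent(phase_fn child_phase, phase_fn parent_phase)
{
        int fds[2];
        char token = 0;
        pid_t pid;
        int status;

        if (pipe(fds) != 0)
                return -1;

        pid = fork();
        if (pid < 0) {
                close(fds[0]);
                close(fds[1]);
                return -1;
        }

        if (pid == 0) {
                /* Child: do the work, then signal completion. */
                close(fds[0]);
                child_phase();
                if (write(fds[1], &token, 1) != 1)
                        _exit(1);
                _exit(0);
        }

        /* Parent: block until the child's byte arrives. */
        close(fds[1]);
        if (read(fds[0], &token, 1) != 1) {
                close(fds[0]);
                waitpid(pid, NULL, 0);
                return -1;
        }
        close(fds[0]);
        parent_phase();

        if (waitpid(pid, &status, 0) < 0)
                return -1;
        return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Calling run_child_then_parent(child_says, parent_says) prints "child" before "parent". In the memory test the child would also have to stay alive after signalling (for example by blocking on a second pipe), because its exit would free the mapping and hide the effect being tested.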


> Two processes race to consume 1/2 of RAM on your system.  If one of
> them happens to execute faster than the other, you do get to the case where
> one of them does madvise().  But it could be that processes execute in
> lockstep, and try to eat all the memory before going to madvise().
> Did you exclude this case?
>

I believe I did everything right. You can see the sleeps that serialise execution. To check again, I modified the test to print timestamps and to use MADV_DONTNEED.

Here is the source: http://cpp.sh/2rd4f; I have also put it at the end of this email.

I ran:

$ ./mmapfork 2300
mmap 0x801000000 pid 40628
end 0x890c00000 len 0x8fc00000
pid 40628
pid 40629
40629: [1625500831] touch
40629: [1625500832] sleep before madvise
40629: [1625500833] madvise
40629: [1625500834] Press enter to exit
40628: [1625500845] touch
40628: [1625500846] sleep before madvise
40628: [1625500851] madvise
40628: [1625500852] Press enter to exit

You can see that the parent (40628) started touching memory 11 seconds after the child (40629) had finished calling madvise() on the whole range of touched memory.

And finally in dmesg:

pid 40629 (mmapfork), jid 0, uid 1001, was killed: out of swap space

So the result is the same as I described in the first email.


> Now, about the specifics of madvise(MADV_FREE) on FreeBSD.  Due to the way
> CoW is implemented with the shadow chain of objects, we cannot drop the
> top of the shadow chain, otherwise instead of returning zeroed pages next
> time, we would return content from back in time.  It was a relatively recent
> discovery, see bf5661f4a1af6931ec4b6, PR 240061.
>

Thanks, I will look at it.

> To explain it in simplified form, when there is potential old content
> under the CoW copy for the mapping, we cannot drop CoW-ed pages. This
> is the reason why madvise(MADV_FREE) does nothing for your program.
> When you run two instances without fork, there is no previous content
> and no CoW, so madvise() can safely remove the pages from the object,
> and on the next access they are zero-filled.
>
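
The no-fork case described in the quoted text can be sketched as follows. This is only an illustration, not code from the thread: byte_after_madv_free() is a hypothetical helper that touches one anonymous private page, calls madvise(MADV_FREE) and reads the page back. The advice may be applied lazily, so both the old byte and zero are valid results.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS and MADV_FREE on glibc */
#include <sys/mman.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Hypothetical helper: write `fill` to the first byte of a fresh
 * anonymous private page (no fork, so no CoW history), advise
 * MADV_FREE, and return the byte read back afterwards (0..255),
 * or -1 if any call fails.
 */
int byte_after_madv_free(uint8_t fill)
{
        long pagesz = sysconf(_SC_PAGESIZE);
        uint8_t *p;
        int value;

        if (pagesz <= 0)
                return -1;

        p = mmap(NULL, (size_t)pagesz, PROT_READ | PROT_WRITE,
            MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
        if (p == MAP_FAILED)
                return -1;

        p[0] = fill;            /* fault the page in */

        if (madvise(p, (size_t)pagesz, MADV_FREE) != 0) {
                munmap(p, (size_t)pagesz);
                return -1;
        }

        /*
         * The kernel may reclaim the page at any later point, so
         * either the old byte or a zero-filled page can show up here.
         */
        value = p[0];
        munmap(p, (size_t)pagesz);
        return value;
}
```

A call such as byte_after_madv_free(0x5a) is expected to return 0x5a or 0; any other non-negative value would mean stale content was handed back.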

Do I understand correctly that it should work with MADV_DONTNEED? The "dontneed" variant doesn't work either.

> You can read more details in the referenced commit, as well as some musings
> about ways to make it somewhat better.
>
> I must say, that trying to allocate 1/2 + 1/2 of RAM this way, on a system
> without swap, is the way to ask for trouble anyway.



I have just noticed that other operating systems handle this well, whereas FreeBSD has trouble.

Perhaps something in madvise() has not been finished yet?

----

#include <sys/mman.h>
#include <err.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <time.h>

int main(int argc, char *argv[])
{
        /* Mapping size: argv[1] in MiB (default 1024), converted to bytes. */
        size_t len = (size_t)(argc > 1 ? atoi(argv[1]) : 1024) * 1024 * 1024;
        uint8_t *ptr, *end, *p;
        unsigned pagesz = 1 << 12;      /* assumes 4 KiB pages */
        int pid;

        ptr = (uint8_t *)mmap(NULL, len, PROT_WRITE | PROT_READ,
            MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
        if (ptr == MAP_FAILED)
                err(1, "cannot mmap");

        end = ptr + len;
        printf("mmap %p pid %d\n", (void *)ptr, getpid());
        printf("end %p len %#zx\n", (void *)end, len);

        /* Flush before fork so buffered output is not printed twice. */
        fflush(stdout);

        pid = fork();

        if (pid < 0)
                err(1, "cannot fork");

        printf("pid %d\n", getpid());

        /* The child (pid == 0) goes first; the parent waits 15 seconds. */
        sleep(pid == 0 ? 1 : 15);

        printf("%d: [%ld] touch\n", getpid(), (long)time(NULL));

        /* Touch one byte per page to fault the whole mapping in. */
        p = ptr;
        while (p < end) {
                *p = 1;
                p += pagesz;
        }

        printf("%d: [%ld] sleep before madvise\n", getpid(),
            (long)time(NULL));
        sleep(pid == 0 ? 1 : 5);
        printf("%d: [%ld] madvise\n", getpid(), (long)time(NULL));

        /* Advise the kernel, page by page, that the memory is not needed. */
        p = ptr;
        while (p < end) {
                int error;

                error = madvise(p, pagesz, MADV_DONTNEED);
                if (error) {
                        err(1, "cannot madvise");
                }
                p += pagesz;
        }

        printf("%d: [%ld] Press enter to exit\n", getpid(),
            (long)time(NULL));
        getchar();
        return 0;
}
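
One way to check whether madvise() actually released anything is to count resident pages before and after the call. The following is a sketch under assumptions, not part of the test above: resident_pages() is a hypothetical helper built on mincore(2), and it relies on the low bit of each vector entry meaning "page is in core".

```c
#define _DEFAULT_SOURCE /* exposes mincore() and MAP_ANONYMOUS on glibc */
#include <sys/mman.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: count the pages of [addr, addr + len) that
 * mincore(2) reports as resident.  addr must be page-aligned.
 * Returns the count, or -1 on error.
 */
long resident_pages(void *addr, size_t len)
{
        long pagesz = sysconf(_SC_PAGESIZE);
        size_t npages, i;
        unsigned char *vec;
        long count = 0;

        if (pagesz <= 0 || len == 0)
                return -1;

        /* One status byte per page, rounding the length up. */
        npages = (len + (size_t)pagesz - 1) / (size_t)pagesz;
        vec = malloc(npages);
        if (vec == NULL)
                return -1;

        /*
         * The vector type differs between systems (char * on FreeBSD,
         * unsigned char * on Linux); void * converts to both in C.
         */
        if (mincore(addr, len, (void *)vec) != 0) {
                free(vec);
                return -1;
        }

        for (i = 0; i < npages; i++)
                if (vec[i] & 1)         /* low bit: page is in core */
                        count++;

        free(vec);
        return count;
}
```

In the test program this could be called as resident_pages(ptr, len) right after the touch loop and again after the madvise() loop; a count that does not drop would match the behaviour reported above.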

