Date: Mon, 5 Jul 2021 19:54:00 +0300
From: Vitaliy Gusev <gusev.vitaliy@gmail.com>
To: Konstantin Belousov <kostikbel@gmail.com>
Cc: freebsd-hackers@freebsd.org, Mark Johnston <markj@freebsd.org>
Subject: Re: madvise(MADV_FREE) doesn't work in some cases?
Message-ID: <57BCE463-6200-4F83-A321-2F0444E7F063@gmail.com>
In-Reply-To: <YOBLn/XHpmEBfAdw@kib.kiev.ua>
References: <0A95973D-254A-4574-8DC7-9F515F60B873@gmail.com> <YOBLn/XHpmEBfAdw@kib.kiev.ua>

Hi,

> On 3 Jul 2021, at 14:35, Konstantin Belousov <kostikbel@gmail.com> wrote:
>
> On Sat, Jul 03, 2021 at 02:32:01AM +0300, Vitaliy Gusev wrote:
>> ...
>> Does it mean madvise() doesn't work well in FreeBSD or test does something wrong?
>
> Your program does not do exactly what you described above. There is a generic
> race to consume memory, and some specific details about madvise(2) on FreeBSD.
>
> From the code, you do:
> - mmap anonymous private region
> - fork
> - both child and parent start touching the mmaped region.

Their execution should be serialised by the sleeps. Yes, it is not fully fair, but it is enough for testing purposes.

> Two processes race to consume 1/2 of RAM on your system. If one of
> them happens to execute faster than the other, you do get to the case where
> one of them does madvise(). But it could be that the processes execute in
> lockstep, and try to eat all the memory before going to madvise().
> Did you exclude this case?
>

I believe I did everything right. You can see the sleeps that serialise execution. To check again, I modified the test to print timestamps and to use MADV_DONTNEED.

Here is the source: http://cpp.sh/2rd4f. I have also put it at the end of this email.

I’ve run:

$ ./mmapfork 2300
mmap 0x801000000 pid 40628
end 0x890c00000 len 0x8fc00000
pid 40628
pid 40629
40629: [1625500831] touch
40629: [1625500832] sleep before madvise
40629: [1625500833] madvise
40629: [1625500834] Press enter to exit
40628: [1625500845] touch
40628: [1625500846] sleep before madvise
40628: [1625500851] madvise
40628: [1625500852] Press enter to exit

As you can see, the parent (40628) started touching memory 11 seconds after the child (40629) had already called madvise() on the whole range of touched memory.

And finally in dmesg:

pid 40629 (mmapfork), jid 0, uid 1001, was killed: out of swap space

So the result is the same as I described in the first email.

> Now, about the specifics of madvise(MADV_FREE) on FreeBSD. Due to the way
> CoW is implemented with the shadow chain of objects, we cannot drop the
> top of the shadow chain, otherwise instead of returning zeroed pages next
> time, we would return content from back in time. It was a relatively recent
> discovery, see bf5661f4a1af6931ec4b6, PR 240061.
>

Thanks, I will look at it.

> To explain it in simplified form, when there is potential old content
> under the CoW copy for the mapping, we cannot drop CoW-ed pages. This
> is the reason why madvise(MADV_FREE) does nothing for your program.
> When you run two instances without fork, there is no previous content
> and no CoW, so madvise() can safely remove the pages from the object,
> and on the next access they are zero-filled.
>

Do I understand correctly that it should work with MADV_DONTNEED? The "dontneed" variant doesn’t work either.

> You can read more details in the referenced commit, as well as some musings
> about ways to make it somewhat better.
>
> I must say that trying to allocate 1/2 + 1/2 of RAM this way, on a system
> without swap, is asking for trouble anyway.

I have just noted that other operating systems work well with this, whereas FreeBSD has trouble.

Perhaps something in madvise() has not been finished yet?

----

#include <sys/mman.h>
#include <err.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <time.h>

int main(int argc, char *argv[])
{
    /* Region size in MiB comes from argv[1]; the default is 1024 MiB. */
    size_t len = (size_t)(argc > 1 ? atoi(argv[1]) : 1024) * 1024 * 1024;
    uint8_t *ptr, *end, *p;
    unsigned pagesz = 1<<12;
    pid_t pid;

    ptr = (uint8_t *)mmap(NULL, len, PROT_WRITE | PROT_READ,
        MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    if (ptr == MAP_FAILED)
        err(1, "cannot mmap");

    end = ptr + len;
    printf("mmap %p pid %d\n", ptr, getpid());
    printf("end %p len %#zx\n", end, len);
    fflush(stdout);

    pid = fork();
    if (pid < 0)
        err(1, "cannot fork");
    printf("pid %d\n", getpid());

    /* Serialise the two processes: the child goes first, the parent waits. */
    sleep(pid == 0 ? 1 : 15);

    /* Dirty one byte in every page so that each page gets allocated. */
    printf("%d: [%ld] touch\n", getpid(), (long)time(NULL));
    p = ptr;
    while (p < end) {
        *p = 1;
        p += pagesz;
    }

    printf("%d: [%ld] sleep before madvise\n", getpid(), (long)time(NULL));
    sleep(pid == 0 ? 1 : 5);

    /* Advise the kernel, page by page, that the contents are not needed. */
    printf("%d: [%ld] madvise\n", getpid(), (long)time(NULL));
    p = ptr;
    while (p < end) {
        int error;

        error = madvise(p, pagesz, MADV_DONTNEED);
        if (error) {
            err(1, "cannot madvise");
        }
        p += pagesz;
    }

    printf("%d: [%ld] Press enter to exit\n", getpid(), (long)time(NULL));
    getchar();
    return 0;
}