Date: Sat, 3 Jul 2021 14:35:59 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: Vitaliy Gusev <gusev.vitaliy@gmail.com>
Cc: freebsd-hackers@freebsd.org
Subject: Re: madvise(MADV_FREE) doesn't work in some cases?
Message-ID: <YOBLn/XHpmEBfAdw@kib.kiev.ua>
In-Reply-To: <0A95973D-254A-4574-8DC7-9F515F60B873@gmail.com>
References: <0A95973D-254A-4574-8DC7-9F515F60B873@gmail.com>

On Sat, Jul 03, 2021 at 02:32:01AM +0300, Vitaliy Gusev wrote:
> Hi,
>
> I came across not expected behaviour with madvise() in FreeBSD.
>
> Attached test program mmapfork does: mmap, fork, touch memory and then madvise(MADV_FREE).
>
> Expected behaviour - one process can allocate memory (lazy allocation) while system is freeing previously allocated memory for a second process.
>
> Current behaviour - system kills one process with message in dmesg:
>
> pid 31314 (mmapfork), jid 0, uid 1001, was killed: out of swap space
>
> Running this test in Linux or illumos shows expected behaviour with a little difference in illumos - it frees memory almost immediately, w/o needs lack of memory in a system.
>
> If use MADV_NOTNEED - no changes.
>
> If modify program and do not do fork(), but run two instances - that shows expected behaviour.
>
> To reproduce just disable swap, and run program with argument as 1/2 RAM on a system. For instance, command below will try run and use ~ 2GB area twice.
>
> [vetal@bsdev ~]$ ./mmapfork 2000
>
> Testing program is attached.
>
> Note, during testing I disabled swap on all systems: Linux, illumos and FreeBSD.
>
> Does it mean madvise() doesn't work well in FreeBSD or test does something wrong?

Your program does not do exactly what you described above. There is a
generic race to consume memory, and there are some specific details about
madvise(2) on FreeBSD.

From the code, you do:
- mmap an anonymous private region
- fork
- both child and parent start touching the mmaped region.

The two processes race, each trying to consume 1/2 of the RAM on your
system. If one of them happens to execute faster than the other, you do get
to the case where one of them does madvise(). But it could be that the
processes execute in lockstep and try to eat all the memory before either
of them gets to madvise(). Did you exclude this case?

Now, about the specifics of madvise(MADV_FREE) on FreeBSD.
Due to the way CoW is implemented with the shadow chain of objects, we
cannot drop the top of the shadow chain; otherwise, instead of returning
zeroed pages next time, we would return stale content from an earlier point
in time. This was a relatively recent discovery; see bf5661f4a1af6931ec4b6,
PR 240061. To explain it in simplified form: when there is potentially old
content under the CoW copy for the mapping, we cannot drop the CoW-ed pages.
This is why madvise(MADV_FREE) does nothing for your program.

When you run two instances without fork, there is no previous content and
no CoW, so madvise() can safely remove the pages from the object, and on
the next access they are zero-filled. You can read more details in the
referenced commit, as well as some musings about ways to make it somewhat
better.

I must say that trying to allocate 1/2 + 1/2 of RAM this way, on a system
without swap, is asking for trouble anyway.
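
One possible way to exclude the lockstep case mentioned above is to serialize the two processes, so that the parent starts touching memory only after the child has finished its madvise(MADV_FREE) call. The following is a minimal sketch of that idea, not the attached mmapfork program: the function name mmapfork_serialized() and the two-pipe handshake are assumptions made for illustration. The child stays alive, still holding its mapping, until the parent is done, so any memory the parent gets has to come from pages the system reclaims.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE		/* glibc: expose MAP_ANON and MADV_FREE */
#endif
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <string.h>
#include <unistd.h>

/* Touch every page so it is really allocated, then mark it freeable. */
static int
touch_and_advise(char *p, size_t len)
{
	memset(p, 0xa5, len);
	return (madvise(p, len, MADV_FREE));
}

/*
 * Hypothetical helper (not from the original test program).
 * mmap, fork, then: the child touches the region and calls madvise()
 * first; the parent does the same only after the child reports that it
 * is done.  Returns 0 if every step succeeded, -1 otherwise.
 */
int
mmapfork_serialized(size_t len)
{
	int done[2], hold[2], status, failed, rc = -1;
	char c = 0, *p;
	pid_t pid;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE,
	    -1, 0);
	if (p == MAP_FAILED)
		return (-1);
	if (pipe(done) == -1)
		goto out_unmap;
	if (pipe(hold) == -1)
		goto out_done;

	pid = fork();
	if (pid == 0) {
		close(done[0]);
		close(hold[1]);
		failed = (touch_and_advise(p, len) != 0);
		c = (char)failed;
		if (write(done[1], &c, 1) != 1)
			failed = 1;
		/* Stay alive, keeping the mapping, until hold[1] is closed. */
		while (read(hold[0], &c, 1) > 0)
			;
		_exit(failed);
	}

	close(hold[0]);
	close(done[1]);
	done[1] = -1;
	/* Touch memory only after the child reports a successful madvise(). */
	if (pid != -1 && read(done[0], &c, 1) == 1 && c == 0)
		rc = touch_and_advise(p, len);
	close(hold[1]);		/* EOF on hold[0] releases the child */
	if (pid != -1 && (waitpid(pid, &status, 0) == -1 ||
	    !WIFEXITED(status) || WEXITSTATUS(status) != 0))
		rc = -1;
out_done:
	close(done[0]);
	if (done[1] != -1)
		close(done[1]);
out_unmap:
	munmap(p, len);
	return (rc);
}
```

To mirror the original report, one would call this with a length of about half of RAM on a system with swap disabled, for example mmapfork_serialized((size_t)2000 << 20). If a process is still killed with the race removed, the cause is the madvise() behaviour described above and not the two processes competing for memory.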