Date: Wed, 22 Mar 2017 15:09:23 -0700
From: Mark Millard <markmi@dsl-only.net>
To: freebsd-hackers@freebsd.org
Cc: freebsd-arm <freebsd-arm@freebsd.org>
Subject: fork-then-swap-out [zero RES(ident memory)] questions tied to arm64 failures (zeroed memory pages)
Message-ID: <59C6BC12-1BC2-41D5-8B47-D0AD44D2CDF0@dsl-only.net>
The later questions are associated with:

Bugzilla 217239 and 217138 (which I now expect have a common cause)
https://lists.freebsd.org/pipermail/freebsd-arm/2017-March/015867.html (and its thread)

These are tied to some process memory pages being trashed (to be zero) in particular types of arm64 contexts. This is reproducible in multiple arm64 contexts. The context is head but I believe there are reports in the lists tied to 11 as well.

[Unfortunately the above all very much shows a learn-as-I-go property. Also the list has a sub-exchange on my testing other devices to check for device failures that is not directly relevant here.]

These are tied to problems with fork-then-swap-out-then-swap-in contexts on arm64. (Even though I've occasionally typed amd64 accidentally in places in those materials.)

Memory allocations from before the fork are involved, ones not yet accessed by the child side of the fork at the time of the fork.

fork sets up copy-on-write so that the child process temporarily shares pages (those it does not write), or should. But what if the parent process, or both parent and child, are swapped out just shortly after the fork (so, say, top -PCwaopid shows zero for RES(ident memory))? What is the handling of, say, the child swapping back in while the parent is still swapped out?

I notice that the child can have a much smaller SWAP figure than the parent, so it would appear that the parent's swap-out has pages that the child's does not. So what if the child needs some of those pages? What should happen? (Vs. what does happen on arm64 in specific types of contexts? More below.)

I ask mostly to see if I can do more to give evidence of what is going on and what to test for any proposed fix. I'm not likely to find the underlying problem(s) for arm64 directly, unlike my investigation that led to fork-trampoline being fixed in head's -r313772 (2017-Feb-15).
[ https://lists.freebsd.org/pipermail/freebsd-arm/2017-February/015656.html and its thread, including when its title changed in: https://lists.freebsd.org/pipermail/freebsd-arm/2017-February/015678.html .]

Part of that unlikely-to-solve status is because the context seems to be bound to a lot of special conditions and interesting behaviors simultaneously:

A) Both my original reproductions of problem reports on the lists and the only (simple) programs for reproducing the problems involve fork-then-swap-out [zero RES(ident memory)]. Neither fork by itself nor swap-out/in by itself has been sufficient.

B) jemalloc's tcache being in use (__je_tcache_maxclass == 32*1024) is part of every example of reproduction of the problem.

C) Allocations <= SMALL_MAXCLASS (SMALL_MAXCLASS==14*1024) get the failure (but bigger ones work, both those fitting inside __je_tcache_maxclass and those not). Again: every example reproduction of the problem has this status.

D) FreeBSD sometimes gets into a state where /etc/malloc.conf doing tcache:false does not seem to disable tcache. (Rebooting goes back to tcache:false working after such has been observed.) [Related or independent? I've no clue.] Usually tcache:false seems to work and so avoids the failures.

E) Use of POSIX_MADV_WILLNEED on the problematical allocation(s) in the child process after the fork but before the swap-outs of the child and parent prevents the failures (no read or write access to the memory from the child until after the swap-in). Doing so just in the parent process does not prevent the failures.

F) Similar to (E), but read-accessing a byte or more of one or more pages from the problematical allocations from the child process after the fork but before the swap-out makes those specific pages not fail. (The others still fail, if any.) Doing this from the parent process instead does not avoid the failures.
G) In a sequence like: su creates a sh, which then runs one of my test programs, which then forks off a child. It can be that all 4 of the processes show the zeroed memory area like the child process does. su and sh need to have swapped out and back in for them to get failures. su and sh die once they hit an assert that fails due to the zeroed memory page(s). The asserts involve addresses that are also messed up in the test program processes (parent and child).

In my reading I've not been able to determine what to expect for fork-then-swap-out-and-back-in for pages that the child process had not accessed yet but which might not be around for later activity because of the parent process's own swapped-out status at the time.

Note: While I usually run a non-debug kernel, I've tried a debug kernel and it provided no notices of problems. I got no additional information from the attempt. [My usual KERNCONF file includes GENERIC and then disables various debug items.]

The bugzilla reports have example code for showing the problems and various behaviors. The two in 217239 are probably of more interest than the first one on 217138.

===
Mark Millard
markmi at dsl-only.net