Date: Sat, 10 Feb 2018 05:12:11 +0000
From: <Elliott.Rabe@dell.com>
To: <freebsd-hackers@freebsd.org>
Cc: <markj@FreeBSD.org>, <kib@FreeBSD.org>, <alc@FreeBSD.org>, <Eric.Van.Gyzen@dell.com>
Subject: Stale memory during post fork cow pmap update
Message-ID: <5A7E7F2B.80900@dell.com>
Greetings-

I've been hunting for the root cause of elusive, slight memory corruptions in a large, complex process that manages many threads. All failures and experimentation thus far have been on x86_64 architecture machines, and pmap_pcid is not in use.

I believe I have stumbled into a very unlikely race condition in the way the vm code updates the pmap during write fault processing following a fork of the process. In this situation, when the process is forked, appropriate vm entries are marked copy-on-write. One such entry, allocated by static process initialization, is frequently used by many threads in the process. This makes it a prime candidate to write-fault shortly after a fork system call is made.

In this scenario, such a fault normally burdens the faulting thread with the task of allocating a new page, entering the page as part of managed memory, and updating the pmap with the new physical address and the change to writeable status. This action is followed by an invalidation of the TLB on the current CPU, and in this case is also followed by IPI_INVLPG IPIs to do the same on other CPUs (there are often many active threads in this process). Before this remote TLB invalidation has completed, other CPUs are free to act on either the old OR new page characteristics. If other threads are alive and using contents of the faulting page on other CPUs, they can observe inconsistent memory contents.

In one simplified and somewhat contrived example, one thread attempts to write to a location on the faulting page under the protection of a lock, while another thread attempts to read from the same location twice in succession under the protection of the same lock. If the writing thread and the reading thread are running on different CPUs, and if the write is directed to the new physical address, the two reads may come from different physical addresses if a TLB invalidation occurs between them.
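The contrived example above can be replayed in a small deterministic toy model. This is an illustrative sketch only, not the kernel code and not the author's test fixture: the `machine` structure, the single TLB slot per CPU, the values 7 and 42, and every function name here are hypothetical. The lock is modeled purely by ordering: the writer's critical section finishes before the reader's begins, and the remote invalidation is assumed to land between the reader's two loads.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (all names hypothetical): two physical pages, one shared
 * page-table entry, and one cached translation per CPU. */
enum { OLD_PAGE = 0, NEW_PAGE = 1, NCPU = 2 };

struct pte { int page; bool writable; };

struct machine {
    int phys[2];            /* the one location of interest on each page */
    struct pte pmap;        /* authoritative mapping */
    struct pte tlb[NCPU];   /* per-CPU cached copy of the mapping */
    bool tlb_valid[NCPU];
};

/* Translate on a CPU, refilling the TLB from the pmap on a miss. */
static struct pte translate(struct machine *m, int cpu)
{
    if (!m->tlb_valid[cpu]) {
        m->tlb[cpu] = m->pmap;
        m->tlb_valid[cpu] = true;
    }
    return m->tlb[cpu];
}

static int load(struct machine *m, int cpu)
{
    return m->phys[translate(m, cpu).page];
}

/* Returns false on a write fault (the cached mapping is read-only). */
static bool store(struct machine *m, int cpu, int value)
{
    struct pte e = translate(m, cpu);
    if (!e.writable)
        return false;
    m->phys[e.page] = value;
    return true;
}

static void invlpg(struct machine *m, int cpu)
{
    m->tlb_valid[cpu] = false;
}

/* Single-step COW handling as described above: copy the page, update the
 * pmap to the new writable page, invalidate only the local TLB. The
 * remote invalidation is still in flight when this returns. */
static void cow_fault(struct machine *m, int cpu)
{
    m->phys[NEW_PAGE] = m->phys[OLD_PAGE];
    m->pmap = (struct pte){ NEW_PAGE, true };
    invlpg(m, cpu);
}

/* Replays the scenario; returns true if the reader's two loads disagree. */
static bool replay_race(int *first, int *second)
{
    struct machine m = { .phys = { 7, 0 }, .pmap = { OLD_PAGE, false } };

    /* After fork: both CPUs have cached the old, read-only mapping. */
    (void)load(&m, 0);
    (void)load(&m, 1);

    /* CPU 0, writer, inside its critical section: the store faults,
     * the fault is handled, and the retried store hits the new page. */
    if (!store(&m, 0, 42)) {
        cow_fault(&m, 0);
        bool ok = store(&m, 0, 42);
        assert(ok);
        (void)ok;
    }

    /* CPU 1, reader, inside its own critical section afterwards. */
    *first = load(&m, 1);   /* stale TLB entry: reads the old page */
    invlpg(&m, 1);          /* the delayed invalidation finally lands */
    *second = load(&m, 1);  /* refilled from the pmap: reads the new page */
    return *first != *second;
}
```

In this model `replay_race()` reports a mismatch: the first load returns the pre-fault value 7 from the old page and the second returns 42 from the new page, even though no store occurred between the two loads.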
This behavior seemingly violates the guarantees provided by the locking primitives and can result in subtle memory corruption symptoms. It took me quite a while to chase these symptoms from user-space down into the operating system, and even longer to end up with a stand-alone test fixture able to reproduce the situation described above on demand.

If I alter the kernel code to perform a two-stage update of the pmap entry, the observed corruption symptoms disappear. This two-stage mechanism first updates the pmap entry to the new physical address in a read-only state and invalidates the TLB, and then does a second pmap update and invalidation to change the status to writeable. The intended effect is to cause any other threads writing to the faulting page to become obstructed until the earlier fault is complete, thus eliminating the possibility of the physical pages having different contents until the new physical address is fully visible. This is goofy, and from an efficiency standpoint it is obviously undesirable, but it was the first thing that came to mind, and it seems to be working fine.

I am not terribly familiar with the higher level design here, so it is unclear to me whether this problem is simply a very unlikely race condition that hasn't yet been diagnosed, or whether it is instead the breakdown of some other mechanism of which I am not aware. I would appreciate the insights of those of you who have more history and experience with this area of the code.

Thank you for your time!

Elliott Rabe
elliott_rabe@dell.com
