Date:      Tue, 13 Feb 2018 09:10:21 +0000
From:      <Elliott.Rabe@dell.com>
To:        <kib@freebsd.org>
Cc:        <freebsd-hackers@freebsd.org>, <Eric.Van.Gyzen@dell.com>, <alc@FreeBSD.org>, <markj@FreeBSD.org>, <truckman@FreeBSD.org>
Subject:   Re: Stale memory during post fork cow pmap update
Message-ID:  <5A82AB7C.6090404@dell.com>
In-Reply-To: <20180210225608.GM33564@kib.kiev.ua>
References:  <5A7E7F2B.80900@dell.com> <20180210111848.GL33564@kib.kiev.ua> <5A7F6A7C.80607@dell.com> <20180210225608.GM33564@kib.kiev.ua>


[-- Attachment #1 --]

On 02/10/2018 04:56 PM, Konstantin Belousov wrote:
> On Sat, Feb 10, 2018 at 09:56:20PM +0000, Elliott.Rabe@dell.com wrote:
>> On 02/10/2018 05:18 AM, Konstantin Belousov wrote:
>>> On Sat, Feb 10, 2018 at 05:12:11AM +0000, Elliott.Rabe@dell.com wrote:
>>>> ...
>>>> I've been hunting for the root cause of elusive, slight memory
>>>> corruptions in a large, complex process that manages many threads. All
>>>> failures and experimentation thus far has been on x86_64 architecture
>>>> machines, and pmap_pcid is not in use.
>>>> ...
>>> It is necessary for you to provide the test and provide
>>> some kind of the test trace or the output which illustrates the issue
>>> you found.
>> Here is the sequence of actions I am referring to.  There is only one
>> lock, and all the writes/reads are on one logical page.
>>
>> +The process is forked transitioning a map entry to COW
>> +Thread A writes to a page on the map entry, faults, updates the pmap to
>> writable at a new phys addr, and starts TLB invalidations...
>> +Thread B acquires a lock, writes to a location on the new phys addr,
>> and releases the lock
>> +Thread C acquires the lock, reads from the location on the old phys addr...
>> +Thread A ...continues the TLB invalidations which are completed
>> +Thread C ...reads from the location on the new phys addr, and releases
>> the lock
>>
>> In this example Thread B and C [lock, use and unlock] properly and
>> neither own the lock at the same time.  Thread A was writing somewhere
>> else on the page and so never had/needed the lock.  Thread B sees a
>> location that is only ever read|modified under a lock change beneath it
>> while it is the lock owner.
> I believe you mean 'Thread C' in the last sentence.
You are correct, I did mean Thread C.
>> I will get a test patch together and make it available as soon as I can.
> Please.

Sorry for my delayed response; I had been working off a separate project
based on releng/11.1, and it took me longer than I expected to get a dev
rig set up off of master on which I could re-evaluate the situation.

I am attaching my test apparatus; however, calling it a test is probably
a disservice to tests everywhere.  I consider this entire fixture
disposable, so I didn't get carried away trying to properly
style/partition/locate the code.  I never wanted anything this
complicated either; it pretty much just evolved into a development aid
for spelunking around in the fault/pmap handling.  My attempts thus far
at reducing the fixture to user space only have not been successful.
Additionally, I have noticed that the fixture is /very/ sensitive to any
changes in timing; several of the debugging entries even seem key to
hitting the problem.  I didn't have much luck getting the problem to
manifest on a virtual machine guest with a VirtualBox host either.  For
all of these reasons, I don't think there is value in trying to use
this as any sort of regression fixture, unless perhaps someone is
willing to turn it into something less ridiculous.  Despite all its
shortcomings, on my hardware at least, it reproduces the example I
described almost immediately when run with the debugging knob "-v".
Instructions and expectations are at the top of the main test fixture
source file (forking_stale.c).

I am also attaching a patch that I have been using to prevent the
problem.  I was looking at things with a much narrower view and made the
changes directly in pmap_enter.  I suspect the internal
double-update-invalidate performs slightly better than taking two whole
faults, but I haven't benchmarked it; it probably doesn't matter much
compared to the cost and frequency of the actual copies, and it also has
the disadvantage of being architecture-specific.  I also don't feel I
have enough experience with the vm fault code in general for my
commentary to be very valuable here.  However, I do wonder: 1) whether
there are any other scenarios where a potentially accessible page might
be undergoing an [address+writable] change in the same way (this sort of
thing seems hard to read out of code), and 2) whether there is ever any
legitimate reason for an accessible page to be undergoing such a
change.  If not, perhaps we could come up with an appropriate
sanity-check condition to guard against this sort of thing slipping in
accidentally in the future.

The attached git patches should apply and build cleanly on master commit 
fe0ee5c.  I have verified at least these three scenarios in my environment:
1) the fixture alone reproduces the problem.
2) the fixture with my patch does not reproduce the problem.
3) the fixture with your patch does not reproduce the problem.

Thanks!

[-- Attachment #2 --]
From 3090b8232f6f421c0c6de2102b18cfac5700b51a Mon Sep 17 00:00:00 2001
From: Elliott Rabe <elliott.rabe@dell.com>
Date: Sun, 11 Feb 2018 17:19:26 -0600
Subject: [PATCH 1/3] DISPOSABLE: A test fixture that can repro a pmap
 update-invalidate race condition

A high-level description of the fixture is available in forking_stale.c
---
 stand/libsa/printf.c   |   16 +
 stand/libsa/stand.h    |    1 +
 sys/amd64/amd64/pmap.c |   15 +
 sys/amd64/conf/GENERIC |   19 +-
 sys/vm/forking_stale.c | 1054 ++++++++++++++++++++++++++++++++++++++++++++++++
 sys/vm/forking_stale.h |  296 ++++++++++++++
 sys/vm/vm_fault.c      |   16 +
 sys/vm/vm_page.c       |  459 +++++++++++++++++++++
 sys/vm/vm_page.h       |    5 +
 9 files changed, 1872 insertions(+), 9 deletions(-)
 create mode 100755 sys/vm/forking_stale.c
 create mode 100755 sys/vm/forking_stale.h

diff --git a/stand/libsa/printf.c b/stand/libsa/printf.c
index d0c409d..c77e941 100644
--- a/stand/libsa/printf.c
+++ b/stand/libsa/printf.c
@@ -149,6 +149,22 @@ vsprintf(char *buf, const char *cfmt, va_list ap)
 	buf[retval] = '\0';
 }
 
+int
+vsnprintf(char *buf, size_t size, const char *cfmt, va_list ap)
+{
+    int retval;
+    struct print_buf arg;
+
+    arg.buf = buf;
+    arg.size = size;
+
+    retval = kvprintf(cfmt, &snprint_func, &arg, 10, ap);
+
+    if (arg.size >= 1)
+        *(arg.buf)++ = 0;
+    return retval;
+}
+
 /*
  * Put a NUL-terminated ASCII number (base <= 36) in a buffer in reverse
  * order; return an optional length and a pointer to the last character
diff --git a/stand/libsa/stand.h b/stand/libsa/stand.h
index f6a612b..8d50efe 100644
--- a/stand/libsa/stand.h
+++ b/stand/libsa/stand.h
@@ -274,6 +274,7 @@ extern void	vprintf(const char *fmt, __va_list);
 extern int	sprintf(char *buf, const char *cfmt, ...) __printflike(2, 3);
 extern int	snprintf(char *buf, size_t size, const char *cfmt, ...) __printflike(3, 4);
 extern void	vsprintf(char *buf, const char *cfmt, __va_list);
+extern int  vsnprintf(char *buf, size_t size, const char *cfmt, __va_list);
 
 extern void	twiddle(u_int callerdiv);
 extern void	twiddle_divisor(u_int globaldiv);
diff --git a/sys/amd64/amd64/pmap.c b/sys/amd64/amd64/pmap.c
index b9889e3..7bb9c1b 100644
--- a/sys/amd64/amd64/pmap.c
+++ b/sys/amd64/amd64/pmap.c
@@ -4628,6 +4628,8 @@ int
 pmap_enter(pmap_t pmap, vm_offset_t va, vm_page_t m, vm_prot_t prot,
     u_int flags, int8_t psind)
 {
+	fstf_conditional_fault_debug(__FILE__, __LINE__, pmap, va, prot);
+
 	struct rwlock *lock;
 	pd_entry_t *pde;
 	pt_entry_t *pte, PG_G, PG_A, PG_M, PG_RW, PG_V;
@@ -4792,9 +4794,20 @@ pmap_enter(pmap_t pmap, vm_offset_t va, vm_page_t m, vm_prot_t prot,
 	/*
 	 * Update the PTE.
 	 */
+	fstf_conditional_point_debug(__FILE__, __LINE__, pmap, va);
+
 	if ((origpte & PG_V) != 0) {
 validate:
+		fstf_conditional_pmapmod_advance(__FILE__,
+						 __LINE__,
+						 va,
+						 PHYS_TO_VM_PAGE(newpte & PG_FRAME));
+
 		origpte = pte_load_store(pte, newpte);
+
+		fstf_conditional_pte_debug(__FILE__, __LINE__, pmap, va, origpte);
+		fstf_conditional_pte_debug(__FILE__, __LINE__, pmap, va, newpte);
+
 		opa = origpte & PG_FRAME;
 		if (opa != pa) {
 			if ((origpte & PG_MANAGED) != 0) {
@@ -4833,6 +4846,8 @@ pmap_enter(pmap_t pmap, vm_offset_t va, vm_page_t m, vm_prot_t prot,
 	} else
 		pte_store(pte, newpte);
 
+	fstf_conditional_point_debug(__FILE__, __LINE__, pmap, va);
+
 unchanged:
 
 #if VM_NRESERVLEVEL > 0
diff --git a/sys/amd64/conf/GENERIC b/sys/amd64/conf/GENERIC
index 1af7b7b..4c45290 100644
--- a/sys/amd64/conf/GENERIC
+++ b/sys/amd64/conf/GENERIC
@@ -86,16 +86,16 @@ options 	RCTL			# Resource limits
 options 	KDB			# Enable kernel debugger support.
 options 	KDB_TRACE		# Print a stack trace for a panic.
 # For full debugger support use (turn off in stable branch):
-options 	BUF_TRACKING		# Track buffer history
-options 	DDB			# Support DDB.
-options 	FULL_BUF_TRACKING	# Track more buffer history
+#options 	BUF_TRACKING		# Track buffer history
+#options 	DDB			# Support DDB.
+#options 	FULL_BUF_TRACKING	# Track more buffer history
 options 	GDB			# Support remote GDB.
-options 	DEADLKRES		# Enable the deadlock resolver
-options 	INVARIANTS		# Enable calls of extra sanity checking
-options 	INVARIANT_SUPPORT	# Extra sanity checks of internal structures, required by INVARIANTS
-options 	WITNESS			# Enable checks to detect deadlocks and cycles
-options 	WITNESS_SKIPSPIN	# Don't run witness on spinlocks for speed
-options 	MALLOC_DEBUG_MAXZONES=8	# Separate malloc(9) zones
+#options 	DEADLKRES		# Enable the deadlock resolver
+#options 	INVARIANTS		# Enable calls of extra sanity checking
+#options 	INVARIANT_SUPPORT	# Extra sanity checks of internal structures, required by INVARIANTS
+#options 	WITNESS			# Enable checks to detect deadlocks and cycles
+#options 	WITNESS_SKIPSPIN	# Don't run witness on spinlocks for speed
+#options 	MALLOC_DEBUG_MAXZONES=8	# Separate malloc(9) zones
 
 # Make an SMP-capable kernel by default
 options 	SMP			# Symmetric MultiProcessor Kernel
@@ -103,6 +103,7 @@ options 	EARLY_AP_STARTUP
 
 # CPU frequency control
 device		cpufreq
+device      cpuctl
 
 # Bus support.
 device		acpi
diff --git a/sys/vm/forking_stale.c b/sys/vm/forking_stale.c
new file mode 100755
index 0000000..47a1bb9
--- /dev/null
+++ b/sys/vm/forking_stale.c
@@ -0,0 +1,1054 @@
+/*
+ * FSTF
+ *
+ * forking stale test fixture
+ *
+ * A test fixture to repro a specific race condition in the
+ * FreeBSD kernel amd64 pmap code.
+ *
+ * The test fixture has components in the kernel and userspace.
+ * A sysctl hook in the kernel is called from userspace to allocate
+ * a region of memory in the kernel, the physical address of which
+ * is mmaped somewhere into the test process.  64-bits of this region
+ * is then atomically manipulated in both kernel and userspace to
+ * help coordinate test actions.  This coordination is nothing more
+ * than bit-flag changing & "spin-wait" loops intended to advance
+ * actions in a specific sequence.  The test creates three threads:
+ * a faulter, a writer, and a reader.  The main process updates
+ * the value at a virtual address to a known state under a lock.
+ * The process is forked and held to keep the vm entries in a
+ * copy-on-write state.  The faulter thread is awoken to trap in
+ * the kernel to perform the copy-on-write operation for a page.
+ * Once the pmap page update that applies the new physical address
+ * AND the clearing of the read-only state has occurred the writer
+ * and reader threads are released.  The reader thread continually
+ * reads the value under lock twice and ensures the values read are
+ * the same.  The writer thread changes the value under the lock.
+ * The iteration stops once the fault handling has been completed.
+ * This whole test cycle is repeated an arbitrary number of iterations.
+ *
+ * The expected "working" behavior from the OS is that because the
+ * test value is always modified and read under the same lock, it
+ * should not be possible for a thread to read the value twice and see
+ * a change in value.  In practice a "mismatch" can be observed, presumably
+ * if the TLB is invalidated for the CPU the reader thread is running
+ * on in-between successive reads.
+ *
+ * The expected "working" behavior from the test fixture below when it
+ * detects the problem is to error out with a non-zero exit code
+ * displaying "complete MISMATCH" when the reader thread sees the
+ * illegal value change.
+ *
+ * This test is expected to run only on an x86_64 platform with at
+ * least 4 CPUs configured for SMP.  It is not expected to work (nor
+ * was it ever tried) on any other architecture types or platforms with
+ * smaller CPU counts.  It also uses the "rdtscp" instruction
+ * when generating debugging info to help correlate actions occurring
+ * across different CPUs/TSCs.  If this is not available, the fixture
+ * will dump core attempting to execute an illegal instruction.  The TSC
+ * MSR AUX can be seeded with the CPU ID from a shell like this:
+ * root@:/usr/src/sys/vm # msr_tsc_aux=0xc0000103; max_cpu_id=`sysctl -n kern.smp.maxid`; cpu=0
+ * root@:/usr/src/sys/vm # while [ ${cpu} -le ${max_cpu_id} ]; do cpuhex=`printf 0x%x ${cpu}`; cpucontrol -m "${msr_tsc_aux}=${cpuhex}" "/dev/cpuctl${cpu}"; cpu=`expr "${cpu}" + 1`; done
+ *
+ * NOTE:  Although I have seen it fail occasionally on test virtual machines,
+ * the fixture is far more reliable on real hardware (usually fails in a few seconds).
+ * The timing necessary to hit this race condition seems very delicate.
+ *
+ * The test fixture requires a kernel built w/ the test hooks as well.
+ * Apply the test code patch from /usr/src, rebuild and boot into the kernel.
+ * The test fixture can be built with the following command:
+ * root@:/usr/src/sys/vm # clang -Wall -g -O0 forking_stale.c -o forking_stale -lpthread
+ *
+ * Example output if the fixture detects the problem:
+ *
+ * root@:/usr/src/sys/vm # ./forking_stale -v > /dev/null
+ * RUNTIME PARAMS:niters 100000
+ * ERROR: Mismatch: Old=0, New=1
+ * runtime (sec) 0.378667
+ * forks 1149
+ * faulter iters 1149
+ * writer iters 1149
+ * reader iters 9063
+ * reader value old 6498
+ * reader value new 2565
+ * pmapstalls 1149
+ * mismatches 1
+ * completion reason MISMATCH
+ * exit 1
+ *
+ * Debugging:
+ *
+ * The fixture can be directed to capture runtime debugging about the timing and
+ * actions leading up to the mismatch.  To use this, a command like this will
+ * interleave both userspace and kernel debugging info:
+ *
+ * Example command to generate debug timing data in a file named 'fr_sorted.txt'
+ * ./forking_stale -v > fr.txt; sysctl -n kern.fstf_debug_output >> fr.txt && cat fr.txt | sort > fr_sorted.txt
+ *
+ * Output above the following line is from a previous iteration and should be ignored:
+ * TSC:    40256065075175 CPU:00 TID:100187 Forker     CODE:forking_stale.c:0481 STATE:0x0000000000000007 MSG:Test hook active...
+ *
+ */
+
+#include <pthread.h>
+#include <pthread_np.h>
+#include <unistd.h>
+#include <stdlib.h>
+#include <string.h>
+#include <stdio.h>
+#include <stdint.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <signal.h>
+
+#include <sys/param.h>
+#include <sys/cpuset.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <sys/types.h>
+#include <sys/sysctl.h>
+#include <sys/mman.h>
+
+#include <machine/atomic.h>
+#include <machine/stdarg.h>
+
+#include "forking_stale.h"
+
+#define value_old 0
+#define value_new 1
+
+#define opt_iterations_default 100000lu
+
+static int fork_func(void);
+static void * main_func(void *);
+static void * faulter_func(void *);
+static void * writer_func(void *);
+static void * reader_func(void *);
+
+// A structure housed on the memory page we will be testing
+struct test_data
+{
+    uint64_t value;
+    uint64_t something_else;
+};
+
+// The miscellaneous stuff necessary to perform the testing
+struct test_machinery
+{
+    pthread_mutexattr_t mattr;
+    pthread_mutex_t mutex;
+    pthread_attr_t tattr;
+    int main_tid;
+    pthread_t faulter;
+    int faulter_tid;
+    uint64_t faulter_iters;
+    pthread_t writer;
+    int writer_tid;
+    uint64_t writer_iters;
+    pthread_t reader;
+    int reader_tid;
+    uint64_t reader_iters;
+    int fork_tid;
+    uint64_t tsc_freq;
+    uint64_t forks;
+    uint64_t pmapstalls;
+    uint64_t old_data;
+    uint64_t new_data;
+    uint64_t mismatches;
+    uint64_t debug_value;
+    long opt_iterations;
+    int opt_early_error_exit;
+    int opt_wired;
+    struct test_data * test_data;
+    volatile uint64_t * state_control;
+    uint64_t state_control_phys;
+    struct fstf_debug_data debug_entries[FSTF_DEBUG_NUMBER_ENTRIES];
+    u_int debug_position;
+} * test;
+
+static __thread int lcl_tid = 0;
+
+// Initialize any 'test_machinery' values
+static void test_machinery_init(void)
+{
+    memset(test, 0x0, sizeof(struct test_machinery));
+    test->opt_iterations = opt_iterations_default;
+    test->opt_early_error_exit = 1;
+}
+
+// Describe a tid to make debug output more readable
+static const char * test_actor_descr(int tid)
+{
+    char * desc = "<unknown>";
+    if (tid == test->main_tid)
+    {
+        desc = "Forker";
+    }
+    else if (tid == test->faulter_tid)
+    {
+        desc = "Faulter";
+    }
+    else if (tid == test->writer_tid)
+    {
+        desc = "Writer";
+    }
+    else if (tid == test->reader_tid)
+    {
+        desc = "Reader";
+    }
+    else if (tid == test->fork_tid)
+    {
+        desc = "Forkee";
+    }
+    return (desc);
+}
+
+// Retrieve the tid for the current thread
+static int get_current_tid(void)
+{
+    if (lcl_tid == 0)
+    {
+        lcl_tid = pthread_getthreadid_np();
+    }
+    return (lcl_tid);
+}
+
+// Best effort set of the name of the current thread
+static void set_current_thread_name(void)
+{
+    pthread_set_name_np(pthread_self(), test_actor_descr(get_current_tid()));
+}
+
+// Get the location of a 64-bit region state/control region shared by userspace and the kernel
+volatile uint64_t * fstf_state_control(void)
+{
+    return (test->state_control);
+}
+
+// Get the tsc frequency
+uint64_t fstf_tsc_frequency_seconds(void)
+{
+    return (test->tsc_freq);
+}
+
+// Fast formatted debug
+void fstf_debug_fast(const char * file, int line, uint32_t type, uint64_t state)
+{
+    if (test->debug_value)
+    {
+        int pos = FSTF_DEBUG_POSITION_INDEX(atomic_fetchadd_int(&test->debug_position, 1));
+
+        test->debug_entries[pos].tsc = fstf_tsc_and_aux(&test->debug_entries[pos].aux);
+        test->debug_entries[pos].tid = get_current_tid();
+        test->debug_entries[pos].file = file;
+        test->debug_entries[pos].line = line;
+        test->debug_entries[pos].state = state;
+        test->debug_entries[pos].dbg_type = type;
+    }
+}
+
+// Misc text debug
+void fstf_debug_misc(const char * file, int line, const char *fmt, ...)
+{
+    if (test->debug_value)
+    {
+        char buffer[1024];
+        va_list ap;
+
+        va_start(ap, fmt);
+
+        vsnprintf(buffer, sizeof(buffer), fmt, ap);
+
+        va_end(ap);
+
+        uint64_t tst = fstf_state_value();
+
+        int pos = FSTF_DEBUG_POSITION_INDEX(atomic_fetchadd_int(&test->debug_position, 1));
+
+        uint32_t aux;
+        uint64_t tsc = fstf_tsc_and_aux(&aux);
+        test->debug_entries[pos].dbg_type = FSTF_DEBUG_TYPE_MISCELLANEOUS;
+
+        va_start(ap, fmt);
+
+        int tid = get_current_tid();
+
+        snprintf(test->debug_entries[pos].misc,
+                 sizeof(test->debug_entries[pos].misc),
+                 FSTF_DEBUG_FORMAT_STR,
+                 tsc,
+                 aux,
+                 tid,
+                 test_actor_descr(tid),
+                 file,
+                 line,
+                 tst,
+                 buffer);
+
+        va_end(ap);
+    }
+}
+
+// General debugging/text output
+static void debug_output(const char *fmt, ...)
+{
+    va_list ap;
+
+    va_start(ap, fmt);
+
+    vprintf(fmt, ap);
+
+    va_end(ap);
+}
+
+// General stats/error output
+static void console_output(const char *fmt, ...)
+{
+    va_list ap;
+
+    va_start(ap, fmt);
+
+    vfprintf(stderr, fmt, ap);
+
+    va_end(ap);
+}
+
+// Traverse the captured debug entries and output them
+static void dump_debug(void)
+{
+    int num;
+    for (num = 0; num < FSTF_DEBUG_NUMBER_ENTRIES; num++)
+    {
+        if (test->debug_entries[num].dbg_type != FSTF_DEBUG_TYPE_UNUSED)
+        {
+            if (test->debug_entries[num].dbg_type == FSTF_DEBUG_TYPE_MISCELLANEOUS)
+            {
+                debug_output("%s", test->debug_entries[num].misc);
+            }
+            else
+            {
+                char msg[1024];
+                char temp[1024];
+                char buffer[1024];
+
+                fstf_debug_fill_state_descr(test->debug_entries[num].state, temp, sizeof(temp));
+
+                snprintf(msg,
+                         sizeof(msg),
+                         "State: %s (0x%016lx) [%s]\n",
+                         temp,
+                         test->debug_entries[num].state,
+                         fstf_debug_type_descr(test->debug_entries[num].dbg_type));
+
+                snprintf(buffer,
+                         sizeof(buffer),
+                         FSTF_DEBUG_FORMAT_STR,
+                         test->debug_entries[num].tsc,
+                         test->debug_entries[num].aux,
+                         test->debug_entries[num].tid,
+                         test_actor_descr(test->debug_entries[num].tid),
+                         fstf_strip_file_path(test->debug_entries[num].file),
+                         test->debug_entries[num].line,
+                         test->debug_entries[num].state,
+                         msg);
+
+                debug_output("%s", buffer);
+            }
+        }
+    }
+}
+
+// Get the fd int for /dev/mem
+static int dev_mem_fd(void)
+{
+    static int memfd = -1;
+    if (memfd < 0)
+    {
+        memfd = open("/dev/mem", O_RDWR|O_CLOEXEC);
+    }
+    return (memfd);
+}
+
+// Prepare some stuff for threading
+static int thread_prep(void)
+{
+    if ((pthread_mutexattr_init(&test->mattr) != 0) ||
+        (pthread_mutexattr_settype(&test->mattr, PTHREAD_MUTEX_ADAPTIVE_NP) != 0))
+
+    {
+        console_output("ERROR: test setting up mutex attribute for AdaptiveNP\n");
+        return (1);
+    }
+
+    if (pthread_mutex_init(&test->mutex, &test->mattr) != 0)
+    {
+        console_output("ERROR: test initializing mutex\n");
+        return (1);
+    }
+
+    if (pthread_attr_init(&test->tattr) != 0)
+    {
+        console_output("ERROR: test initializing pthread attr\n");
+        return (1);
+    }
+
+    if (pthread_attr_setdetachstate(&test->tattr, PTHREAD_CREATE_JOINABLE) != 0)
+    {
+        console_output("ERROR: test setting pthread attr joinable\n");
+        return (1);
+    }
+
+    return (0);
+}
+
+// Create and start test threads and set up affinities
+static int start_threads(void)
+{
+    int rc;
+    cpuset_t thecpuset;
+
+    CPU_ZERO(&thecpuset);
+    CPU_SET(0, &thecpuset);
+    rc = pthread_setaffinity_np(pthread_self(), sizeof(cpuset_t), &thecpuset);
+    if (rc != 0)
+    {
+        console_output("ERROR: pthread_setaffinity_np() main\n");
+        return (1);
+    }
+
+    rc = pthread_create(&test->faulter, &test->tattr, faulter_func, NULL);
+    if (rc != 0)
+    {
+        console_output("ERROR: pthread_create() faulter\n");
+        return (1);
+    }
+    CPU_ZERO(&thecpuset);
+    CPU_SET(1, &thecpuset);
+    rc = pthread_setaffinity_np(test->faulter, sizeof(cpuset_t), &thecpuset);
+    if (rc != 0)
+    {
+        console_output("ERROR: pthread_setaffinity_np() faulter\n");
+        return (1);
+    }
+
+    rc = pthread_create(&test->writer, &test->tattr, writer_func, NULL);
+    if (rc != 0)
+    {
+        console_output("ERROR: pthread_create() writer\n");
+        return (1);
+    }
+    CPU_ZERO(&thecpuset);
+    CPU_SET(2, &thecpuset);
+    rc = pthread_setaffinity_np(test->writer, sizeof(cpuset_t), &thecpuset);
+    if (rc != 0)
+    {
+        console_output("ERROR: pthread_setaffinity_np() writer\n");
+        return (1);
+    }
+
+    rc = pthread_create(&test->reader, &test->tattr, reader_func, NULL);
+    if (rc != 0)
+    {
+        console_output("ERROR: pthread_create() reader\n");
+        return (1);
+    }
+    CPU_ZERO(&thecpuset);
+    CPU_SET(3, &thecpuset);
+    rc = pthread_setaffinity_np(test->reader, sizeof(cpuset_t), &thecpuset);
+    if (rc != 0)
+    {
+        console_output("ERROR: pthread_setaffinity_np() reader\n");
+        return (1);
+    }
+
+    return (0);
+}
+
+// main processing function; do as many 'fork' iterations as requested or until done
+static void * main_func(void *m)
+{
+    int f;
+    long i;
+
+    for (i = 0; i < test->opt_iterations; i++)
+    {
+        memset(test->debug_entries, 0x00, sizeof(test->debug_entries));
+
+        FSTF_DEBUG_MISC("Test hook active. tsc_freq:%lu, control vaddr:%p, control paddr:0x%lx, test_data vaddr:%p, test_data value location:%p\n",
+                        test->tsc_freq,
+                        test->state_control,
+                        test->state_control_phys,
+                        (void*)FSTF_CONSTANT_VADDR,
+                        &test->test_data->value);
+
+        pthread_mutex_lock(&test->mutex);
+
+            test->test_data->value = value_old;
+
+        pthread_mutex_unlock(&test->mutex);
+
+        if ((f = fork_func()) < 0)
+        {
+            console_output("Error forking: %d, Errno: %d\n", f, errno);
+            goto Done;
+        }
+
+        test->forks++;
+
+        if (fstf_state_impediment_all(FSTF_STATE_BITS_IDLE) & FSTF_STATE_BITS_FINISHED)
+        {
+            goto Done;
+        }
+    }
+
+    fstf_state_transition(0,
+                          FSTF_STATE_BIT_COMPLETE);
+
+Done:
+
+    return (0);
+}
+
+// fork processing function; coordinate the testing for the race
+static int fork_func(void)
+{
+    pid_t pid, savedpid;
+    int pstat;
+
+    switch(pid = fork())
+    {
+        case -1:
+            // error
+            break;
+        case 0:
+        {
+            // child
+            test->fork_tid = get_current_tid();
+            set_current_thread_name();
+
+            // tell threads we have forked and they can get ready
+            fstf_state_transition((FSTF_STATE_BITS_PRIMED | FSTF_STATE_BIT_TEST_ACTIVE),
+                                  FSTF_STATE_BIT_READY_TO_PRIME);
+
+            // wait for threads to be in the right state
+            if (fstf_state_impediment_all(FSTF_STATE_BITS_PRIMED) & FSTF_STATE_BITS_FINISHED)
+            {
+                goto Done;
+            }
+
+            // tell the faulter he can go
+            fstf_state_transition(FSTF_STATE_BIT_READY_TO_PRIME,
+                                  FSTF_STATE_BIT_TEST_ACTIVE);
+
+            // wait for the faulter to be done, at that point the iteration is over
+            if (fstf_state_impediment_all(FSTF_STATE_BIT_FAULTER_DONE) & FSTF_STATE_BITS_FINISHED)
+            {
+                goto Done;
+            }
+
+            // cleanup
+            uint64_t st = fstf_state_impediment_all(FSTF_STATE_BITS_IDLE);
+            if (st & FSTF_STATE_BIT_KERNEL_PMAPMOD)
+            {
+                test->pmapstalls++;
+            }
+            if (st & FSTF_STATE_BITS_FINISHED)
+            {
+                goto Done;
+            }
+            fstf_state_transition(FSTF_STATE_BITS_RESET,
+                                  0);
+
+            Done:
+                _exit(127);
+        }
+        default:
+        {
+            // parent
+            savedpid = pid;
+            do
+            {
+                pid = wait4(savedpid, &pstat, 0, (struct rusage *)0);
+            }
+            while (pid == -1 && errno == EINTR);
+            break;
+        }
+    }
+
+    return(pid == -1 ? -1 : pstat);
+}
+
+// faulter thread processing function
+static void * faulter_func(void *t)
+{
+    test->faulter_tid = get_current_tid();
+    set_current_thread_name();
+
+    for (;;)
+    {
+        // indicate we have reached the 'idle' state
+        fstf_state_transition(0,
+                              FSTF_STATE_BIT_FAULTER_IDLE);
+
+        // wait to be told we can prepare
+        if (fstf_state_impediment_any(FSTF_STATE_BIT_READY_TO_PRIME) & FSTF_STATE_BITS_FINISHED)
+        {
+            goto Done;
+        }
+
+        pthread_mutex_lock(&test->mutex);
+        pthread_mutex_unlock(&test->mutex);
+
+        test->faulter_iters++;
+        test->faulter_iters--;
+
+        // indicate we are prepared
+        fstf_state_transition(FSTF_STATE_BIT_FAULTER_IDLE,
+                              FSTF_STATE_BIT_FAULTER_PRIMED);
+
+        // wait to be told we can write-fault
+        if (fstf_state_impediment_all(FSTF_STATE_BIT_TEST_ACTIVE) & FSTF_STATE_BITS_FINISHED)
+        {
+            goto Done;
+        }
+
+        // actually do the write fault
+        FSTF_DEBUG_MISC("Writing to address: %p\n", &test->test_data->something_else);
+        test->test_data->something_else = 1;
+
+        fstf_state_transition(0,
+                              FSTF_STATE_BIT_FAULTER_DONE);
+
+        test->faulter_iters++;
+    }
+
+Done:
+
+    return (0);
+}
+
+// writer thread processing function
+static void * writer_func(void *t)
+{
+    test->writer_tid = get_current_tid();
+    set_current_thread_name();
+
+    for (;;)
+    {
+        // indicate we have reached the 'idle' state
+        fstf_state_transition(0,
+                              FSTF_STATE_BIT_WRITER_IDLE);
+
+        // wait to be told we can prepare
+        if (fstf_state_impediment_any(FSTF_STATE_BIT_READY_TO_PRIME) & FSTF_STATE_BITS_FINISHED)
+        {
+            goto Done;
+        }
+
+        pthread_mutex_lock(&test->mutex);
+            (void)test->test_data->value;
+        pthread_mutex_unlock(&test->mutex);
+
+        test->writer_iters++;
+        test->writer_iters--;
+
+        // indicate we are prepared
+        fstf_state_transition(FSTF_STATE_BIT_WRITER_IDLE,
+                              FSTF_STATE_BIT_WRITER_PRIMED);
+
+        // wait for either the kernel code to have modified the pmap or the fault to be over
+        if (fstf_state_impediment_any(FSTF_STATE_BIT_KERNEL_PMAPMOD | FSTF_STATE_BIT_FAULTER_DONE) & FSTF_STATE_BITS_FINISHED)
+        {
+            goto Done;
+        }
+
+        // change the value at the test location to something else
+        pthread_mutex_lock(&test->mutex);
+
+            (void)test->test_data->value;
+
+            FSTF_DEBUG_MISC("Writing to address: %p\n", &test->test_data->value);
+
+            test->test_data->value = value_new;
+
+        pthread_mutex_unlock(&test->mutex);
+
+        test->writer_iters++;
+    }
+
+Done:
+
+    return (0);
+}
+
+// reader thread processing function
+static void * reader_func(void *t3)
+{
+    test->reader_tid = get_current_tid();
+    set_current_thread_name();
+
+    for (;;)
+    {
+        // indicate we have reached the 'idle' state
+        fstf_state_transition(0,
+                              FSTF_STATE_BIT_READER_IDLE);
+
+        // wait to be told we can prepare
+        if (fstf_state_impediment_any(FSTF_STATE_BIT_READY_TO_PRIME) & FSTF_STATE_BITS_FINISHED)
+        {
+            goto Done;
+        }
+
+        pthread_mutex_lock(&test->mutex);
+            (void)test->test_data->value;
+        pthread_mutex_unlock(&test->mutex);
+
+        test->reader_iters++;
+        test->reader_iters--;
+
+        // indicate we are prepared
+        fstf_state_transition(FSTF_STATE_BIT_READER_IDLE,
+                              FSTF_STATE_BIT_READER_PRIMED);
+
+        // wait for the race to start (test active)
+        if (fstf_state_impediment_any(FSTF_STATE_BIT_TEST_ACTIVE) & FSTF_STATE_BITS_FINISHED)
+        {
+            goto Done;
+        }
+
+        while (!fstf_state_check_any(FSTF_STATE_BIT_FAULTER_DONE | FSTF_STATE_BITS_FINISHED))
+        {
+            // read the value at the test location twice and complain if it is different
+            if (pthread_mutex_trylock(&test->mutex) == 0)
+            {
+                    uint64_t initialValue = test->test_data->value;
+
+                    uint64_t nextValue = test->test_data->value;
+
+                    if (initialValue != nextValue)
+                    {
+                        test->mismatches++;
+
+                        FSTF_DEBUG_MISC("Mismatch detected! Old=%lu, New=%lu\n",
+                                        initialValue,
+                                        nextValue);
+
+                        if (test->opt_early_error_exit)
+                        {
+                            console_output("ERROR: Mismatch: Old=%lu, New=%lu\n",
+                                           initialValue,
+                                           nextValue);
+                            fstf_state_transition(0,
+                                                  FSTF_STATE_BIT_MISMATCH);
+                        }
+                    }
+
+                    if (nextValue == value_old)
+                    {
+                        test->old_data++;
+                    }
+                    else if (nextValue == value_new)
+                    {
+                        test->new_data++;
+                    }
+
+                pthread_mutex_unlock(&test->mutex);
+
+                test->reader_iters++;
+            }
+        }
+    }
+
+Done:
+
+    return (0);
+}
+
+// shutdown/join threads for completion
+static int finish_threads(void)
+{
+    int err = 0;
+    if (pthread_join(test->faulter,  NULL))
+    {
+        console_output("ERROR: pthread_join() faulter\n");
+        err = 1;
+    }
+    if (pthread_join(test->writer,  NULL))
+    {
+        console_output("ERROR: pthread_join() writer\n");
+        err = 1;
+    }
+    if (pthread_join(test->reader,  NULL))
+    {
+        console_output("ERROR: pthread_join() reader\n");
+        err = 1;
+    }
+    return (err);
+}
+
+// Cleanup stuff for threading
+static int thread_cleanup(void)
+{
+    pthread_mutex_destroy(&test->mutex);
+    pthread_attr_destroy(&test->tattr);
+    return (0);
+}
+
+// do the actual testing
+static int run_test(void)
+{
+    int err = 0;
+
+    if (thread_prep())
+    {
+        err = 1;
+    }
+    else
+    {
+        if (start_threads())
+        {
+            err = 1;
+        }
+
+        main_func(0);
+
+        if (finish_threads())
+        {
+            err = 1;
+        }
+
+        thread_cleanup();
+    }
+    return (err);
+}
+
+// output basic help info
+static void usage(const char * progname)
+{
+    console_output("Usage: %s [opts]\n"
+                   "OPTIONS:\n"
+                   "    -n N ........ set number of iterations, default=%lu\n"
+                   "    -v   ........ output debugging data at the end of the run\n"
+                   "    -d   ........ request heavier debugging in the kernel\n"
+                   "    -k   ........ keep running; don't exit early on mismatch\n"
+                   "    -w   ........ wire the memory with mlockall\n"
+                   "    -h .......... show this help\n",
+                   progname,
+                   opt_iterations_default);
+}
+
+// a cancel handler so a ctrl-c'ed long running test can still output useful info
+static void cancel_function(int signo)
+{
+    fstf_state_transition(0,
+                          FSTF_STATE_BIT_CANCELLED);
+}
+
+// main
+int main(int argc, char *argv[])
+{
+    uint64_t tm1, tm2, tdiff;
+    int opt, err;
+    char * pEnd;
+
+    // setup our cancel handler
+    if (signal(SIGINT, cancel_function) == SIG_ERR)
+    {
+        console_output("ERROR: setting signal handler cancel\n");
+        exit(1);
+    }
+
+    // allocate memory for the test machinery; we don't care where it is but want it shared (non-COW)
+    void *machinery_mem = mmap(NULL,
+                               sizeof(struct test_machinery),
+                               PROT_READ | PROT_WRITE,
+                               MAP_SHARED | MAP_ANON,
+                               -1,
+                               0);
+    if (machinery_mem == MAP_FAILED)
+    {
+        console_output("ERROR: allocating memory for test machinery\n");
+        exit(1);
+    }
+
+    test = machinery_mem;
+    test_machinery_init();
+
+    test->main_tid = get_current_tid();
+    set_current_thread_name();
+
+    while ((opt = getopt(argc, argv, "hvdkwn:")) != -1)
+    {
+        switch (opt)
+        {
+            case 'n':
+                test->opt_iterations = strtol(optarg, &pEnd, 10);
+                break;
+            case 'v':
+                test->debug_value |= FSTF_SYSCTL_DEBUG_ON;
+                break;
+            case 'd':
+                test->debug_value |= FSTF_SYSCTL_DEBUG_HEAVY;
+                break;
+            case 'k':
+                test->opt_early_error_exit = 0;
+                break;
+            case 'w':
+                test->opt_wired = 1;
+                break;
+            case 'h':
+                usage(argv[0]);
+                exit(0);
+                break;
+            default : usage(argv[0]);
+                exit(-1);
+                break;
+        }
+    }
+
+    console_output("RUNTIME PARAMS:\n"
+                   "niters %-20lu\n",
+                   test->opt_iterations);
+
+    size_t size;
+    pid_t pid;
+    uint64_t old;
+    uint64_t new;
+
+    // obtain the tsc frequency from the sysctl
+    size = sizeof(old);
+    err = sysctlbyname("machdep.tsc_freq", &old, &size, NULL, 0);
+    if (err)
+    {
+        console_output("ERROR: calling machdep.tsc_freq\n");
+        exit(1);
+    }
+    test->tsc_freq = old;
+
+    // allocate memory for the test region at a fixed address; MAP_PRIVATE makes it COW across fork()
+    void *test_data_mem = mmap((void*)FSTF_CONSTANT_VADDR,
+                               sizeof(struct test_data),
+                               PROT_READ | PROT_WRITE,
+                               MAP_FIXED | MAP_ANON | MAP_PRIVATE,
+                               -1,
+                               0);
+    if (test_data_mem == MAP_FAILED)
+    {
+        console_output("ERROR: mmaping test data region\n");
+        exit(1);
+    }
+    test->test_data = test_data_mem;
+    test->test_data->value = 0xffffffffffffffffull;
+
+    // call into the sysctl to prepare the test;  pass the desired debugging options
+    size = sizeof(old);
+    pid = getpid();
+    new = ((uint64_t)pid);
+    if (test->debug_value)
+    {
+        new = test->debug_value | ((uint64_t)pid);
+    }
+    err = sysctlbyname("kern.fstf_setup", &old, &size, &new, sizeof(new));
+    if (err)
+    {
+        console_output("ERROR: calling kernel test hook, new: 0x%lx\n", new);
+        exit(1);
+    }
+
+    // call into a sysctl to get the physical address the kernel setup
+    size = sizeof(old);
+    err = sysctlbyname("kern.fstf_control_paddr", &old, &size, NULL, 0);
+    if (err)
+    {
+        console_output("ERROR: calling kernel test addr\n");
+        exit(1);
+    }
+    test->state_control_phys = old;
+
+    // map the physical control address into our address space so we can coordinate
+    void *control_mem = mmap(NULL,
+                             PAGE_SIZE,
+                             PROT_READ | PROT_WRITE,
+                             MAP_SHARED,
+                             dev_mem_fd(),
+                             (off_t)test->state_control_phys);
+    if (control_mem == MAP_FAILED)
+    {
+        console_output("ERROR: mmaping control region\n");
+        exit(1);
+    }
+    test->state_control = control_mem;
+
+    // allow a runtime toggle to 'lock' the test region to demonstrate it "fixes the glitch"
+    if (test->opt_wired)
+    {
+        err = mlockall(MCL_CURRENT);
+        if (err)
+        {
+            console_output("ERROR: calling mlockall\n");
+            exit(1);
+        }
+    }
+
+    // run the test
+    uint32_t aux;
+    tm1 = fstf_tsc_and_aux(&aux);
+    err = run_test();
+    tm2 = fstf_tsc_and_aux(&aux);
+    tdiff = (tm2 - tm1);
+
+    // error if we found any discrepancies
+    pthread_mutex_lock(&test->mutex);
+        uint64_t mismatchez = test->mismatches;
+    pthread_mutex_unlock(&test->mutex);
+    if (mismatchez)
+    {
+        err = 1;
+    }
+
+    // output the debugging from the test
+    dump_debug();
+
+    // output status information about the test run
+    char status[1024];
+    fstf_debug_fill_state_descr((fstf_state_value() & FSTF_STATE_BITS_FINISHED), status, sizeof(status));
+    const char * reason = status;
+    while (*reason == ' ') { reason++; }
+    console_output("runtime (sec) %lf\n"
+                   "forks %lu\n"
+                   "faulter iters %lu\n"
+                   "writer iters %lu\n"
+                   "reader iters %lu\n"
+                   "reader value old %lu\n"
+                   "reader value new %lu\n"
+                   "pmapstalls %lu\n"
+                   "mismatches %lu\n"
+                   "completion reason %s\n"
+                   "exit %d\n",
+                   ((1.0*tdiff)/fstf_tsc_frequency_seconds()),
+                   test->forks,
+                   test->faulter_iters,
+                   test->writer_iters,
+                   test->reader_iters,
+                   test->old_data,
+                   test->new_data,
+                   test->pmapstalls,
+                   mismatchez,
+                   reason,
+                   err);
+
+    if (machinery_mem)
+    {
+        munmap(machinery_mem, sizeof(struct test_machinery));
+    }
+    if (control_mem)
+    {
+        munmap(control_mem, PAGE_SIZE);
+    }
+    if (test_data_mem)
+    {
+        munmap(test_data_mem, sizeof(struct test_data));
+    }
+
+    return (err);
+}
diff --git a/sys/vm/forking_stale.h b/sys/vm/forking_stale.h
new file mode 100755
index 0000000..5cfde83
--- /dev/null
+++ b/sys/vm/forking_stale.h
@@ -0,0 +1,296 @@
+/*
+ * FSTF
+ *
+ * forking stale test fixture header
+ *
+ * Common constants for userspace/kernel fork-cow-bug repro
+ */
+
+#ifndef _FORKING_STALE__
+#define _FORKING_STALE__
+
+#include <machine/atomic.h>
+
+// A common virtual address to target
+#define FSTF_CONSTANT_OFFSET 31337
+#define FSTF_CONSTANT_VADDR (FSTF_CONSTANT_OFFSET * PAGE_SIZE)
+#define FSTF_CONSTANT_TIMEOUT_SECONDS 1
+
+// Various bits of state information easily accessed/shared in a uint64_t size value
+#define FSTF_STATE_BIT_COMPLETE          0x8000000000000000llu   // Natural termination (iteration count reached)
+#define FSTF_STATE_BIT_TIMEOUT           0x4000000000000000llu   // State change timeout (unexpected test condition)
+#define FSTF_STATE_BIT_MISMATCH          0x2000000000000000llu   // Mismatch detected (real problem found)
+#define FSTF_STATE_BIT_CANCELLED         0x1000000000000000llu   // Early cancellation requested by the user
+#define FSTF_STATE_BIT_FAULTER_IDLE      0x0000000000000001llu   // The faulter thread at the beginning
+#define FSTF_STATE_BIT_WRITER_IDLE       0x0000000000000002llu   // The writer thread at the beginning
+#define FSTF_STATE_BIT_READER_IDLE       0x0000000000000004llu   // The reader thread at the beginning
+#define FSTF_STATE_BIT_READY_TO_PRIME    0x0000000000000008llu   // Fork is ready for threads to prepare (vm_map in COW state)
+#define FSTF_STATE_BIT_FAULTER_PRIMED    0x0000000000000010llu   // The faulter thread has "primed" variables it intends to use
+#define FSTF_STATE_BIT_WRITER_PRIMED     0x0000000000000020llu   // The writer thread has "primed" variables it intends to use
+#define FSTF_STATE_BIT_READER_PRIMED     0x0000000000000040llu   // The reader thread has "primed" variables it intends to use
+#define FSTF_STATE_BIT_TEST_ACTIVE       0x0000000000000080llu   // The race is on
+#define FSTF_STATE_BIT_KERNEL_PMAPMOD    0x0000000000000100llu   // The pmap has been updated in the kernel
+#define FSTF_STATE_BIT_FAULTER_DONE      0x0000000000000200llu   // The faulter thread has "completed" its update
+
+// Various combinations of the above bits
+#define FSTF_STATE_BITS_ALL              0xffffffffffffffffllu
+#define FSTF_STATE_BITS_FINISHED         0xf000000000000000llu
+#define FSTF_STATE_BITS_IDLE             (FSTF_STATE_BIT_FAULTER_IDLE | FSTF_STATE_BIT_WRITER_IDLE | FSTF_STATE_BIT_READER_IDLE)
+#define FSTF_STATE_BITS_PRIMED           (FSTF_STATE_BIT_FAULTER_PRIMED | FSTF_STATE_BIT_WRITER_PRIMED | FSTF_STATE_BIT_READER_PRIMED)
+#define FSTF_STATE_BITS_DONE             (FSTF_STATE_BIT_FAULTER_DONE)
+#define FSTF_STATE_BITS_RESET            (FSTF_STATE_BITS_PRIMED | FSTF_STATE_BIT_KERNEL_PMAPMOD | FSTF_STATE_BITS_DONE | FSTF_STATE_BIT_TEST_ACTIVE)
+
+// Some pre-determined "faster" debugging modes
+#define FSTF_DEBUG_TYPE_UNUSED                 0
+#define FSTF_DEBUG_TYPE_MISCELLANEOUS          1
+#define FSTF_DEBUG_TYPE_TRANSITION_INITIAL     2
+#define FSTF_DEBUG_TYPE_TRANSITION_FINAL       3
+#define FSTF_DEBUG_TYPE_IMPEDIMENT_INITIAL     4
+#define FSTF_DEBUG_TYPE_IMPEDIMENT_FINAL       5
+#define FSTF_DEBUG_TYPE_CHECK_STATE            6
+
+// sysctl test hook flags
+#define FSTF_SYSCTL_PID_MASK      0x00000000ffffffffull
+#define FSTF_SYSCTL_DEBUG_MASK    0xffffffff00000000ull
+#define FSTF_SYSCTL_DEBUG_ON      0x8000000000000000ull
+#define FSTF_SYSCTL_DEBUG_HEAVY   0xC000000000000000ull
+
+// Debug constants
+#define FSTF_DEBUG_CONST_MISC_SIZE 800
+#define FSTF_DEBUG_FORMAT_STR "TSC:%18lu CPU:%02d TID:%06d %-10s CODE:%15s:%04d STATE:0x%016lx MSG:%s"
+
+// helper macros to do consistent debugging
+#define FSTF_DEBUG_ENTRY_POF2 9
+#define FSTF_DEBUG_NUMBER_ENTRIES (1 << FSTF_DEBUG_ENTRY_POF2)
+#define FSTF_DEBUG_POSITION_INDEX(num) ((num) & (FSTF_DEBUG_NUMBER_ENTRIES-1))
+#define FSTF_DEBUG_MISC(fmt, ...) fstf_debug_misc(__FILE__, __LINE__, fmt, ##__VA_ARGS__)
+
+// Structure containing data for debug messages
+struct fstf_debug_data
+{
+    uint64_t tsc;
+    uint32_t aux;
+    int tid;
+    const char * file;
+    int line;
+    uint64_t state;
+    uint32_t dbg_type;
+    char misc[FSTF_DEBUG_CONST_MISC_SIZE];
+};
+
+// A "slim" debugging variant
+void fstf_debug_fast(const char * file, int line, uint32_t type, uint64_t state);
+
+// A "misc" debugging variant
+void fstf_debug_misc(const char * file, int line, const char *fmt, ...);
+
+// A location where the state variable can be accessed
+volatile uint64_t * fstf_state_control(void);
+
+// Get the tsc frequency
+uint64_t fstf_tsc_frequency_seconds(void);
+
+// Get the "current" state value
+static uint64_t fstf_state_value(void)
+{
+    volatile uint64_t * control = fstf_state_control();
+    uint64_t state = 0;
+    if (control)
+    {
+        state = atomic_load_acq_64(control);
+    }
+    return (state);
+}
+
+// Pushes a textual description of the state flags into a supplied buffer
+static void fstf_debug_fill_state_descr(uint64_t state, char * buf, int len)
+{
+    snprintf(buf,
+             len,
+             "%14s %14s %14s %14s %14s %14s %14s %14s %14s %14s %14s %14s %14s %14s",
+             ((state & FSTF_STATE_BIT_COMPLETE) ?              "COMPLETE" : ""),
+             ((state & FSTF_STATE_BIT_TIMEOUT) ?               "TIMEOUT" : ""),
+             ((state & FSTF_STATE_BIT_MISMATCH) ?              "MISMATCH" : ""),
+             ((state & FSTF_STATE_BIT_CANCELLED) ?             "CANCELLED" : ""),
+             ((state & FSTF_STATE_BIT_FAULTER_IDLE) ?          "FAULTER_IDLE" : ""),
+             ((state & FSTF_STATE_BIT_WRITER_IDLE) ?           "WRITER_IDLE" : ""),
+             ((state & FSTF_STATE_BIT_READER_IDLE) ?           "READER_IDLE" : ""),
+             ((state & FSTF_STATE_BIT_READY_TO_PRIME) ?        "READY_TO_PRIME" : ""),
+             ((state & FSTF_STATE_BIT_FAULTER_PRIMED) ?        "FAULTER_PRIMED" : ""),
+             ((state & FSTF_STATE_BIT_WRITER_PRIMED) ?         "WRITER_PRIMED" : ""),
+             ((state & FSTF_STATE_BIT_READER_PRIMED) ?         "READER_PRIMED" : ""),
+             ((state & FSTF_STATE_BIT_TEST_ACTIVE) ?           "TEST_ACTIVE" : ""),
+             ((state & FSTF_STATE_BIT_KERNEL_PMAPMOD) ?        "KERNEL_PMAPMOD" : ""),
+             ((state & FSTF_STATE_BIT_FAULTER_DONE) ?          "FAULTER_DONE" : ""));
+}
+
+// Describe the debug action
+static const char * fstf_debug_type_descr(uint32_t type)
+{
+    switch(type)
+    {
+    case FSTF_DEBUG_TYPE_MISCELLANEOUS:
+        return ("Miscellaneous");
+    case FSTF_DEBUG_TYPE_TRANSITION_INITIAL:
+        return ("Transition Initial");
+    case FSTF_DEBUG_TYPE_TRANSITION_FINAL:
+        return ("Transition Final");
+    case FSTF_DEBUG_TYPE_IMPEDIMENT_INITIAL:
+        return ("Impediment Initial");
+    case FSTF_DEBUG_TYPE_IMPEDIMENT_FINAL:
+        return ("Impediment Final");
+    case FSTF_DEBUG_TYPE_CHECK_STATE:
+        return ("Check State");
+    default:
+        break;
+    }
+    return ("Unknown");
+}
+
+// strip "misc" path information from a file name leaving just the file element name
+static const char * fstf_strip_file_path(const char * file)
+{
+    const char * p;
+    const char * fn;
+    size_t len;
+
+    fn = file;
+    if (fn != 0)
+    {
+        for (p = fn, len = strlen(fn);
+            len > 0;
+            len--, p++)
+        {
+            if (*p == '/' || *p == '\\')
+            {
+                fn = p + 1;
+            }
+        }
+    }
+
+    return (fn);
+}
+
+// "rdtscp": read the TSC along with the TSC_AUX value (used here as the CPU id)
+static uint64_t fstf_tsc_and_aux(uint32_t *ipAux)
+{
+//    uint32_t a, d;
+//    *ipAux = 0;
+//    __asm__ volatile("rdtsc" : "=a" (a), "=d" (d));
+//    return ((uint64_t)a) | (((uint64_t)d) << 32);
+    uint64_t a, d;
+    __asm__ volatile("rdtscp" : "=a" (a), "=d" (d), "=c" (*ipAux));
+    return (d << 32) + a;
+}
+
+// Updates the state data to remove 'remove_bits' and add 'include_bits'
+static uint64_t fstf_change_state_bits(uint64_t remove_bits, uint64_t include_bits)
+{
+    uint64_t old_val;
+    uint64_t new_val;
+
+    do
+    {
+        old_val = atomic_load_acq_64(fstf_state_control());
+        new_val = ((old_val & (~remove_bits)) | include_bits);
+        if (old_val == new_val)
+        {
+            break;
+        }
+
+    } while (!atomic_cmpset_64(fstf_state_control(), old_val, new_val));
+
+    return (new_val);
+}
+
+// Waits for the state data to match a certain bitmask
+// opts == fstf_opt_all --> wait for state control to have all bits in bit_mask
+// opts == fstf_opt_any --> wait for state control to have any bits in bit_mask
+static int fstf_opt_all = 0;
+static int fstf_opt_any = 1;
+static uint64_t fstf_wait_for_state_bits(uint64_t wait_for_bit_mask, int opts)
+{
+    uint64_t max_wait = FSTF_CONSTANT_TIMEOUT_SECONDS * fstf_tsc_frequency_seconds();
+    uint64_t now;
+    uint32_t aux;
+    uint64_t start_cycles = fstf_tsc_and_aux(&aux);
+    uint64_t end_cycles = start_cycles + max_wait;
+    uint64_t tst;
+    do
+    {
+        __asm__ volatile("pause\n": : :"memory");
+
+        tst = atomic_load_acq_64(fstf_state_control());
+        if (tst & FSTF_STATE_BITS_FINISHED)
+        {
+            return (tst);
+        }
+        if (opts == fstf_opt_all)
+        {
+            if ((tst & wait_for_bit_mask) == wait_for_bit_mask)
+            {
+                return (tst);
+            }
+        }
+        else if (opts == fstf_opt_any)
+        {
+            if ((tst & wait_for_bit_mask))
+            {
+                return (tst);
+            }
+        }
+        now = fstf_tsc_and_aux(&aux);
+    } while (now < end_cycles);
+
+    tst = fstf_change_state_bits(0, FSTF_STATE_BIT_TIMEOUT);
+
+    return (tst);
+}
+
+// Window dressing for fstf_change_state_bits
+static uint64_t fstf_wrap_state_transition(const char * file, int line, uint64_t remove_bits, uint64_t include_bits)
+{
+    fstf_debug_fast(file, line, FSTF_DEBUG_TYPE_TRANSITION_INITIAL, fstf_state_value());
+    uint64_t newstate = fstf_change_state_bits(remove_bits, include_bits);
+    fstf_debug_fast(file, line, FSTF_DEBUG_TYPE_TRANSITION_FINAL,   newstate);
+    return (newstate);
+}
+
+// Window dressing for fstf_wait_for_state_bits
+static uint64_t fstf_wrap_state_impediment(const char * file, int line, uint64_t wait_for_bitmask, int opts)
+{
+    fstf_debug_fast(file, line, FSTF_DEBUG_TYPE_IMPEDIMENT_INITIAL, fstf_state_value());
+    uint64_t newstate = fstf_wait_for_state_bits(wait_for_bitmask, opts);
+    fstf_debug_fast(file, line, FSTF_DEBUG_TYPE_IMPEDIMENT_FINAL,   newstate);
+    return (newstate);
+}
+
+// Non-blocking, debug-traced check of the state data against a bitmask
+static int fstf_wrap_check_for_bits(const char * file, int line, uint64_t check_for_bitmask, int opts)
+{
+    uint64_t state = fstf_state_value();
+    fstf_debug_fast(file, line, FSTF_DEBUG_TYPE_CHECK_STATE, state);
+    if (opts == fstf_opt_all)
+    {
+        if ((state & check_for_bitmask) == check_for_bitmask)
+        {
+            return (1);
+        }
+    }
+    else if (opts == fstf_opt_any)
+    {
+        if ((state & check_for_bitmask))
+        {
+            return (1);
+        }
+    }
+    return (0);
+}
+
+#define fstf_state_impediment_all(wait_for) fstf_wrap_state_impediment(__FILE__, __LINE__, wait_for, fstf_opt_all)
+#define fstf_state_impediment_any(wait_for) fstf_wrap_state_impediment(__FILE__, __LINE__, wait_for, fstf_opt_any)
+#define fstf_state_transition(remove, add)  fstf_wrap_state_transition(__FILE__, __LINE__, remove, add)
+#define fstf_state_check_all(check_for) fstf_wrap_check_for_bits(__FILE__, __LINE__, check_for, fstf_opt_all)
+#define fstf_state_check_any(check_for) fstf_wrap_check_for_bits(__FILE__, __LINE__, check_for, fstf_opt_any)
+
+#endif
diff --git a/sys/vm/vm_fault.c b/sys/vm/vm_fault.c
index 83e12a5..eaab9c6 100644
--- a/sys/vm/vm_fault.c
+++ b/sys/vm/vm_fault.c
@@ -544,6 +544,9 @@ vm_fault_hold(vm_map_t map, vm_offset_t vaddr, vm_prot_t fault_type,
 
 RetryFault:;
 
+	fstf_conditional_point_debug(__FILE__, __LINE__, map->pmap, vaddr);
+
+
 	/*
 	 * Find the backing store object and offset into it to begin the
 	 * search.
@@ -558,6 +561,7 @@ RetryFault:;
 	}
 
 	fs.map_generation = fs.map->timestamp;
+	fstf_conditional_fault_debug(__FILE__, __LINE__, map->pmap, vaddr, prot);
 
 	if (fs.entry->eflags & MAP_ENTRY_NOFAULT) {
 		panic("%s: fault on nofault entry, addr: %#lx",
@@ -607,6 +611,7 @@ RetryFault:;
 		    (fs.first_object->type != OBJT_VNODE &&
 		    (fs.first_object->flags & OBJ_TMPFS_NODE) == 0) ||
 		    (fs.first_object->flags & OBJ_MIGHTBEDIRTY) != 0) {
+			fstf_conditional_point_debug(__FILE__, __LINE__, map->pmap, vaddr);
 			rv = vm_fault_soft_fast(&fs, vaddr, prot, fault_type,
 			    fault_flags, wired, m_hold);
 			if (rv == KERN_SUCCESS)
@@ -721,6 +726,9 @@ RetryFault:;
 			 * found the page ).
 			 */
 			vm_page_xbusy(fs.m);
+			if (fs.map) {
+				fstf_conditional_point_debug(__FILE__, __LINE__, fs.map->pmap, vaddr);
+			}
 			if (fs.m->valid != VM_PAGE_BITS_ALL)
 				goto readrest;
 			break;
@@ -784,6 +792,9 @@ RetryFault:;
 					alloc_req |= VM_ALLOC_ZERO;
 				fs.m = vm_page_alloc(fs.object, fs.pindex,
 				    alloc_req);
+				if (fs.m != NULL && fs.map != NULL) {
+					fstf_conditional_point_debug(__FILE__, __LINE__, fs.map->pmap, vaddr);
+				}
 			}
 			if (fs.m == NULL) {
 				unlock_and_deallocate(&fs);
@@ -1133,6 +1144,7 @@ RetryFault:;
 				/*
 				 * Oh, well, lets copy it.
 				 */
+				fstf_conditional_point_debug(__FILE__, __LINE__, fs.map->pmap, vaddr);
 				pmap_copy_page(fs.m, fs.first_m);
 				fs.first_m->valid = VM_PAGE_BITS_ALL;
 				if (wired && (fault_flags &
@@ -1168,6 +1180,7 @@ RetryFault:;
 			curthread->td_cow++;
 		} else {
 			prot &= ~VM_PROT_WRITE;
+			fstf_conditional_fault_debug(__FILE__, __LINE__, map->pmap, vaddr, prot);
 		}
 	}
 
@@ -1185,6 +1198,7 @@ RetryFault:;
 		if (fs.map->timestamp != fs.map_generation) {
 			result = vm_map_lookup_locked(&fs.map, vaddr, fault_type,
 			    &fs.entry, &retry_object, &retry_pindex, &retry_prot, &wired);
+			fstf_conditional_fault_debug(__FILE__, __LINE__, map->pmap, vaddr, retry_prot);
 
 			/*
 			 * If we don't need the page any longer, put it on the inactive
@@ -1194,6 +1208,7 @@ RetryFault:;
 			if (result != KERN_SUCCESS) {
 				release_page(&fs);
 				unlock_and_deallocate(&fs);
+				fstf_conditional_point_debug(__FILE__, __LINE__, map->pmap, vaddr);
 
 				/*
 				 * If retry of map lookup would have blocked then
@@ -1219,6 +1234,7 @@ RetryFault:;
 			 * write-enabled after all.
 			 */
 			prot &= retry_prot;
+			fstf_conditional_fault_debug(__FILE__, __LINE__, map->pmap, vaddr, prot);
 		}
 	}
 
diff --git a/sys/vm/vm_page.c b/sys/vm/vm_page.c
index f17a981..82c5fd7f 100644
--- a/sys/vm/vm_page.c
+++ b/sys/vm/vm_page.c
@@ -4094,6 +4094,465 @@ vm_page_trylock_KBI(vm_page_t m, const char *file, int line)
 	return (mtx_trylock_flags_(vm_page_lockptr(m), 0, file, line));
 }
 
+
+
+
+
+
+
+#include <sys/types.h>
+#include <sys/malloc.h>
+#include <sys/sysctl.h>
+#include <vm/vm_map.h>
+#include <machine/atomic.h>
+#include <machine/stdarg.h>
+
+#include <vm/forking_stale.h>
+
+extern  uint64_t tsc_freq;
+
+MALLOC_DEFINE(M_FSTF, "fstf", "FSTF Test Hook");
+
+static pmap_t test_pmap = NULL;
+static uintptr_t test_vaddr = FSTF_CONSTANT_VADDR;
+struct fstf_debug_data debug_entries[FSTF_DEBUG_NUMBER_ENTRIES];
+static u_int debug_position;
+static uint64_t debug_value = 0;
+static void * state_control_alloc = 0;
+static size_t state_control_alloc_length = PAGE_SIZE;
+static vm_paddr_t state_control_paddr = 0;
+static volatile uint64_t * state_control = 0;
+
+static int sysctl_fstf_setup(SYSCTL_HANDLER_ARGS);
+static int sysctl_debug_output(SYSCTL_HANDLER_ARGS);
+
+SYSCTL_NODE(_kern,
+            OID_AUTO,
+            fstf_setup,
+            CTLFLAG_RW,
+            sysctl_fstf_setup,
+            "Test hook to setup to help debug a race condition");
+
+SYSCTL_U64(_kern,
+           OID_AUTO,
+           fstf_control_paddr,
+           CTLFLAG_RD,
+           &state_control_paddr,
+           0,
+           "Read the physical memory address allocated for the test");
+
+SYSCTL_OID(_kern,
+           OID_AUTO,
+           fstf_debug_output,
+           CTLTYPE_STRING | CTLFLAG_RD,
+           NULL,
+           0,
+           sysctl_debug_output,
+           "A",
+           "Output test fixture debugging");
+
+/* Output debugging captured in memory in the kernel during the last test run */
+static int
+sysctl_debug_output(SYSCTL_HANDLER_ARGS)
+{
+        struct sbuf sbuf;
+        int num, error;
+
+        error = sysctl_wire_old_buffer(req, 0);
+        if (error != 0) {
+            return (error);
+        }
+
+        sbuf_new_for_sysctl(&sbuf, NULL, sizeof(debug_entries[0].misc) + 1024, req);
+
+        sbuf_printf(&sbuf, "\n");
+
+        for (num = 0; num < FSTF_DEBUG_NUMBER_ENTRIES; num++) {
+            if (debug_entries[num].dbg_type != FSTF_DEBUG_TYPE_UNUSED) {
+                if (debug_entries[num].dbg_type == FSTF_DEBUG_TYPE_MISCELLANEOUS) {
+
+                    error = sbuf_printf(&sbuf, "%s", debug_entries[num].misc);
+                    if (error) {
+                        printf("Error: %d handling sysctl printf\n", error);
+                        break;
+                    }
+
+                } else {
+
+                    char stbuf[512];
+                    char state_str[1024];
+                    fstf_debug_fill_state_descr(debug_entries[num].state,
+                                    stbuf,
+                                    sizeof(stbuf));
+
+                    snprintf(state_str,
+                     sizeof(state_str),
+                     "State: %s (0x%016lx) [%s]\n",
+                     stbuf,
+                     debug_entries[num].state,
+                     fstf_debug_type_descr(debug_entries[num].dbg_type));
+
+                    error = sbuf_printf(&sbuf,
+                                FSTF_DEBUG_FORMAT_STR,
+                                debug_entries[num].tsc,
+                                debug_entries[num].aux,
+                                debug_entries[num].tid,
+                                "",
+                                fstf_strip_file_path(debug_entries[num].file),
+                                debug_entries[num].line,
+                                debug_entries[num].state,
+                                state_str);
+                    if (error) {
+                        printf("Error: %d handling sysctl printf\n", error);
+                        break;
+                    }
+                }
+            }
+        }
+
+        error = sbuf_finish(&sbuf);
+
+        sbuf_delete(&sbuf);
+        return (error);
+}
+
+/* Setup for testing the race condition */
+static int
+sysctl_fstf_setup(SYSCTL_HANDLER_ARGS)
+{
+        int error;
+        pid_t pid;
+        uint64_t pidmore = 0;
+        struct proc *p;
+
+        /* Get the value supplied; a 64-bit integer encoded as <debug | pid> */
+
+        error = sysctl_handle_64(oidp, &pidmore, 0, req);
+        if (error || req->newptr == NULL) {
+            return (error);
+        }
+
+        FSTF_DEBUG_MISC("Test Hook: Entry: 0x%016lx\n", pidmore);
+
+        pid = (pid_t)(FSTF_SYSCTL_PID_MASK & pidmore);
+        debug_value = (FSTF_SYSCTL_DEBUG_MASK & pidmore);
+
+        FSTF_DEBUG_MISC("Test Hook: Initial pid=%d dbg=%lu\n",
+                        pid,
+                        debug_value);
+
+        /* Confirm that the calling process is correct & look it up */
+        error = pget(pid, PGET_CANSEE | PGET_ISCURRENT | PGET_CANDEBUG | PGET_NOTWEXIT | PGET_NOTID, &p);
+        if (error != 0) {
+            printf("Test Hook: Setup Error, unable to lookup/match PID.  Error: %d\n", error);
+            return (error);
+        }
+
+        FSTF_DEBUG_MISC("Test Hook: Process Located pid=%d name=%s\n",
+                        pid,
+                        p->p_comm);
+
+        /* Release any previous test resources */
+        if (state_control_alloc) {
+            contigfree(state_control_alloc,
+                       state_control_alloc_length,
+                       M_FSTF);
+            state_control = NULL;
+            state_control_alloc = NULL;
+        }
+
+        /* Allocate a control region & make the physical address available via a sysctl */
+        state_control_alloc = contigmalloc(state_control_alloc_length,
+                                           M_FSTF,
+                                           M_WAITOK | M_ZERO,
+                                           0ul,
+                                           ~0ul,
+                                           PAGE_SIZE,
+                                           0);
+        if (state_control_alloc == NULL) {
+            error = 1;
+            printf("Test Hook: Setup Error, unable to allocate kernel control memory.\n");
+            goto Done;
+        }
+        state_control = state_control_alloc;
+        state_control_paddr = vtophys(state_control_alloc);
+
+        /* Squirrel away the process pmap and vaddr; we use these to make the
+         * test hooks conditional only on select access */
+        test_pmap = vmspace_pmap(p->p_vmspace);
+        test_vaddr = FSTF_CONSTANT_VADDR;
+
+        /* Clear the debugging out from any previous runs */
+        int num;
+        for (num = 0; num < FSTF_DEBUG_NUMBER_ENTRIES; num++) {
+            debug_entries[num].dbg_type = FSTF_DEBUG_TYPE_UNUSED;
+        }
+
+        FSTF_DEBUG_MISC("Test Hook: Initiated pid=%d, name=%s, pmap=%p, vaddr=0x%jx, "
+                        "control vaddr=0x%jx, control paddr=0x%jx\n",
+                        pid,
+                        p->p_comm,
+                        test_pmap,
+                        (uintmax_t)test_vaddr,
+                        (uintmax_t)state_control,
+                        (uintmax_t)state_control_paddr);
+
+        Done:
+
+        PROC_UNLOCK(p);
+
+        return (error);
+}
+
+/* The state/control region */
+volatile uint64_t * fstf_state_control(void)
+{
+        return (state_control);
+}
+
+/* Get the TSC frequency in ticks per second (Hz) */
+uint64_t fstf_tsc_frequency_seconds(void)
+{
+        uint64_t freq = atomic_load_acq_64(&tsc_freq);
+        return (freq);
+}
+
+/* A fast pre-determined debugging function */
+void fstf_debug_fast(const char * file, int line, uint32_t type, uint64_t state)
+{
+        if (debug_value) {
+
+            struct thread *td = curthread;
+            if (td != NULL) {
+
+                int pos = FSTF_DEBUG_POSITION_INDEX(atomic_fetchadd_int(&debug_position, 1));
+
+                debug_entries[pos].tsc = fstf_tsc_and_aux(&debug_entries[pos].aux);
+                debug_entries[pos].tid = td->td_tid;
+                debug_entries[pos].file = file;
+                debug_entries[pos].line = line;
+                debug_entries[pos].state = state;
+                debug_entries[pos].dbg_type = type;
+            }
+        }
+}
+
+/* A misc text debugging function */
+void fstf_debug_misc(const char * file, int line, const char *fmt, ...)
+{
+        if (debug_value) {
+
+            struct thread *td = curthread;
+            if (td != NULL) {
+
+                char buffer[512];
+                va_list ap;
+
+                va_start(ap, fmt);
+
+                vsnprintf(buffer, sizeof(buffer), fmt, ap);
+
+                va_end(ap);
+
+                uint64_t state = fstf_state_value();
+
+                int pos = FSTF_DEBUG_POSITION_INDEX(atomic_fetchadd_int(&debug_position, 1));
+
+                uint32_t aux;
+                uint64_t tsc = fstf_tsc_and_aux(&aux);
+
+                debug_entries[pos].dbg_type = FSTF_DEBUG_TYPE_MISCELLANEOUS;
+
+                snprintf(debug_entries[pos].misc,
+                     sizeof(debug_entries[pos].misc),
+                     FSTF_DEBUG_FORMAT_STR,
+                     tsc,
+                     aux,
+                     (td == NULL ? 0 : td->td_tid),
+                     (td == NULL ? "" : td->td_name),
+                     fstf_strip_file_path(file),
+                     line,
+                     state,
+                     buffer);
+            }
+        }
+}
+
+/* A 'got here' debugging function */
+void fstf_conditional_point_debug(const char * file, int line, pmap_t pmap, vm_offset_t vaddr)
+{
+    if ((pmap != NULL) &&
+        (pmap == test_pmap) &&
+        (((debug_value & FSTF_SYSCTL_DEBUG_HEAVY) == FSTF_SYSCTL_DEBUG_HEAVY) ||
+          ((test_vaddr <= vaddr) && (vaddr < (test_vaddr + PAGE_SIZE))))) {
+
+        fstf_debug_misc(file,
+                line,
+                "Debug: pmap=%p vaddr=0x%jx\n",
+                pmap,
+                (uintmax_t)vaddr);
+    }
+}
+
+/* A vm_page_t info debugging function */
+static void fstf_debug_page(const char * file, int line, vm_offset_t vaddr, vm_page_t m)
+{
+    fstf_debug_misc(file,
+                        line,
+                        "Debug: VAddr:0x%jx page %p obj %p pidx 0x%jx phys 0x%jx q %d hold %d wire %d"
+                        "  af 0x%x of 0x%x f 0x%x act %d busy %x valid 0x%x dirty 0x%x\n",
+                        (uintmax_t)vaddr,
+                        m,
+                        m->object,
+                        (uintmax_t)m->pindex,
+                        (uintmax_t)m->phys_addr,
+                        m->queue,
+                        m->hold_count,
+                        m->wire_count,
+                        m->aflags,
+                        m->oflags,
+                        m->flags,
+                        m->act_count,
+                        m->busy_lock,
+                        m->valid,
+                        m->dirty);
+}
+
+/* A pt_entry_t info debugging function */
+void fstf_conditional_pte_debug(const char * file, int line, pmap_t pmap, vm_offset_t vaddr, pt_entry_t pte)
+{
+    if ((pmap != NULL) &&
+        (pmap == test_pmap) &&
+        (((debug_value & FSTF_SYSCTL_DEBUG_HEAVY) == FSTF_SYSCTL_DEBUG_HEAVY) ||
+              ((test_vaddr <= vaddr) && (vaddr < (test_vaddr + PAGE_SIZE))))) {
+
+        vm_page_t m = PHYS_TO_VM_PAGE(pte & PG_FRAME);
+        fstf_debug_misc(file,
+                line,
+                "Debug: Hit[pmap:%p VAddr:0x%jx P:0x%jx F:%016lx] %s|%s|%s|%s|%s|%s|%s|%s|%s|%s|%s|%s|%s\n",
+                pmap,
+                (uintmax_t)vaddr,
+                (uintmax_t)(pte & PG_FRAME),
+                pte,
+                ((pte & PG_V) ?             "Valid" :               "Invalid"),
+                ((pte & PG_RW) ?            "Read-Write" :          "No-Read-Write"),
+                ((pte & PG_A) ?             "Accessed" :            "Not-Accessed"),
+                ((pte & PG_M) ?             "Dirty" :               "Not-Dirty"),
+                ((pte & PG_PS) ?            "2M-Page-Size" :        "4K-Page-Size"),
+                ((pte & PG_G) ?             "Global" :              "Not-Global"),
+                ((pte & PG_AVAIL1) ?        "Avail1" :              "No-Avail1"),
+                ((pte & PG_AVAIL2) ?        "Managed" :             "Unmanaged"),
+                ((pte & PG_AVAIL3) ?        "Wired" :               "Not-Wired"),
+                ((pte & PG_U) ?             "User-Supervisor" :     "No-User-Supervisor"),
+                ((pte & PG_NC_PWT) ?        "Write-Through" :       "No-Write-Through"),
+                ((pte & PG_NC_PCD) ?        "Cache-Disable" :       "No-Cache-Disable"),
+                ((pte & PG_PTE_PAT) ?       "PAT-Index" :           "No-PAT-Index"));
+        if (m != NULL)
+        {
+            fstf_debug_page(file, line, vaddr, m);
+        }
+    }
+}
+
+/* A page-fault debugging function */
+void fstf_conditional_fault_debug(const char * file, int line, pmap_t pmap, vm_offset_t vaddr, vm_prot_t prot)
+{
+    if ((pmap != NULL) &&
+        (pmap == test_pmap) &&
+        (((debug_value & FSTF_SYSCTL_DEBUG_HEAVY) == FSTF_SYSCTL_DEBUG_HEAVY) ||
+          ((test_vaddr <= vaddr) && (vaddr < (test_vaddr + PAGE_SIZE))))) {
+
+        fstf_debug_misc(file,
+                line,
+                "Debug: pmap=%p vaddr=0x%jx prot=%s|%s|%s|%s|%s\n",
+                pmap,
+                (uintmax_t)vaddr,
+                ((prot & VM_PROT_READ) ? "Read" : "No-Read"),
+                ((prot & VM_PROT_WRITE) ? "Write" : "No-Write"),
+                ((prot & VM_PROT_EXECUTE) ? "Execute" : "No-Execute"),
+                ((prot & VM_PROT_COPY) ? "Copy" : "No-Copy"),
+                ((prot & VM_PROT_FAULT_LOOKUP) ? "Lookup" : "No-Lookup"));
+    }
+}
+
+/* A helper to check if a supplied pmap and vaddr match and should trigger test code */
+static int fstf_conditional(pmap_t pmap, vm_offset_t vaddr)
+{
+    if ((pmap != NULL) &&
+        (pmap == test_pmap) &&
+        ((test_vaddr <= vaddr) && (vaddr < (test_vaddr + PAGE_SIZE)))) {
+
+        return (1);
+    }
+    return (0);
+}
+
+/* Debug & advance the state once at the appropriate error-gen state */
+int fstf_conditional_pmapmod_advance(const char * file, int line, vm_offset_t vaddr, vm_page_t m)
+{
+        int status = 0;
+        pmap_t pmap = PCPU_GET(curpmap);
+        if (fstf_conditional(pmap, vaddr)) {
+
+            if (state_control) {
+
+                uint64_t old_val;
+                uint64_t new_val;
+
+                fstf_debug_fast(file,
+                        line,
+                        FSTF_DEBUG_TYPE_TRANSITION_INITIAL,
+                        fstf_state_value());
+
+                fstf_debug_page(file, line, vaddr, m);
+
+                do
+                {
+                    old_val = atomic_load_acq_64(state_control);
+                    if ((old_val & FSTF_STATE_BITS_PRIMED) != FSTF_STATE_BITS_PRIMED)
+                    {
+                        status = 0;
+                        goto Done;
+                    }
+                    if (old_val & FSTF_STATE_BITS_FINISHED)
+                    {
+                        status = 0;
+                        goto Done;
+                    }
+                    new_val = (old_val | FSTF_STATE_BIT_KERNEL_PMAPMOD);
+
+                } while (!atomic_cmpset_64(state_control, old_val, new_val));
+
+                fstf_debug_fast(file,
+                        line,
+                        FSTF_DEBUG_TYPE_TRANSITION_FINAL,
+                        new_val);
+                status = 1;
+            }
+        }
+
+        Done:
+
+        return (status);
+}
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
 #if defined(INVARIANTS) || defined(INVARIANT_SUPPORT)
 void
 vm_page_assert_locked_KBI(vm_page_t m, const char *file, int line)
diff --git a/sys/vm/vm_page.h b/sys/vm/vm_page.h
index 6ca808b..e250be0 100644
--- a/sys/vm/vm_page.h
+++ b/sys/vm/vm_page.h
@@ -452,6 +452,11 @@ malloc2vm_flags(int malloc_flags)
 #define	PS_ALL_VALID	0x2
 #define	PS_NONE_BUSY	0x4
 
+void fstf_conditional_point_debug(const char * file, int line, pmap_t pmap, vm_offset_t vaddr);
+void fstf_conditional_fault_debug(const char * file, int line, pmap_t pmap, vm_offset_t vaddr, vm_prot_t prot);
+void fstf_conditional_pte_debug(const char * file, int line, pmap_t pmap, vm_offset_t vaddr, pt_entry_t pte);
+int fstf_conditional_pmapmod_advance(const char * file, int line, vm_offset_t vaddr, vm_page_t m);
+
 void vm_page_busy_downgrade(vm_page_t m);
 void vm_page_busy_sleep(vm_page_t m, const char *msg, bool nonshared);
 void vm_page_flash(vm_page_t m);
-- 
2.10.2


[-- Attachment #3 --]
From 004d3e879cdf894771e260fadaf92c48e48180fe Mon Sep 17 00:00:00 2001
From: Elliott Rabe <elliott.rabe@dell.com>
Date: Sun, 11 Feb 2018 17:29:15 -0600
Subject: [PATCH 2/3] TRIAL: Double invalidate when finishing COW pmap update

When a process forks, the first write to a page of memory starts a copy-on-write
operation.  The pmap is currently updated with the new physical address and the
writable status in a single atomic operation, followed by the necessary TLB
invalidations.  Marking the page writable before the invalidations are complete
allows the contents of the new page to be changed while other CPUs may still be
reading the old page through stale TLB entries.  This can result in subtle
memory corruption.

Here is a simplified example of what can occur:

  +A process is forked which transitions a map entry to COW
  +Thread A writes to a page on the map entry, faults, updates the pmap to
   writable at a new phys addr, and starts TLB invalidations...
  +Thread B acquires a lock, writes to a location on the new phys addr,
   and releases the lock
  +Thread C acquires the lock, reads from the location on the old phys addr...
  +Thread A ...continues the TLB invalidations which are completed
  +Thread C ...reads from the location on the new phys addr, and releases
   the lock

In this example, Threads B and C lock, use memory, and unlock properly, and
never own the lock at the same time.  Yet Thread C sees data protected by the
lock change beneath it while it is the lock owner.  Thread A was writing
somewhere else on the page and so never needed the lock.
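
The interleaving above can be replayed deterministically in a small user-space
toy model.  This is only a sketch under simplifying assumptions: one page, three
CPUs, each TLB reduced to a single cached copy of the PTE, and the TLB shootdown
modeled as per-CPU invalidations issued in a fixed order.  Every name in it
(struct machine, cpu_read, cpu_write, single_stage_race, and so on) is
hypothetical; none of it is FreeBSD code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy model, not kernel code: one virtual page backed by either frame 0
 * (the page shared after fork) or frame 1 (the COW copy).  Each CPU keeps
 * one cached translation standing in for its TLB entry.
 */
enum { CPU_A, CPU_B, CPU_C, NCPU };
enum { DATA_WORD, A_WORD, NWORDS };	/* lock-protected word, A's own word */

struct pte { int frame; bool rw; };

struct machine {
	uint64_t frame[2][NWORDS];
	struct pte pte;			/* the in-memory page table entry */
	struct pte tlb[NCPU];		/* per-CPU cached copy of the PTE */
	bool tlb_valid[NCPU];
};

static struct pte *
translate(struct machine *m, int cpu)
{
	assert(cpu >= 0 && cpu < NCPU);
	if (!m->tlb_valid[cpu]) {	/* TLB miss: walk the page table */
		m->tlb[cpu] = m->pte;
		m->tlb_valid[cpu] = true;
	}
	return (&m->tlb[cpu]);
}

static uint64_t
cpu_read(struct machine *m, int cpu, int word)
{
	return (m->frame[translate(m, cpu)->frame][word]);
}

/* Returns false when the cached translation is read-only (a write fault). */
static bool
cpu_write(struct machine *m, int cpu, int word, uint64_t v)
{
	struct pte *t = translate(m, cpu);

	if (!t->rw)
		return (false);
	m->frame[t->frame][word] = v;
	return (true);
}

static void
invalidate(struct machine *m, int cpu)
{
	m->tlb_valid[cpu] = false;
}

/*
 * Replays the interleaving with a single-stage PTE update and returns the
 * value Thread C reads while it holds the lock.
 */
static uint64_t
single_stage_race(void)
{
	struct machine m = { .pte = { .frame = 0, .rw = false } }; /* COW */
	bool ok;

	m.frame[0][DATA_WORD] = 1;
	(void)cpu_read(&m, CPU_C, DATA_WORD);	/* C caches the old frame */

	/* Thread A writes, faults, copies the page, installs frame 1 RW. */
	ok = cpu_write(&m, CPU_A, A_WORD, 7);
	assert(!ok);
	for (int w = 0; w < NWORDS; w++)
		m.frame[1][w] = m.frame[0][w];
	m.pte = (struct pte){ .frame = 1, .rw = true };
	invalidate(&m, CPU_A);
	invalidate(&m, CPU_B);			/* shootdown reaches B first... */

	/* Thread B: lock, write through the new translation, unlock. */
	ok = cpu_write(&m, CPU_B, DATA_WORD, 42);
	assert(ok);

	/* Thread C: lock, read through its stale translation. */
	uint64_t seen = cpu_read(&m, CPU_C, DATA_WORD);

	invalidate(&m, CPU_C);			/* ...and reaches C only now */
	assert(cpu_read(&m, CPU_C, DATA_WORD) == 42);	/* C's second read */
	return (seen);
}
```

Under these assumptions single_stage_race() returns 1, the pre-copy value, even
though Thread B stored 42 under the lock before Thread C acquired it.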

This commit introduces a double-update-invalidation for the scenario above.  The
pmap update will first apply the new address in a read-only state and perform the
TLB invalidations.  Immediately afterwards, the pmap will be updated again,
marking the region as writable.  This strategy ensures the page contents are
immutable until all CPUs know the correct location.
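
The ordering described above can be summarized as a pure function that plans the
PTE stores and TLB invalidations for one update of a valid entry.  This is a
sketch, not the pmap_enter code: the SK_PG_* constants are local stand-ins for
the amd64 bits of similar names, plan_update and struct op are hypothetical, and
the other early-exit ("unchanged") paths in pmap_enter are ignored.  As in
pmap_enter, newpte is assumed to already carry PG_A.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the amd64 PTE bits the patch tests (sketch only). */
#define	SK_PG_V		0x001ULL
#define	SK_PG_RW	0x002ULL
#define	SK_PG_A		0x020ULL
#define	SK_PG_MANAGED	0x400ULL
#define	SK_PG_FRAME	0x000ffffffffff000ULL

enum op_kind { OP_STORE, OP_INVALIDATE };

struct op {
	enum op_kind kind;
	uint64_t pte;		/* value stored (OP_STORE only) */
};

/*
 * Plan the replacement of a valid origpte by newpte.  Writes at most four
 * ops to out[] and returns how many were written.
 */
static size_t
plan_update(uint64_t origpte, uint64_t newpte, struct op out[4])
{
	size_t n = 0;
	bool pa_changed = (origpte & SK_PG_FRAME) != (newpte & SK_PG_FRAME);
	bool delayrw = pa_changed &&
	    (origpte & SK_PG_MANAGED) != 0 &&
	    (origpte & SK_PG_RW) == 0 &&
	    (newpte & SK_PG_RW) != 0;

	if (delayrw) {
		/* Stage 1: publish the new frame read-only... */
		out[n++] = (struct op){ OP_STORE, newpte & ~SK_PG_RW };
		/* ...and flush any cached translation of the old frame. */
		if ((origpte & SK_PG_A) != 0)
			out[n++] = (struct op){ OP_INVALIDATE, 0 };
		/* The read-only entry that stage 2 replaces. */
		origpte = newpte & ~SK_PG_RW;
	}
	/* Stage 2 (or the only stage): install the final entry. */
	out[n++] = (struct op){ OP_STORE, newpte };
	if ((origpte & SK_PG_A) != 0)
		out[n++] = (struct op){ OP_INVALIDATE, 0 };
	return (n);
}
```

For a COW replacement (managed, read-only, new frame, write requested) the plan
is store read-only, invalidate, store writable, invalidate; any other update
keeps a single store followed by at most one invalidation.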
---
 sys/amd64/amd64/pmap.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/sys/amd64/amd64/pmap.c b/sys/amd64/amd64/pmap.c
index 7bb9c1b..aa91cb1 100644
--- a/sys/amd64/amd64/pmap.c
+++ b/sys/amd64/amd64/pmap.c
@@ -4638,7 +4638,7 @@ pmap_enter(pmap_t pmap, vm_offset_t va, vm_page_t m, vm_prot_t prot,
 	vm_paddr_t opa, pa;
 	vm_page_t mpte, om;
 	int rv;
-	boolean_t nosleep;
+	boolean_t nosleep, delayrw;
 
 	PG_A = pmap_accessed_bit(pmap);
 	PG_G = pmap_global_bit(pmap);
@@ -4728,6 +4728,7 @@ pmap_enter(pmap_t pmap, vm_offset_t va, vm_page_t m, vm_prot_t prot,
 		panic("pmap_enter: invalid page directory va=%#lx", va);
 
 	origpte = *pte;
+	delayrw = 0;
 
 	/*
 	 * Is the specified virtual address already mapped?
@@ -4768,6 +4769,11 @@ pmap_enter(pmap_t pmap, vm_offset_t va, vm_page_t m, vm_prot_t prot,
 			if (((origpte ^ newpte) & ~(PG_M | PG_A)) == 0)
 				goto unchanged;
 			goto validate;
+		} else if (((origpte & PG_MANAGED) != 0) &&
+			   ((origpte & PG_RW) == 0) &&
+			   ((newpte & PG_RW) != 0)) {
+			newpte &= ~PG_RW;
+			delayrw = 1;
 		}
 	} else {
 		/*
@@ -4841,6 +4847,17 @@ pmap_enter(pmap_t pmap, vm_offset_t va, vm_page_t m, vm_prot_t prot,
 			 */
 			goto unchanged;
 		}
+		if (delayrw) {
+			/*
+			 * If both the physical address has changed and we're adding
+			 * RW, we do a two-stage update so the region is immutable
+			 * until all CPUs have visibility to the new address.
+			 */
+			if ((origpte & PG_A) != 0)
+				pmap_invalidate_page(pmap, va);
+			newpte |= PG_RW;
+			origpte = pte_load_store(pte, newpte);
+		}
 		if ((origpte & PG_A) != 0)
 			pmap_invalidate_page(pmap, va);
 	} else
-- 
2.10.2

URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?5A82AB7C.6090404>