Date: Thu, 22 Mar 2012 14:01:59 -0400
From: Jeremiah Lott <jlott@averesystems.com>
To: freebsd-amd64@freebsd.org
Cc: alc@freebsd.org, kib@freebsd.org
Subject: page fault after wiring page
Message-ID: <DB3A5662-87CD-459E-9DC6-9E41EF58ACF0@averesystems.com>
We've been seeing panics and deadlocks that appear to be related to taking a page fault when accessing a page after it has been wired (on amd64). All the ones we have seen involve sysctl handlers that call sysctl_wire_old_buffer, then lock a mutex, then call SYSCTL_OUT. When SYSCTL_OUT does the copyout, it takes a page fault even though the page has been wired, sometimes causing the thread to sleep while holding a mutex or to recurse on a non-recursive mutex. Here are the two panics that are easiest to follow:
Sleeping thread (tid 100458, pid 2737) owns a non-sleepable lock
sched_switch() at 0xffffffff80603bf5 = sched_switch+0x146
mi_switch() at 0xffffffff805e8e15 = mi_switch+0x183
sleepq_switch() at 0xffffffff8061e6e7 = sleepq_switch+0xb1
sleepq_wait() at 0xffffffff8061f0ea = sleepq_wait+0x3d
_sx_slock_hard() at 0xffffffff805e7ca7 = _sx_slock_hard+0x41d
_sx_slock() at 0xffffffff805e7e32 = _sx_slock+0x3d
vm_map_lookup() at 0xffffffff807909e4 = vm_map_lookup+0x54
vm_fault() at 0xffffffff80786c20 = vm_fault+0x11c
trap_pfault() at 0xffffffff80844dd0 = trap_pfault+0xe1
trap() at 0xffffffff80845286 = trap+0x337
calltrap() at 0xffffffff80827f28 = calltrap+0x8
--- trap 0xc, rip = 0xffffffff8084296b, rsp = 0xffffff811391e7e0, rbp = 0xffffff811391e810 ---
copyout() at 0xffffffff8084296b = copyout+0x3b
sysctl_rtsock() at 0xffffffff806a5ef7 = sysctl_rtsock+0x499
sysctl_root() at 0xffffffff805eab9e = sysctl_root+0xea
userland_sysctl() at 0xffffffff805eae6e = userland_sysctl+0x14f
sysctl() at 0xffffffff805eb258 = sysctl+0x9a
amd64_syscall() at 0xffffffff80844065 = amd64_syscall+0x145
Xfast_syscall() at 0xffffffff8082821c = Xfast_syscall+0xfc
login: panic: _mtx_lock_sleep: recursed on non-recursive mutex process lock @ ../../../amd64/amd64/trap.c:731
cpuid = 0
KDB: stack backtrace:
gdb_trace_self_wrapper() at 0xffffffff8057e7ea = gdb_trace_self_wrapper+0x2a
kdb_backtrace() at 0xffffffff8062ffdc = kdb_backtrace+0x37
panic() at 0xffffffff805f89ca = panic+0x2ad
_mtx_lock_flags() at 0xffffffff805e9376 = _mtx_lock_flags
_mtx_lock_flags() at 0xffffffff805e9417 = _mtx_lock_flags+0xa1
trap_pfault() at 0xffffffff80880450 = trap_pfault+0xa1
trap() at 0xffffffff80880ac7 = trap+0x4b8
calltrap() at 0xffffffff80861af8 = calltrap+0x8
--- trap 0xc, rip = 0xffffffff8087de8b, rsp = 0xffffff807b7e9410, rbp = 0xffffff807b7e9440 ---
copyout() at 0xffffffff8087de8b = copyout+0x3b
sysctl_out_proc() at 0xffffffff805ed305 = sysctl_out_proc+0x16c
sysctl_root() at 0xffffffff80606141 = sysctl_root+0x13a
userland_sysctl() at 0xffffffff8060640a = userland_sysctl+0x14f
sysctl() at 0xffffffff806067f8 = sysctl+0x9a
amd64_syscall() at 0xffffffff8087f635 = amd64_syscall+0x145
Xfast_syscall() at 0xffffffff80861dec = Xfast_syscall+0xfc
--- syscall (202, FreeBSD ELF64, sysctl), rip = 0x801c12b0c, rsp = 0x7fffffffb768, rbp = 0x7fffffffb7b0 ---
--- curthread 0xffffff000465b000, tid 100142
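Both traces have the same shape: the handler wires the old buffer, acquires a mutex, and only then calls SYSCTL_OUT(), so the copyout() fault is taken while the mutex is held. A minimal sketch of that pattern follows; the softc and handler names are made up for illustration and are not taken from sysctl_rtsock or sysctl_out_proc:

#include <sys/param.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/sysctl.h>

struct example_softc {
	struct mtx	sc_mtx;
	int		sc_data;
};

static int
sysctl_example(SYSCTL_HANDLER_ARGS)
{
	struct example_softc *sc = arg1;
	int error;

	/* Wire the user buffer so the later copyout is not supposed to fault... */
	error = sysctl_wire_old_buffer(req, 0);
	if (error != 0)
		return (error);

	/* ...take a non-sleepable lock... */
	mtx_lock(&sc->sc_mtx);

	/*
	 * ...and copy out while holding it.  SYSCTL_OUT() ends up in
	 * copyout(); if the "wired" page still faults, trap_pfault()
	 * runs with sc_mtx held, which is exactly what both traces
	 * above show.
	 */
	error = SYSCTL_OUT(req, &sc->sc_data, sizeof(sc->sc_data));

	mtx_unlock(&sc->sc_mtx);
	return (error);
}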
After doing some instrumentation, I think I've figured out what is causing this. It seems that when the page is wired, in some situations the page table entry is changed from read-only to read/write at the same time it is marked wired. I haven't figured out the exact scenario that causes this, but I can definitely see it in my added tracing. Here is an example page table entry transition, taken from the pmap_enter call made as a result of the wiring:
pmap_enter: origpte: 80000000ad201425 newpte: 80000000ad201607
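Decoding those two values with the amd64 PG_* bit definitions (a quick userland sketch; the constants below are the values from sys/amd64/include/pmap.h as I read them, with PG_W and PG_MANAGED in the software-available bits):

#include <stdint.h>
#include <stdio.h>

#define	PG_V		0x001ULL	/* valid */
#define	PG_RW		0x002ULL	/* read/write */
#define	PG_U		0x004ULL	/* user accessible */
#define	PG_A		0x020ULL	/* accessed */
#define	PG_M		0x040ULL	/* modified */
#define	PG_W		0x200ULL	/* wired (PG_AVAIL1) */
#define	PG_MANAGED	0x400ULL	/* managed (PG_AVAIL2) */
#define	PG_NX		(1ULL << 63)	/* no-execute */
#define	PG_FRAME	0x000ffffffffff000ULL

static void
decode(const char *name, uint64_t pte)
{
	printf("%s %#018jx: frame %#jx%s%s%s%s%s%s%s%s\n",
	    name, (uintmax_t)pte, (uintmax_t)(pte & PG_FRAME),
	    (pte & PG_V) ? " V" : "", (pte & PG_RW) ? " RW" : "",
	    (pte & PG_U) ? " U" : "", (pte & PG_A) ? " A" : "",
	    (pte & PG_M) ? " M" : "", (pte & PG_W) ? " W" : "",
	    (pte & PG_MANAGED) ? " MANAGED" : "",
	    (pte & PG_NX) ? " NX" : "");
}

int
main(void)
{
	decode("origpte", 0x80000000ad201425ULL);	/* V U A MANAGED NX */
	decode("newpte ", 0x80000000ad201607ULL);	/* V RW U W MANAGED NX */
	return (0);
}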
This means that we are setting PG_W (wired) and PG_RW (read/write) in this pmap_enter operation. Every time I saw a page fault after wiring, it was immediately preceded by a transition like this (in the cases that did not page fault, the page table entry already had PG_RW set). This made me suspect that a stale read-only version of the page table entry was still cached in the TLB. pmap_enter does invalidate the TLB entry in some situations, but this transition is not one of them. I was able to eliminate the panics by making this change:
diff --git a/src/sys/amd64/amd64/pmap.c b/src/sys/amd64/amd64/pmap.c
--- a/src/sys/amd64/amd64/pmap.c
+++ b/src/sys/amd64/amd64/pmap.c
@@ -3251,6 +3251,11 @@ validate:
 				if (opa != VM_PAGE_TO_PHYS(m) || ((origpte &
 				    PG_NX) == 0 && (newpte & PG_NX)))
 					invlva = TRUE;
+				if ((newpte & PG_W) &&
+				    ((origpte & PG_RW) == 0) &&
+				    (newpte & PG_RW)) {
+					invlva = TRUE;
+				}
 			}
 			if ((origpte & (PG_M | PG_RW)) == (PG_M | PG_RW)) {
 				if ((origpte & PG_MANAGED) != 0)
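For context, a few lines below this hunk pmap_enter() acts on the flag; if I'm reading the 8.2 sources correctly, it is roughly:

	if (invlva)
		pmap_invalidate_page(pmap, va);

so setting invlva for the read-only to read/write transition shoots down whatever stale entry the TLB may still hold for that address.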
I wanted to see if anyone else has seen issues in this area, and whether this fix seems appropriate. I'm running 8.2, but I didn't see any obvious changes to the pmap code in head that would change this behavior. Thanks for any feedback,
Jeremiah Lott
Avere Systems
