Date: Thu, 22 Mar 2012 20:24:12 +0200
From: Andriy Gapon <avg@FreeBSD.org>
To: Jeremiah Lott <jlott@averesystems.com>
Cc: alc@FreeBSD.org, kib@FreeBSD.org, freebsd-amd64@FreeBSD.org
Subject: Re: page fault after wiring page
Message-ID: <4F6B6E4C.9030503@FreeBSD.org>
In-Reply-To: <DB3A5662-87CD-459E-9DC6-9E41EF58ACF0@averesystems.com>
References: <DB3A5662-87CD-459E-9DC6-9E41EF58ACF0@averesystems.com>

I see that you've already CC-ed the right people :-)
Does this commit look related?
http://svn.freebsd.org/changeset/base/233291

on 22/03/2012 20:01 Jeremiah Lott said the following:
> We've been seeing some panics and deadlocks that appear to be related to
> getting a page fault when accessing a page after it has been wired (on
> amd64). All the ones we have seen are related to sysctl handlers that call
> sysctl_wire_old_buffer, then lock a mutex, then call SYSCTL_OUT. When it
> does the copyout, it gets a page fault even though the page has been wired,
> sometimes causing it to sleep while holding a mutex or recurse on
> non-recursable mutexes. Here are the two panics that are easiest to follow:
>
> Sleeping thread (tid 100458, pid 2737) owns a non-sleepable lock
> sched_switch() at 0xffffffff80603bf5 = sched_switch+0x146
> mi_switch() at 0xffffffff805e8e15 = mi_switch+0x183
> sleepq_switch() at 0xffffffff8061e6e7 = sleepq_switch+0xb1
> sleepq_wait() at 0xffffffff8061f0ea = sleepq_wait+0x3d
> _sx_slock_hard() at 0xffffffff805e7ca7 = _sx_slock_hard+0x41d
> _sx_slock() at 0xffffffff805e7e32 = _sx_slock+0x3d
> vm_map_lookup() at 0xffffffff807909e4 = vm_map_lookup+0x54
> vm_fault() at 0xffffffff80786c20 = vm_fault+0x11c
> trap_pfault() at 0xffffffff80844dd0 = trap_pfault+0xe1
> trap() at 0xffffffff80845286 = trap+0x337
> calltrap() at 0xffffffff80827f28 = calltrap+0x8
> --- trap 0xc, rip = 0xffffffff8084296b, rsp = 0xffffff811391e7e0, rbp = 0xffffff811391e810 ---
> copyout() at 0xffffffff8084296b = copyout+0x3b
> sysctl_rtsock() at 0xffffffff806a5ef7 = sysctl_rtsock+0x499
> sysctl_root() at 0xffffffff805eab9e = sysctl_root+0xea
> userland_sysctl() at 0xffffffff805eae6e = userland_sysctl+0x14f
> sysctl() at 0xffffffff805eb258 = sysctl+0x9a
> amd64_syscall() at 0xffffffff80844065 = amd64_syscall+0x145
> Xfast_syscall() at 0xffffffff8082821c = Xfast_syscall+0xfc
>
> login: panic: _mtx_lock_sleep: recursed on non-recursive mutex process lock @ ../../../amd64/amd64/trap.c:731
>
> cpuid = 0
> KDB: stack backtrace:
> gdb_trace_self_wrapper() at 0xffffffff8057e7ea = gdb_trace_self_wrapper+0x2a
> kdb_backtrace() at 0xffffffff8062ffdc = kdb_backtrace+0x37
> panic() at 0xffffffff805f89ca = panic+0x2ad
> _mtx_lock_flags() at 0xffffffff805e9376 = _mtx_lock_flags
> _mtx_lock_flags() at 0xffffffff805e9417 = _mtx_lock_flags+0xa1
> trap_pfault() at 0xffffffff80880450 = trap_pfault+0xa1
> trap() at 0xffffffff80880ac7 = trap+0x4b8
> calltrap() at 0xffffffff80861af8 = calltrap+0x8
> --- trap 0xc, rip = 0xffffffff8087de8b, rsp = 0xffffff807b7e9410, rbp = 0xffffff807b7e9440 ---
> copyout() at 0xffffffff8087de8b = copyout+0x3b
> sysctl_out_proc() at 0xffffffff805ed305 = sysctl_out_proc+0x16c
> sysctl_root() at 0xffffffff80606141 = sysctl_root+0x13a
> userland_sysctl() at 0xffffffff8060640a = userland_sysctl+0x14f
> sysctl() at 0xffffffff806067f8 = sysctl+0x9a
> amd64_syscall() at 0xffffffff8087f635 = amd64_syscall+0x145
> Xfast_syscall() at 0xffffffff80861dec = Xfast_syscall+0xfc
> --- syscall (202, FreeBSD ELF64, sysctl), rip = 0x801c12b0c, rsp = 0x7fffffffb768, rbp = 0x7fffffffb7b0 ---
> --- curthread 0xffffff000465b000, tid 100142
>
> After doing some instrumentation, I think I've figured out what is causing
> this. It seems that when I am wiring the page, in some situations the page
> table entry is being changed from read-only -> read-write as well as being
> wired. I haven't figured out the exact scenario that causes this, but I can
> definitely see it in my added trace. Here is an example page table entry
> transition I am seeing in pmap_enter that is called as a result of the wire:
>
> pmap_enter: origpte: 80000000ad201425 newpte: 80000000ad201607
>
> This means that we are setting PG_W (wired) and PG_RW (read/write) in this
> pmap_enter operation. Every time I saw a page fault after wiring, it was
> immediately preceded by a transition like this (in the cases that did not
> page fault, the page table entry already had PG_RW set). This made me
> suspect that a read-only version of the page table entry was cached in the
> TLB. I noticed we invalidate in some situations in pmap_enter, but this
> transition is not one of them. I was able to eliminate the panics by making
> this change:
>
> diff --git a/src/sys/amd64/amd64/pmap.c b/src/sys/amd64/amd64/pmap.c
> --- a/src/sys/amd64/amd64/pmap.c
> +++ b/src/sys/amd64/amd64/pmap.c
> @@ -3251,6 +3251,11 @@ validate:
>                         if (opa != VM_PAGE_TO_PHYS(m) || ((origpte &
>                             PG_NX) == 0 && (newpte & PG_NX)))
>                                 invlva = TRUE;
> +                       if ((newpte & PG_W) &&
> +                           ((origpte & PG_RW) == 0) &&
> +                           (newpte & PG_RW)) {
> +                               invlva = TRUE;
> +                       }
>                 }
>                 if ((origpte & (PG_M | PG_RW)) == (PG_M | PG_RW)) {
>                         if ((origpte & PG_MANAGED) != 0)
>
> I wanted to see if anyone has seen issues in this area, and if this fix
> seems appropriate. I'm running 8.2, but I didn't see any obvious changes to
> pmap stuff in head which would change this behavior. Thanks for any
> feedback,
>
> Jeremiah Lott
> Avere Systems

--
Andriy Gapon
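
The pmap_enter transition quoted above can be checked by hand. The sketch below is not from the thread. It is standalone userland C, not kernel code. The PG_* values are assumed from the amd64 pmap.h of that era and should be verified against your own tree. The helper names pte_bits_gained, wire_upgrades_to_rw, and check_traced_transition are hypothetical. It shows that the traced origpte/newpte pair gains exactly PG_W and PG_RW, and that the condition added by the proposed patch matches that pair.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Page table entry bits. ASSUMPTION: values as defined in FreeBSD's
 * amd64 pmap.h at the time of the thread; verify against your tree.
 */
#define PG_V        0x001ULL        /* valid */
#define PG_RW       0x002ULL        /* read/write */
#define PG_U        0x004ULL        /* user accessible */
#define PG_A        0x020ULL        /* accessed */
#define PG_M        0x040ULL        /* modified (dirty) */
#define PG_W        0x200ULL        /* software bit: wired */
#define PG_MANAGED  0x400ULL        /* software bit: managed page */
#define PG_NX       (1ULL << 63)    /* no-execute */

/* Hypothetical helper: bits that are set in newpte but clear in origpte. */
static uint64_t
pte_bits_gained(uint64_t origpte, uint64_t newpte)
{
	return (newpte & ~origpte);
}

/*
 * Hypothetical helper mirroring the condition added by the proposed
 * patch: the new entry is wired and upgrades read-only to read-write.
 */
static bool
wire_upgrades_to_rw(uint64_t origpte, uint64_t newpte)
{
	return ((newpte & PG_W) != 0 &&
	    (origpte & PG_RW) == 0 &&
	    (newpte & PG_RW) != 0);
}

/* Check the transition reported in the trace from the message. */
static void
check_traced_transition(void)
{
	const uint64_t origpte = 0x80000000ad201425ULL;
	const uint64_t newpte = 0x80000000ad201607ULL;

	/* Old entry: valid, user, accessed, managed, NX; not writable, not wired. */
	assert((origpte & (PG_V | PG_U | PG_A | PG_MANAGED | PG_NX)) ==
	    (PG_V | PG_U | PG_A | PG_MANAGED | PG_NX));
	assert((origpte & (PG_RW | PG_W)) == 0);

	/* The only bits gained are PG_W and PG_RW, as the message states. */
	assert(pte_bits_gained(origpte, newpte) == (PG_W | PG_RW));

	/* The patch's added condition would set invlva for this pair. */
	assert(wire_upgrades_to_rw(origpte, newpte));
}
```

Calling check_traced_transition() from a small test program should pass without tripping an assertion under the assumed bit values. The same arithmetic also shows that the accessed bit (PG_A) is set in origpte and clear in newpte, which the message does not discuss.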