Date:      Thu, 22 Mar 2012 20:34:58 +0200
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Jeremiah Lott <jlott@averesystems.com>
Cc:        alc@freebsd.org, freebsd-amd64@freebsd.org
Subject:   Re: page fault after wiring page
Message-ID:  <20120322183458.GF2358@deviant.kiev.zoral.com.ua>
In-Reply-To: <DB3A5662-87CD-459E-9DC6-9E41EF58ACF0@averesystems.com>
References:  <DB3A5662-87CD-459E-9DC6-9E41EF58ACF0@averesystems.com>

On Thu, Mar 22, 2012 at 02:01:59PM -0400, Jeremiah Lott wrote:
> We've been seeing some panics and deadlocks that appear to be related
> to getting a page fault when accessing a page after it has been wired
> (on amd64).  All the ones we have seen are related to sysctl handlers
> that call sysctl_wire_old_buffer, then lock a mutex, then call
> SYSCTL_OUT.  When the handler does the copyout, it gets a page fault
> even though the page has been wired, sometimes causing it to sleep
> while holding a mutex or recurse on non-recursive mutexes.  Here are
> the two panics that are easiest to follow:
>
> Sleeping thread (tid 100458, pid 2737) owns a non-sleepable lock
> sched_switch() at 0xffffffff80603bf5 = sched_switch+0x146
> mi_switch() at 0xffffffff805e8e15 = mi_switch+0x183
> sleepq_switch() at 0xffffffff8061e6e7 = sleepq_switch+0xb1
> sleepq_wait() at 0xffffffff8061f0ea = sleepq_wait+0x3d
> _sx_slock_hard() at 0xffffffff805e7ca7 = _sx_slock_hard+0x41d
> _sx_slock() at 0xffffffff805e7e32 = _sx_slock+0x3d
> vm_map_lookup() at 0xffffffff807909e4 = vm_map_lookup+0x54
> vm_fault() at 0xffffffff80786c20 = vm_fault+0x11c
> trap_pfault() at 0xffffffff80844dd0 = trap_pfault+0xe1
> trap() at 0xffffffff80845286 = trap+0x337
> calltrap() at 0xffffffff80827f28 = calltrap+0x8
> --- trap 0xc, rip = 0xffffffff8084296b, rsp = 0xffffff811391e7e0, rbp = 0xffffff811391e810 ---
> copyout() at 0xffffffff8084296b = copyout+0x3b
> sysctl_rtsock() at 0xffffffff806a5ef7 = sysctl_rtsock+0x499
> sysctl_root() at 0xffffffff805eab9e = sysctl_root+0xea
> userland_sysctl() at 0xffffffff805eae6e = userland_sysctl+0x14f
> sysctl() at 0xffffffff805eb258 = sysctl+0x9a
> amd64_syscall() at 0xffffffff80844065 = amd64_syscall+0x145
> Xfast_syscall() at 0xffffffff8082821c = Xfast_syscall+0xfc
>
> login: panic: _mtx_lock_sleep: recursed on non-recursive mutex process lock @ ../../../amd64/amd64/trap.c:731
> cpuid = 0
> KDB: stack backtrace:
> gdb_trace_self_wrapper() at 0xffffffff8057e7ea = gdb_trace_self_wrapper+0x2a
> kdb_backtrace() at 0xffffffff8062ffdc = kdb_backtrace+0x37
> panic() at 0xffffffff805f89ca = panic+0x2ad
> _mtx_lock_flags() at 0xffffffff805e9376 = _mtx_lock_flags
> _mtx_lock_flags() at 0xffffffff805e9417 = _mtx_lock_flags+0xa1
> trap_pfault() at 0xffffffff80880450 = trap_pfault+0xa1
> trap() at 0xffffffff80880ac7 = trap+0x4b8
> calltrap() at 0xffffffff80861af8 = calltrap+0x8
> --- trap 0xc, rip = 0xffffffff8087de8b, rsp = 0xffffff807b7e9410, rbp = 0xffffff807b7e9440 ---
> copyout() at 0xffffffff8087de8b = copyout+0x3b
> sysctl_out_proc() at 0xffffffff805ed305 = sysctl_out_proc+0x16c
> sysctl_root() at 0xffffffff80606141 = sysctl_root+0x13a
> userland_sysctl() at 0xffffffff8060640a = userland_sysctl+0x14f
> sysctl() at 0xffffffff806067f8 = sysctl+0x9a
> amd64_syscall() at 0xffffffff8087f635 = amd64_syscall+0x145
> Xfast_syscall() at 0xffffffff80861dec = Xfast_syscall+0xfc
> --- syscall (202, FreeBSD ELF64, sysctl), rip = 0x801c12b0c, rsp = 0x7fffffffb768, rbp = 0x7fffffffb7b0 ---
> --- curthread 0xffffff000465b000, tid 100142
>
> After doing some instrumentation, I think I've figured out what is
> causing this.  It seems that when I am wiring the page, in some
> situations the page table entry is being changed from read-only ->
> read-write as well as being wired.  I haven't figured out the exact
> scenario that causes this, but I can definitely see it in my added
> trace.  Here is an example page table entry transition I am seeing in
> pmap_enter that is called as a result of the wire:
>
> pmap_enter: origpte: 80000000ad201425 newpte: 80000000ad201607
>
> This means that we are setting PG_W (wired) and PG_RW (read/write) in
> this pmap_enter operation.  Every time I saw a page fault after wiring,
> it was immediately preceded by a transition like this (in the cases
> that did not page fault, the page table entry already had PG_RW set).
> This made me suspect that a read-only version of the page table entry
> was cached in the TLB.  I noticed we invalidate in some situations in
> pmap_enter, but this transition is not one of them.  I was able to
> eliminate the panics by making this change:
>
> diff --git a/src/sys/amd64/amd64/pmap.c b/src/sys/amd64/amd64/pmap.c
> --- a/src/sys/amd64/amd64/pmap.c
> +++ b/src/sys/amd64/amd64/pmap.c
> @@ -3251,6 +3251,11 @@ validate:
>                                 if (opa != VM_PAGE_TO_PHYS(m) || ((origpte &
>                                     PG_NX) == 0 && (newpte & PG_NX)))
>                                         invlva = TRUE;
> +                               if ((newpte & PG_W) &&
> +                                   ((origpte & PG_RW) == 0) &&
> +                                   (newpte & PG_RW)) {
> +                                       invlva = TRUE;
> +                               }
>                        }
>                         if ((origpte & (PG_M | PG_RW)) == (PG_M | PG_RW)) {
>                                 if ((origpte & PG_MANAGED) != 0)
>
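
The transition quoted above can be checked by hand.  What follows is a
minimal userland sketch, not kernel code: it computes which bits differ
between the two PTE values from the trace and evaluates the condition
that the proposed patch adds.  The PG_* values are an assumption (they
are believed to match sys/amd64/include/pmap.h of that era, with PG_W
and PG_MANAGED being software-available bits), and the two function
names are hypothetical helpers that do not exist in the kernel.

```c
#include <stdint.h>

/* Assumed amd64 PTE bit values; verify against sys/amd64/include/pmap.h. */
#define PG_V        0x001ULL        /* valid */
#define PG_RW       0x002ULL        /* read/write */
#define PG_U        0x004ULL        /* user accessible */
#define PG_A        0x020ULL        /* accessed */
#define PG_M        0x040ULL        /* modified */
#define PG_W        0x200ULL        /* wired (software bit) */
#define PG_MANAGED  0x400ULL        /* managed (software bit) */
#define PG_NX       (1ULL << 63)    /* no-execute */

/* Hypothetical helper: bits that differ between the old and new PTE. */
uint64_t
pte_changed_bits(uint64_t origpte, uint64_t newpte)
{
	return (origpte ^ newpte);
}

/*
 * Hypothetical helper mirroring the condition added by the patch:
 * the new PTE is wired and the mapping goes from read-only to
 * read/write.
 */
int
patch_wants_invlva(uint64_t origpte, uint64_t newpte)
{
	return ((newpte & PG_W) != 0 &&
	    (origpte & PG_RW) == 0 &&
	    (newpte & PG_RW) != 0);
}
```

Under those assumed bit values, the pair from the trace
(80000000ad201425 -> 80000000ad201607) differs in 0x222, that is
PG_W | PG_A | PG_RW: the new entry gains the wired and read/write bits
and loses the accessed bit, and the patch condition is true for it.
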
> I wanted to see if anyone has seen issues in this area, and if this
> fix seems appropriate.  I'm running 8.2, but I didn't see any obvious
> changes to pmap stuff in head which would change this behavior.
> Thanks for any feedback,
>
>   Jeremiah Lott
>   Avere Systems

This should be the issue fixed in r233291.
