Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 2 May 2011 23:16:02 +0300
From:      Kostik Belousov <kostikbel@gmail.com>
To:        John Baldwin <jhb@freebsd.org>
Cc:        arch@freebsd.org
Subject:   Re: [PATCH] Add ktrace records for user page faults
Message-ID:  <20110502201602.GD48734@deviant.kiev.zoral.com.ua>
In-Reply-To: <201105021602.02668.jhb@freebsd.org>
References:  <201105021537.19507.jhb@freebsd.org> <20110502195555.GC48734@deviant.kiev.zoral.com.ua> <201105021602.02668.jhb@freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help

--+NWYRNQpSl62EBce
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Mon, May 02, 2011 at 04:02:02PM -0400, John Baldwin wrote:
> On Monday, May 02, 2011 3:55:55 pm Kostik Belousov wrote:
> > On Mon, May 02, 2011 at 03:37:19PM -0400, John Baldwin wrote:
> > > One thing I have found useful is knowing when processes are in the ke=
rnel=20
> > > instead of in userland.  ktrace already provides records for syscall=
=20
> > > entry/exit.  The other major source of time spent in the kernel that =
I've seen=20
> > > is page fault handling.  To that end, I have a patch that adds ktrace=
 records=20
> > > to the beginning and end of VM faults.  This gives a pair of records =
so a user=20
> > > can see how long a fault took (similar to how one can see how long a =
syscall=20
> > > takes now).  Sample output from kdump is below:
> > >=20
> > >  47565 echo     CALL  mmap(0x800a87000,0x179000,PROT_READ|
> > > PROT_WRITE,MAP_PRIVATE|MAP_ANON,0xffffffff,0)
> > >  47565 echo     RET   mmap 34370777088/0x800a87000
> > >  47565 echo     PFLT  0x800723000 VM_PROT_EXECUTE
> > >  47565 echo     RET   KERN_SUCCESS
> > >  47565 echo     CALL  munmap(0x800887000,0x179000)
> > >  47565 echo     RET   munmap 0
> > >  47565 echo     PFLT  0x800a00000 VM_PROT_WRITE
> > >  47565 echo     RET   KERN_SUCCESS
> > >=20
> > > The patch is available at www.freebsd.org/~jhb/patches/ktrace_fault.p=
atch and=20
> > > included below.
> >=20
> > One immediate detail is that trap() truncates the fault address to the
> > page address, that arguably looses useful information.
>=20
> It is true that it would be nice to have the exact faulting address, thou=
gh
> having page granularity has been sufficient for the few times I've actual=
ly
> used the address itself (e.g. I could figure out which page of libstdc++ a
> fault occurred on and narrow down from there as to which of the routines =
most
> likely was executed given what the app was doing at the time).  In my case
> knowing how much time was spent handling a page fault has been useful.
>=20
> Would we have to push this logic out of vm_fault and into every trap() ro=
utine
> to get the original address?  That would make the patch quite a bit bigger
> (touching N MD files vs 1 MI file).

Or do the reverse, making vm_fault() do trunc_page() [if doing this
change at all].

Also, I want to note another small detail, that is relevant if you plan
to MFC the change. In HEAD, vm_fault() is only called from trap()s, and
in-kernel page reads where substituted by vm_fault_hold() calls. This is
not true for stable.

Hm, was it indended to report faults from uiomove etc ?
I had this change (that relates to OOM handling) for long time, and
realized that it might be useful there.

diff --git a/sys/amd64/amd64/trap.c b/sys/amd64/amd64/trap.c
index 4e5f8b8..0d1e68f 100644
--- a/sys/amd64/amd64/trap.c
+++ b/sys/amd64/amd64/trap.c
@@ -697,7 +697,8 @@ trap_pfault(frame, usermode)
 		PROC_UNLOCK(p);
=20
 		/* Fault in the user page: */
-		rv =3D vm_fault(map, va, ftype, VM_FAULT_NORMAL);
+		rv =3D vm_fault(map, va, ftype, VM_FAULT_NORMAL |
+		    (usermode ? VM_FAULT_USERMODE : 0));
=20
 		PROC_LOCK(p);
 		--p->p_lock;
diff --git a/sys/i386/i386/trap.c b/sys/i386/i386/trap.c
index 5a8016c..236f295 100644
--- a/sys/i386/i386/trap.c
+++ b/sys/i386/i386/trap.c
@@ -856,7 +856,8 @@ trap_pfault(frame, usermode, eva)
 		PROC_UNLOCK(p);
=20
 		/* Fault in the user page: */
-		rv =3D vm_fault(map, va, ftype, VM_FAULT_NORMAL);
+		rv =3D vm_fault(map, va, ftype, VM_FAULT_NORMAL |
+		    (usermode ? VM_FAULT_USERMODE : 0));
=20
 		PROC_LOCK(p);
 		--p->p_lock;
diff --git a/sys/vm/vm_fault.c b/sys/vm/vm_fault.c
index d417a84..c1e87ae 100644
--- a/sys/vm/vm_fault.c
+++ b/sys/vm/vm_fault.c
@@ -408,6 +408,11 @@ RetryFault:;
 			 */
 			fs.m =3D NULL;
 			if (!vm_page_count_severe() || P_KILLED(curproc)) {
+				if (P_KILLED(curproc) && (fault_flags &
+				    VM_FAULT_USERMODE) !=3D 0) {
+					unlock_and_deallocate(&fs);
+					return (KERN_RESOURCE_SHORTAGE);
+				}
 #if VM_NRESERVLEVEL > 0
 				if ((fs.object->flags & OBJ_COLORED) =3D=3D 0) {
 					fs.object->flags |=3D OBJ_COLORED;
diff --git a/sys/vm/vm_map.h b/sys/vm/vm_map.h
index 5311e02..7444549 100644
--- a/sys/vm/vm_map.h
+++ b/sys/vm/vm_map.h
@@ -326,6 +326,7 @@ long vmspace_wired_count(struct vmspace *vmspace);
 #define VM_FAULT_NORMAL 0		/* Nothing special */
 #define VM_FAULT_CHANGE_WIRING 1	/* Change the wiring as appropriate */
 #define	VM_FAULT_DIRTY 2		/* Dirty the page; use w/VM_PROT_COPY */
+#define	VM_FAULT_USERMODE 4		/* Fault initiated by usermode */
=20
 /*
  * The following "find_space" options are supported by vm_map_find()

--+NWYRNQpSl62EBce
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (FreeBSD)

iEYEARECAAYFAk2/EQIACgkQC3+MBN1Mb4hYAACg5gBeNduqrIahvcQ3Y5NxDMnj
wVwAoPXpUBY1Lw5AKPEzOD763BLLt80+
=+AU3
-----END PGP SIGNATURE-----

--+NWYRNQpSl62EBce--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20110502201602.GD48734>