Date: Mon, 2 May 2011 23:16:02 +0300 From: Kostik Belousov <kostikbel@gmail.com> To: John Baldwin <jhb@freebsd.org> Cc: arch@freebsd.org Subject: Re: [PATCH] Add ktrace records for user page faults Message-ID: <20110502201602.GD48734@deviant.kiev.zoral.com.ua> In-Reply-To: <201105021602.02668.jhb@freebsd.org> References: <201105021537.19507.jhb@freebsd.org> <20110502195555.GC48734@deviant.kiev.zoral.com.ua> <201105021602.02668.jhb@freebsd.org>
next in thread | previous in thread | raw e-mail | index | archive | help
--+NWYRNQpSl62EBce Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Mon, May 02, 2011 at 04:02:02PM -0400, John Baldwin wrote: > On Monday, May 02, 2011 3:55:55 pm Kostik Belousov wrote: > > On Mon, May 02, 2011 at 03:37:19PM -0400, John Baldwin wrote: > > > One thing I have found useful is knowing when processes are in the ke= rnel=20 > > > instead of in userland. ktrace already provides records for syscall= =20 > > > entry/exit. The other major source of time spent in the kernel that = I've seen=20 > > > is page fault handling. To that end, I have a patch that adds ktrace= records=20 > > > to the beginning and end of VM faults. This gives a pair of records = so a user=20 > > > can see how long a fault took (similar to how one can see how long a = syscall=20 > > > takes now). Sample output from kdump is below: > > >=20 > > > 47565 echo CALL mmap(0x800a87000,0x179000,PROT_READ| > > > PROT_WRITE,MAP_PRIVATE|MAP_ANON,0xffffffff,0) > > > 47565 echo RET mmap 34370777088/0x800a87000 > > > 47565 echo PFLT 0x800723000 VM_PROT_EXECUTE > > > 47565 echo RET KERN_SUCCESS > > > 47565 echo CALL munmap(0x800887000,0x179000) > > > 47565 echo RET munmap 0 > > > 47565 echo PFLT 0x800a00000 VM_PROT_WRITE > > > 47565 echo RET KERN_SUCCESS > > >=20 > > > The patch is available at www.freebsd.org/~jhb/patches/ktrace_fault.p= atch and=20 > > > included below. > >=20 > > One immediate detail is that trap() truncates the fault address to the > > page address, that arguably looses useful information. >=20 > It is true that it would be nice to have the exact faulting address, thou= gh > having page granularity has been sufficient for the few times I've actual= ly > used the address itself (e.g. I could figure out which page of libstdc++ a > fault occurred on and narrow down from there as to which of the routines = most > likely was executed given what the app was doing at the time). In my case > knowing how much time was spent handling a page fault has been useful. >=20 > Would we have to push this logic out of vm_fault and into every trap() ro= utine > to get the original address? That would make the patch quite a bit bigger > (touching N MD files vs 1 MI file). Or do the reverse, making vm_fault() do trunc_page() [if doing this change at all]. Also, I want to note another small detail, that is relevant if you plan to MFC the change. In HEAD, vm_fault() is only called from trap()s, and in-kernel page reads where substituted by vm_fault_hold() calls. This is not true for stable. Hm, was it indended to report faults from uiomove etc ? I had this change (that relates to OOM handling) for long time, and realized that it might be useful there. diff --git a/sys/amd64/amd64/trap.c b/sys/amd64/amd64/trap.c index 4e5f8b8..0d1e68f 100644 --- a/sys/amd64/amd64/trap.c +++ b/sys/amd64/amd64/trap.c @@ -697,7 +697,8 @@ trap_pfault(frame, usermode) PROC_UNLOCK(p); =20 /* Fault in the user page: */ - rv =3D vm_fault(map, va, ftype, VM_FAULT_NORMAL); + rv =3D vm_fault(map, va, ftype, VM_FAULT_NORMAL | + (usermode ? VM_FAULT_USERMODE : 0)); =20 PROC_LOCK(p); --p->p_lock; diff --git a/sys/i386/i386/trap.c b/sys/i386/i386/trap.c index 5a8016c..236f295 100644 --- a/sys/i386/i386/trap.c +++ b/sys/i386/i386/trap.c @@ -856,7 +856,8 @@ trap_pfault(frame, usermode, eva) PROC_UNLOCK(p); =20 /* Fault in the user page: */ - rv =3D vm_fault(map, va, ftype, VM_FAULT_NORMAL); + rv =3D vm_fault(map, va, ftype, VM_FAULT_NORMAL | + (usermode ? VM_FAULT_USERMODE : 0)); =20 PROC_LOCK(p); --p->p_lock; diff --git a/sys/vm/vm_fault.c b/sys/vm/vm_fault.c index d417a84..c1e87ae 100644 --- a/sys/vm/vm_fault.c +++ b/sys/vm/vm_fault.c @@ -408,6 +408,11 @@ RetryFault:; */ fs.m =3D NULL; if (!vm_page_count_severe() || P_KILLED(curproc)) { + if (P_KILLED(curproc) && (fault_flags & + VM_FAULT_USERMODE) !=3D 0) { + unlock_and_deallocate(&fs); + return (KERN_RESOURCE_SHORTAGE); + } #if VM_NRESERVLEVEL > 0 if ((fs.object->flags & OBJ_COLORED) =3D=3D 0) { fs.object->flags |=3D OBJ_COLORED; diff --git a/sys/vm/vm_map.h b/sys/vm/vm_map.h index 5311e02..7444549 100644 --- a/sys/vm/vm_map.h +++ b/sys/vm/vm_map.h @@ -326,6 +326,7 @@ long vmspace_wired_count(struct vmspace *vmspace); #define VM_FAULT_NORMAL 0 /* Nothing special */ #define VM_FAULT_CHANGE_WIRING 1 /* Change the wiring as appropriate */ #define VM_FAULT_DIRTY 2 /* Dirty the page; use w/VM_PROT_COPY */ +#define VM_FAULT_USERMODE 4 /* Fault initiated by usermode */ =20 /* * The following "find_space" options are supported by vm_map_find() --+NWYRNQpSl62EBce Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (FreeBSD) iEYEARECAAYFAk2/EQIACgkQC3+MBN1Mb4hYAACg5gBeNduqrIahvcQ3Y5NxDMnj wVwAoPXpUBY1Lw5AKPEzOD763BLLt80+ =+AU3 -----END PGP SIGNATURE----- --+NWYRNQpSl62EBce--
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20110502201602.GD48734>