Date: Mon, 2 May 2011 16:25:25 -0400 From: John Baldwin <jhb@freebsd.org> To: Kostik Belousov <kostikbel@gmail.com> Cc: arch@freebsd.org Subject: Re: [PATCH] Add ktrace records for user page faults Message-ID: <201105021625.25473.jhb@freebsd.org> In-Reply-To: <20110502201602.GD48734@deviant.kiev.zoral.com.ua> References: <201105021537.19507.jhb@freebsd.org> <201105021602.02668.jhb@freebsd.org> <20110502201602.GD48734@deviant.kiev.zoral.com.ua>
next in thread | previous in thread | raw e-mail | index | archive | help
On Monday, May 02, 2011 4:16:02 pm Kostik Belousov wrote: > On Mon, May 02, 2011 at 04:02:02PM -0400, John Baldwin wrote: > > On Monday, May 02, 2011 3:55:55 pm Kostik Belousov wrote: > > > On Mon, May 02, 2011 at 03:37:19PM -0400, John Baldwin wrote: > > > > One thing I have found useful is knowing when processes are in the kernel > > > > instead of in userland. ktrace already provides records for syscall > > > > entry/exit. The other major source of time spent in the kernel that I've seen > > > > is page fault handling. To that end, I have a patch that adds ktrace records > > > > to the beginning and end of VM faults. This gives a pair of records so a user > > > > can see how long a fault took (similar to how one can see how long a syscall > > > > takes now). Sample output from kdump is below: > > > > > > > > 47565 echo CALL mmap(0x800a87000,0x179000,PROT_READ| > > > > PROT_WRITE,MAP_PRIVATE|MAP_ANON,0xffffffff,0) > > > > 47565 echo RET mmap 34370777088/0x800a87000 > > > > 47565 echo PFLT 0x800723000 VM_PROT_EXECUTE > > > > 47565 echo RET KERN_SUCCESS > > > > 47565 echo CALL munmap(0x800887000,0x179000) > > > > 47565 echo RET munmap 0 > > > > 47565 echo PFLT 0x800a00000 VM_PROT_WRITE > > > > 47565 echo RET KERN_SUCCESS > > > > > > > > The patch is available at www.freebsd.org/~jhb/patches/ktrace_fault.patch and > > > > included below. > > > > > > One immediate detail is that trap() truncates the fault address to the > > > page address, that arguably looses useful information. > > > > It is true that it would be nice to have the exact faulting address, though > > having page granularity has been sufficient for the few times I've actually > > used the address itself (e.g. I could figure out which page of libstdc++ a > > fault occurred on and narrow down from there as to which of the routines most > > likely was executed given what the app was doing at the time). In my case > > knowing how much time was spent handling a page fault has been useful. > > > > Would we have to push this logic out of vm_fault and into every trap() routine > > to get the original address? That would make the patch quite a bit bigger > > (touching N MD files vs 1 MI file). > > Or do the reverse, making vm_fault() do trunc_page() [if doing this > change at all]. Ok. That sounds sensible. > Also, I want to note another small detail, that is relevant if you plan > to MFC the change. In HEAD, vm_fault() is only called from trap()s, and > in-kernel page reads where substituted by vm_fault_hold() calls. This is > not true for stable. > > Hm, was it indended to report faults from uiomove etc ? > I had this change (that relates to OOM handling) for long time, and > realized that it might be useful there. I don't mind if it reports those faults as they will be bracketed by a system call entry/exit. However, I primarily care about user-initiated faults. > diff --git a/sys/amd64/amd64/trap.c b/sys/amd64/amd64/trap.c > index 4e5f8b8..0d1e68f 100644 > --- a/sys/amd64/amd64/trap.c > +++ b/sys/amd64/amd64/trap.c > @@ -697,7 +697,8 @@ trap_pfault(frame, usermode) > PROC_UNLOCK(p); > > /* Fault in the user page: */ > - rv = vm_fault(map, va, ftype, VM_FAULT_NORMAL); > + rv = vm_fault(map, va, ftype, VM_FAULT_NORMAL | > + (usermode ? VM_FAULT_USERMODE : 0)); > > PROC_LOCK(p); > --p->p_lock; > diff --git a/sys/i386/i386/trap.c b/sys/i386/i386/trap.c > index 5a8016c..236f295 100644 > --- a/sys/i386/i386/trap.c > +++ b/sys/i386/i386/trap.c > @@ -856,7 +856,8 @@ trap_pfault(frame, usermode, eva) > PROC_UNLOCK(p); > > /* Fault in the user page: */ > - rv = vm_fault(map, va, ftype, VM_FAULT_NORMAL); > + rv = vm_fault(map, va, ftype, VM_FAULT_NORMAL | > + (usermode ? VM_FAULT_USERMODE : 0)); > > PROC_LOCK(p); > --p->p_lock; > diff --git a/sys/vm/vm_fault.c b/sys/vm/vm_fault.c > index d417a84..c1e87ae 100644 > --- a/sys/vm/vm_fault.c > +++ b/sys/vm/vm_fault.c > @@ -408,6 +408,11 @@ RetryFault:; > */ > fs.m = NULL; > if (!vm_page_count_severe() || P_KILLED(curproc)) { > + if (P_KILLED(curproc) && (fault_flags & > + VM_FAULT_USERMODE) != 0) { > + unlock_and_deallocate(&fs); > + return (KERN_RESOURCE_SHORTAGE); > + } > #if VM_NRESERVLEVEL > 0 > if ((fs.object->flags & OBJ_COLORED) == 0) { > fs.object->flags |= OBJ_COLORED; > diff --git a/sys/vm/vm_map.h b/sys/vm/vm_map.h > index 5311e02..7444549 100644 > --- a/sys/vm/vm_map.h > +++ b/sys/vm/vm_map.h > @@ -326,6 +326,7 @@ long vmspace_wired_count(struct vmspace *vmspace); > #define VM_FAULT_NORMAL 0 /* Nothing special */ > #define VM_FAULT_CHANGE_WIRING 1 /* Change the wiring as appropriate */ > #define VM_FAULT_DIRTY 2 /* Dirty the page; use w/VM_PROT_COPY */ > +#define VM_FAULT_USERMODE 4 /* Fault initiated by usermode */ > > /* > * The following "find_space" options are supported by vm_map_find() > -- John Baldwin
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?201105021625.25473.jhb>