From owner-freebsd-arch@FreeBSD.ORG Mon May 2 19:37:21 2011 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4EE87106564A for ; Mon, 2 May 2011 19:37:21 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 1382D8FC19 for ; Mon, 2 May 2011 19:37:21 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id 9F8AD46B0D for ; Mon, 2 May 2011 15:37:20 -0400 (EDT) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 39E1B8A01B for ; Mon, 2 May 2011 15:37:20 -0400 (EDT) From: John Baldwin To: arch@freebsd.org Date: Mon, 2 May 2011 15:37:19 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110325; KDE/4.5.5; amd64; ; ) MIME-Version: 1.0 Content-Type: Text/Plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Message-Id: <201105021537.19507.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6 (bigwig.baldwin.cx); Mon, 02 May 2011 15:37:20 -0400 (EDT) Cc: 
Subject: [PATCH] Add ktrace records for user page faults X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 02 May 2011 19:37:21 -0000

One thing I have found useful is knowing when processes are in the kernel
instead of in userland. ktrace already provides records for syscall
entry/exit. The other major source of time spent in the kernel that I've seen
is page fault handling. To that end, I have a patch that adds ktrace records
to the beginning and end of VM faults. This gives a pair of records so a user
can see how long a fault took (similar to how one can see how long a syscall
takes now). Sample output from kdump is below:

 47565 echo     CALL  mmap(0x800a87000,0x179000,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,0xffffffff,0)
 47565 echo     RET   mmap 34370777088/0x800a87000
 47565 echo     PFLT  0x800723000 VM_PROT_EXECUTE
 47565 echo     RET   KERN_SUCCESS
 47565 echo     CALL  munmap(0x800887000,0x179000)
 47565 echo     RET   munmap 0
 47565 echo     PFLT  0x800a00000 VM_PROT_WRITE
 47565 echo     RET   KERN_SUCCESS

The patch is available at www.freebsd.org/~jhb/patches/ktrace_fault.patch and
included below.
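
As a rough illustration of what the paired records make possible: every ktrace record header carries a timestamp (the ktr_time field, a struct timeval), so the time spent in a fault is the difference between the timestamp of the PFLT record and that of the matching RET record. The following is only a sketch under that assumption. The helper name fault_elapsed_usec() is hypothetical and is not part of the patch.

```c
#include <stdint.h>
#include <sys/time.h>

/*
 * Hypothetical helper, not part of the patch: microseconds elapsed between
 * the timestamp of a PFLT record (start) and the timestamp of its matching
 * RET record (end).
 */
int64_t
fault_elapsed_usec(const struct timeval *start, const struct timeval *end)
{
	int64_t sec = (int64_t)end->tv_sec - (int64_t)start->tv_sec;
	int64_t usec = (int64_t)end->tv_usec - (int64_t)start->tv_usec;

	/* usec may be negative here; the seconds term compensates for it. */
	return (sec * 1000000 + usec);
}
```

When the two records are adjacent in the trace, kdump -R (time since the previous entry) should show the same interval without any extra code.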
Index: usr.bin/kdump/kdump.c
===================================================================
--- usr.bin/kdump/kdump.c (.../mirror/FreeBSD/stable/8) (revision 221926)
+++ usr.bin/kdump/kdump.c (.../stable/8) (revision 221926)
@@ -103,6 +103,8 @@
 void ktrsockaddr(struct sockaddr *);
 void ktrstat(struct stat *);
 void ktrstruct(char *, size_t);
+void ktrfault(struct ktr_fault *);
+void ktrfaultend(struct ktr_faultend *);
 void usage(void);
 void sockfamilyname(int);
 const char *ioctlname(u_long);
@@ -306,6 +308,12 @@
 		case KTR_STRUCT:
 			ktrstruct(m, ktrlen);
 			break;
+		case KTR_FAULT:
+			ktrfault((struct ktr_fault *)m);
+			break;
+		case KTR_FAULTEND:
+			ktrfaultend((struct ktr_faultend *)m);
+			break;
 		default:
 			printf("\n");
 			break;
@@ -445,6 +453,12 @@
 		/* FALLTHROUGH */
 	case KTR_PROCDTOR:
 		return;
+	case KTR_FAULT:
+		type = "PFLT";
+		break;
+	case KTR_FAULTEND:
+		type = "RET ";
+		break;
 	default:
 		(void)sprintf(unknown, "UNKNOWN(%d)", kth->ktr_type);
 		type = unknown;
@@ -1505,6 +1519,23 @@
 		printf("invalid record\n");
 }
 
+void
+ktrfault(struct ktr_fault *ktr)
+{
+
+	printf("0x%jx ", ktr->vaddr);
+	vmprotname(ktr->type);
+	printf("\n");
+}
+
+void
+ktrfaultend(struct ktr_faultend *ktr)
+{
+
+	vmresultname(ktr->result);
+	printf("\n");
+}
+
 #if defined(__amd64__) || defined(__i386__)
 void
 linux_ktrsyscall(struct ktr_syscall *ktr)
Index: usr.bin/kdump/kdump_subr.h
===================================================================
--- usr.bin/kdump/kdump_subr.h (.../mirror/FreeBSD/stable/8) (revision 221926)
+++ usr.bin/kdump/kdump_subr.h (.../stable/8) (revision 221926)
@@ -45,3 +45,5 @@
 void minheritname (int);
 void quotactlname (int);
 void ptraceopname (int);
+void vmprotname (int);
+void vmresultname (int);
Index: usr.bin/kdump/mksubr
===================================================================
--- usr.bin/kdump/mksubr (.../mirror/FreeBSD/stable/8) (revision 221926)
+++ usr.bin/kdump/mksubr (.../stable/8) (revision 221926)
@@ -160,6 +160,8 @@
 #include
 #include
 #include
+#include
+#include
 
 #include "kdump_subr.h"
 
@@ -304,6 +306,26 @@
 	}
 }
 
+/*
+ * MANUAL
+ *
+ * Used for page fault type. Cannot use auto_or_type since the macro
+ * values contain a cast. Also, VM_PROT_NONE has to be handled specially.
+ */
+void
+vmprotname (int type)
+{
+	int or = 0;
+
+	if (type == VM_PROT_NONE) {
+		(void)printf("VM_PROT_NONE");
+		return;
+	}
+	if_print_or(type, VM_PROT_READ, or);
+	if_print_or(type, VM_PROT_WRITE, or);
+	if_print_or(type, VM_PROT_EXECUTE, or);
+	if_print_or(type, VM_PROT_OVERRIDE_WRITE, or);
+}
 _EOF_
 
 auto_or_type "modename" "S_[A-Z]+[[:space:]]+[0-6]{7}" "sys/stat.h"
@@ -344,6 +366,7 @@
 auto_switch_type "sockoptname" "SO_[A-Z]+[[:space:]]+0x[0-9]+" "sys/socket.h"
 auto_switch_type "socktypename" "SOCK_[A-Z]+[[:space:]]+[1-9]+[0-9]*" "sys/socket.h"
 auto_switch_type "ptraceopname" "PT_[[:alnum:]]+[[:space:]]+[0-9]+" "sys/ptrace.h"
+auto_switch_type "vmresultname" "KERN_[A-Z]+[[:space:]]+[0-9]+" "vm/vm_param.h"
 
 cat <<_EOF_
 /*
Index: usr.bin/ktrace/ktrace.h
===================================================================
--- usr.bin/ktrace/ktrace.h (.../mirror/FreeBSD/stable/8) (revision 221926)
+++ usr.bin/ktrace/ktrace.h (.../stable/8) (revision 221926)
@@ -36,7 +36,8 @@
 
 #define DEF_POINTS (KTRFAC_SYSCALL | KTRFAC_SYSRET | KTRFAC_NAMEI | \
 		    KTRFAC_GENIO | KTRFAC_PSIG | KTRFAC_USER | \
-		    KTRFAC_STRUCT | KTRFAC_SYSCTL)
+		    KTRFAC_STRUCT | KTRFAC_SYSCTL | KTRFAC_FAULT | \
+		    KTRFAC_FAULTEND)
 
 #define PROC_ABI_POINTS (KTRFAC_PROCCTOR | KTRFAC_PROCDTOR)
 
Index: usr.bin/ktrace/ktrace.1
===================================================================
--- usr.bin/ktrace/ktrace.1 (.../mirror/FreeBSD/stable/8) (revision 221926)
+++ usr.bin/ktrace/ktrace.1 (.../stable/8) (revision 221926)
@@ -112,6 +112,8 @@
 .Bl -tag -width flag -compact
 .It Cm c
 trace system calls
+.It Cm f
+trace page faults
 .It Cm i
 trace
 .Tn I/O
@@ -131,7 +133,7 @@
 requests
 .It Cm +
 trace the default set of trace points -
-.Cm c , i , n , s , t , u , y
+.Cm c , f , i , n , s , t , u , y
 .El
 .It Ar command
 Execute
Index: usr.bin/ktrace/subr.c
===================================================================
--- usr.bin/ktrace/subr.c (.../mirror/FreeBSD/stable/8) (revision 221926)
+++ usr.bin/ktrace/subr.c (.../stable/8) (revision 221926)
@@ -65,6 +65,9 @@
 		case 'c':
 			facs |= KTRFAC_SYSCALL | KTRFAC_SYSRET;
 			break;
+		case 'f':
+			facs |= KTRFAC_FAULT | KTRFAC_FAULTEND;
+			break;
 		case 'n':
 			facs |= KTRFAC_NAMEI;
 			break;
Index: sys/kern/kern_ktrace.c
===================================================================
--- sys/kern/kern_ktrace.c (.../mirror/FreeBSD/stable/8) (revision 221926)
+++ sys/kern/kern_ktrace.c (.../stable/8) (revision 221926)
@@ -98,6 +98,8 @@
 		struct ktr_genio ktr_genio;
 		struct ktr_psig ktr_psig;
 		struct ktr_csw ktr_csw;
+		struct ktr_fault ktr_fault;
+		struct ktr_faultend ktr_faultend;
 	} ktr_data;
 	STAILQ_ENTRY(ktr_request) ktr_list;
 };
@@ -115,6 +117,8 @@
 	0,				/* KTR_SYSCTL */
 	sizeof(struct ktr_proc_ctor),	/* KTR_PROCCTOR */
 	0,				/* KTR_PROCDTOR */
+	sizeof(struct ktr_fault),	/* KTR_FAULT */
+	sizeof(struct ktr_faultend),	/* KTR_FAULTEND */
 };
 
 static STAILQ_HEAD(, ktr_request) ktr_free;
@@ -767,6 +769,38 @@
 	req->ktr_header.ktr_len = buflen;
 	ktr_submitrequest(curthread, req);
 }
+
+void
+ktrfault(vaddr, type)
+	vm_offset_t vaddr;
+	int type;
+{
+	struct ktr_request *req;
+	struct ktr_fault *kf;
+
+	req = ktr_getrequest(KTR_FAULT);
+	if (req == NULL)
+		return;
+	kf = &req->ktr_data.ktr_fault;
+	kf->vaddr = vaddr;
+	kf->type = type;
+	ktr_enqueuerequest(curthread, req);
+}
+
+void
+ktrfaultend(result)
+	int result;
+{
+	struct ktr_request *req;
+	struct ktr_faultend *kf;
+
+	req = ktr_getrequest(KTR_FAULTEND);
+	if (req == NULL)
+		return;
+	kf = &req->ktr_data.ktr_faultend;
+	kf->result = result;
+	ktr_enqueuerequest(curthread, req);
+}
 #endif /* KTRACE */
 
 /* Interface and common routines */
Index: sys/vm/vm_fault.c
===================================================================
--- sys/vm/vm_fault.c (.../mirror/FreeBSD/stable/8) (revision 221926)
+++ sys/vm/vm_fault.c (.../stable/8) (revision 221926)
@@ -74,6 +74,7 @@
 #include
 __FBSDID("$FreeBSD$");
 
+#include "opt_ktrace.h"
 #include "opt_vm.h"
 
 #include
@@ -86,6 +87,9 @@
 #include
 #include
 #include
+#ifdef KTRACE
+#include
+#endif
 
 #include
 #include
@@ -114,6 +118,9 @@
 
 static int vm_fault_additional_pages(vm_page_t, int, int, vm_page_t *, int *);
 static void vm_fault_prefault(pmap_t, vm_offset_t, vm_map_entry_t);
+#ifdef KTRACE
+static int vm_fault_traced(vm_map_t, vm_offset_t, vm_prot_t, int);
+#endif
 
 #define VM_FAULT_READ_AHEAD 8
 #define VM_FAULT_READ_BEHIND 7
@@ -209,7 +216,25 @@
 int
 vm_fault(vm_map_t map, vm_offset_t vaddr, vm_prot_t fault_type,
     int fault_flags)
+#ifdef KTRACE
 {
+	struct thread *td;
+	int result;
+
+	td = curthread;
+	if (map != kernel_map && KTRPOINT(td, KTR_FAULT))
+		ktrfault(vaddr, fault_type);
+	result = vm_fault_traced(map, vaddr, fault_type, fault_flags);
+	if (map != kernel_map && KTRPOINT(td, KTR_FAULTEND))
+		ktrfaultend(result);
+	return (result);
+}
+
+int
+vm_fault_traced(vm_map_t map, vm_offset_t vaddr, vm_prot_t fault_type,
+    int fault_flags)
+#endif
+{
 	vm_prot_t prot;
 	int is_first_object_locked, result;
 	boolean_t are_queues_locked, growstack, wired;
Index: sys/sys/ktrace.h
===================================================================
--- sys/sys/ktrace.h (.../mirror/FreeBSD/stable/8) (revision 221926)
+++ sys/sys/ktrace.h (.../stable/8) (revision 221926)
@@ -178,6 +182,23 @@
 #define KTR_PROCDTOR 11
 
 /*
+ * KTR_FAULT - page fault record
+ */
+#define KTR_FAULT 12
+struct ktr_fault {
+	vm_offset_t vaddr;
+	int type;
+};
+
+/*
+ * KTR_FAULTEND - end of page fault record
+ */
+#define KTR_FAULTEND 13
+struct ktr_faultend {
+	int result;
+};
+
+/*
  * KTR_DROP - If this bit is set in ktr_type, then at least one event
  * between the previous record and this record was dropped.
  */
@@ -198,6 +219,8 @@
 #define KTRFAC_SYSCTL (1<

Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1EB9F1065675 for ; Mon, 2 May 2011 20:02:06 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id EAD428FC0A for ; Mon, 2 May 2011 20:02:05 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id 9DDFF46B4C; Mon, 2 May 2011 16:02:05 -0400 (EDT) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 2AE548A027; Mon, 2 May 2011 16:02:05 -0400 (EDT) From: John Baldwin To: Kostik Belousov Date: Mon, 2 May 2011 16:02:02 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110325; KDE/4.5.5; amd64; ; ) References: <201105021537.19507.jhb@freebsd.org> <20110502195555.GC48734@deviant.kiev.zoral.com.ua> In-Reply-To: <20110502195555.GC48734@deviant.kiev.zoral.com.ua> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-15" Content-Transfer-Encoding: 7bit Message-Id: <201105021602.02668.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6 (bigwig.baldwin.cx); Mon, 02 May 2011 16:02:05 -0400 (EDT) Cc: arch@freebsd.org Subject: Re: [PATCH] Add ktrace records for user page faults X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 02 May 2011 20:02:06 -0000

On Monday, May 02, 2011 3:55:55 pm Kostik Belousov wrote:
> On Mon, May 02, 2011 at 03:37:19PM -0400, John Baldwin wrote:
> > One thing I have found useful is knowing when processes are in the kernel
> > instead of in userland. ktrace already provides records for syscall
> > entry/exit. The other major source of time spent in the kernel that I've seen
> > is page fault handling. To that end, I have a patch that adds ktrace records
> > to the beginning and end of VM faults. This gives a pair of records so a user
> > can see how long a fault took (similar to how one can see how long a syscall
> > takes now). Sample output from kdump is below:
> >
> >  47565 echo     CALL  mmap(0x800a87000,0x179000,PROT_READ|
> > PROT_WRITE,MAP_PRIVATE|MAP_ANON,0xffffffff,0)
> >  47565 echo     RET   mmap 34370777088/0x800a87000
> >  47565 echo     PFLT  0x800723000 VM_PROT_EXECUTE
> >  47565 echo     RET   KERN_SUCCESS
> >  47565 echo     CALL  munmap(0x800887000,0x179000)
> >  47565 echo     RET   munmap 0
> >  47565 echo     PFLT  0x800a00000 VM_PROT_WRITE
> >  47565 echo     RET   KERN_SUCCESS
> >
> > The patch is available at www.freebsd.org/~jhb/patches/ktrace_fault.patch and
> > included below.
>
> One immediate detail is that trap() truncates the fault address to the
> page address, which arguably loses useful information.

It is true that it would be nice to have the exact faulting address, though
having page granularity has been sufficient for the few times I've actually
used the address itself (e.g. I could figure out which page of libstdc++ a
fault occurred on and narrow down from there as to which of the routines most
likely was executed given what the app was doing at the time). In my case
knowing how much time was spent handling a page fault has been useful.

Would we have to push this logic out of vm_fault and into every trap() routine
to get the original address? That would make the patch quite a bit bigger
(touching N MD files vs 1 MI file).

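
To illustrate why page granularity can still be enough to narrow a fault down: if the base address of a library mapping is known (for example from procstat -v), a page-aligned fault address selects one page-sized range of offsets inside that mapping, and any routine whose code overlaps that range is a candidate. The following is only a sketch assuming 4 KB pages. The name symbol_on_fault_page() and the idea of feeding it symbol offsets and sizes (say, from a symbol table listing) are hypothetical.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4 KB pages */

/*
 * Hypothetical helper: returns 1 if a symbol occupying
 * [sym_off, sym_off + sym_size) within a mapping overlaps the page at
 * fault_page, where the mapping starts at map_base.  Returns 0 otherwise.
 */
int
symbol_on_fault_page(uint64_t map_base, uint64_t fault_page,
    uint64_t sym_off, uint64_t sym_size)
{
	uint64_t page_off = fault_page - map_base;

	return (sym_off < page_off + SKETCH_PAGE_SIZE &&
	    sym_off + sym_size > page_off);
}
```

With a made-up mapping base of 0x800700000, the PFLT address 0x800723000 from the sample trace corresponds to offsets 0x23000 through 0x23fff, so only symbols touching that range need to be considered.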
-- John Baldwin From owner-freebsd-arch@FreeBSD.ORG Mon May 2 20:16:06 2011 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 906AD106566B; Mon, 2 May 2011 20:16:06 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id 2EC9B8FC1F; Mon, 2 May 2011 20:16:05 +0000 (UTC) Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id p42KG2cg048512 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 2 May 2011 23:16:02 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4) with ESMTP id p42KG20p089135; Mon, 2 May 2011 23:16:02 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4/Submit) id p42KG2fn089134; Mon, 2 May 2011 23:16:02 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Mon, 2 May 2011 23:16:02 +0300 From: Kostik Belousov To: John Baldwin Message-ID: <20110502201602.GD48734@deviant.kiev.zoral.com.ua> References: <201105021537.19507.jhb@freebsd.org> <20110502195555.GC48734@deviant.kiev.zoral.com.ua> <201105021602.02668.jhb@freebsd.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="+NWYRNQpSl62EBce" Content-Disposition: inline In-Reply-To: <201105021602.02668.jhb@freebsd.org> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-2.6 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_05, DNS_FROM_OPENWHOIS autolearn=no 
version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: arch@freebsd.org Subject: Re: [PATCH] Add ktrace records for user page faults X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 02 May 2011 20:16:06 -0000

--+NWYRNQpSl62EBce Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable

On Mon, May 02, 2011 at 04:02:02PM -0400, John Baldwin wrote:
> On Monday, May 02, 2011 3:55:55 pm Kostik Belousov wrote:
> > On Mon, May 02, 2011 at 03:37:19PM -0400, John Baldwin wrote:
> > > One thing I have found useful is knowing when processes are in the kernel
> > > instead of in userland. ktrace already provides records for syscall
> > > entry/exit. The other major source of time spent in the kernel that I've seen
> > > is page fault handling. To that end, I have a patch that adds ktrace records
> > > to the beginning and end of VM faults. This gives a pair of records so a user
> > > can see how long a fault took (similar to how one can see how long a syscall
> > > takes now). Sample output from kdump is below:
> > >
> > >  47565 echo     CALL  mmap(0x800a87000,0x179000,PROT_READ|
> > > PROT_WRITE,MAP_PRIVATE|MAP_ANON,0xffffffff,0)
> > >  47565 echo     RET   mmap 34370777088/0x800a87000
> > >  47565 echo     PFLT  0x800723000 VM_PROT_EXECUTE
> > >  47565 echo     RET   KERN_SUCCESS
> > >  47565 echo     CALL  munmap(0x800887000,0x179000)
> > >  47565 echo     RET   munmap 0
> > >  47565 echo     PFLT  0x800a00000 VM_PROT_WRITE
> > >  47565 echo     RET   KERN_SUCCESS
> > >
> > > The patch is available at www.freebsd.org/~jhb/patches/ktrace_fault.patch and
> > > included below.
> >
> > One immediate detail is that trap() truncates the fault address to the
> > page address, which arguably loses useful information.
>
> It is true that it would be nice to have the exact faulting address, though
> having page granularity has been sufficient for the few times I've actually
> used the address itself (e.g. I could figure out which page of libstdc++ a
> fault occurred on and narrow down from there as to which of the routines most
> likely was executed given what the app was doing at the time). In my case
> knowing how much time was spent handling a page fault has been useful.
>
> Would we have to push this logic out of vm_fault and into every trap() routine
> to get the original address? That would make the patch quite a bit bigger
> (touching N MD files vs 1 MI file).

Or do the reverse, making vm_fault() do trunc_page() [if doing this
change at all].

Also, I want to note another small detail that is relevant if you plan
to MFC the change. In HEAD, vm_fault() is only called from trap()s, and
in-kernel page reads were substituted by vm_fault_hold() calls. This is
not true for stable.

Hm, was it intended to report faults from uiomove etc.?
I have had this change (which relates to OOM handling) for a long time, and
realized that it might be useful there.

diff --git a/sys/amd64/amd64/trap.c b/sys/amd64/amd64/trap.c
index 4e5f8b8..0d1e68f 100644
--- a/sys/amd64/amd64/trap.c
+++ b/sys/amd64/amd64/trap.c
@@ -697,7 +697,8 @@ trap_pfault(frame, usermode)
 		PROC_UNLOCK(p);
 
 		/* Fault in the user page: */
-		rv = vm_fault(map, va, ftype, VM_FAULT_NORMAL);
+		rv = vm_fault(map, va, ftype, VM_FAULT_NORMAL |
+		    (usermode ? VM_FAULT_USERMODE : 0));
 
 		PROC_LOCK(p);
 		--p->p_lock;
diff --git a/sys/i386/i386/trap.c b/sys/i386/i386/trap.c
index 5a8016c..236f295 100644
--- a/sys/i386/i386/trap.c
+++ b/sys/i386/i386/trap.c
@@ -856,7 +856,8 @@ trap_pfault(frame, usermode, eva)
 		PROC_UNLOCK(p);
 
 		/* Fault in the user page: */
-		rv = vm_fault(map, va, ftype, VM_FAULT_NORMAL);
+		rv = vm_fault(map, va, ftype, VM_FAULT_NORMAL |
+		    (usermode ? VM_FAULT_USERMODE : 0));
 
 		PROC_LOCK(p);
 		--p->p_lock;
diff --git a/sys/vm/vm_fault.c b/sys/vm/vm_fault.c
index d417a84..c1e87ae 100644
--- a/sys/vm/vm_fault.c
+++ b/sys/vm/vm_fault.c
@@ -408,6 +408,11 @@ RetryFault:;
 			 */
 			fs.m = NULL;
 			if (!vm_page_count_severe() || P_KILLED(curproc)) {
+				if (P_KILLED(curproc) && (fault_flags &
+				    VM_FAULT_USERMODE) != 0) {
+					unlock_and_deallocate(&fs);
+					return (KERN_RESOURCE_SHORTAGE);
+				}
 #if VM_NRESERVLEVEL > 0
 				if ((fs.object->flags & OBJ_COLORED) == 0) {
 					fs.object->flags |= OBJ_COLORED;
diff --git a/sys/vm/vm_map.h b/sys/vm/vm_map.h
index 5311e02..7444549 100644
--- a/sys/vm/vm_map.h
+++ b/sys/vm/vm_map.h
@@ -326,6 +326,7 @@ long vmspace_wired_count(struct vmspace *vmspace);
 #define VM_FAULT_NORMAL 0 /* Nothing special */
 #define VM_FAULT_CHANGE_WIRING 1 /* Change the wiring as appropriate */
 #define VM_FAULT_DIRTY 2 /* Dirty the page; use w/VM_PROT_COPY */
+#define VM_FAULT_USERMODE 4 /* Fault initiated by usermode */
 
 /*
  * The following "find_space" options are supported by vm_map_find()

--+NWYRNQpSl62EBce Content-Type: application/pgp-signature Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (FreeBSD)

iEYEARECAAYFAk2/EQIACgkQC3+MBN1Mb4hYAACg5gBeNduqrIahvcQ3Y5NxDMnj
wVwAoPXpUBY1Lw5AKPEzOD763BLLt80+
=+AU3
-----END PGP SIGNATURE-----

--+NWYRNQpSl62EBce--

From owner-freebsd-arch@FreeBSD.ORG Mon May 2 20:35:08 2011 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 
9013C106566C for ; Mon, 2 May 2011 20:35:08 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id 0CD2E8FC1B for ; Mon, 2 May 2011 20:35:07 +0000 (UTC) Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id p42JttK0047337 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 2 May 2011 22:55:55 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4) with ESMTP id p42Jttfl057238; Mon, 2 May 2011 22:55:55 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4/Submit) id p42Jttil057237; Mon, 2 May 2011 22:55:55 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Mon, 2 May 2011 22:55:55 +0300 From: Kostik Belousov To: John Baldwin Message-ID: <20110502195555.GC48734@deviant.kiev.zoral.com.ua> References: <201105021537.19507.jhb@freebsd.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="Qcz4xfO8nvR/PV7q" Content-Disposition: inline In-Reply-To: <201105021537.19507.jhb@freebsd.org> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-3.4 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00, DNS_FROM_OPENWHOIS autolearn=no version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: arch@freebsd.org Subject: Re: [PATCH] Add ktrace records for user page faults X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 02 May 2011 20:35:08 -0000

--Qcz4xfO8nvR/PV7q Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable

On Mon, May 02, 2011 at 03:37:19PM -0400, John Baldwin wrote:
> One thing I have found useful is knowing when processes are in the kernel
> instead of in userland. ktrace already provides records for syscall
> entry/exit. The other major source of time spent in the kernel that I've seen
> is page fault handling. To that end, I have a patch that adds ktrace records
> to the beginning and end of VM faults. This gives a pair of records so a user
> can see how long a fault took (similar to how one can see how long a syscall
> takes now). Sample output from kdump is below:
>
>  47565 echo     CALL  mmap(0x800a87000,0x179000,PROT_READ|
> PROT_WRITE,MAP_PRIVATE|MAP_ANON,0xffffffff,0)
>  47565 echo     RET   mmap 34370777088/0x800a87000
>  47565 echo     PFLT  0x800723000 VM_PROT_EXECUTE
>  47565 echo     RET   KERN_SUCCESS
>  47565 echo     CALL  munmap(0x800887000,0x179000)
>  47565 echo     RET   munmap 0
>  47565 echo     PFLT  0x800a00000 VM_PROT_WRITE
>  47565 echo     RET   KERN_SUCCESS
>
> The patch is available at www.freebsd.org/~jhb/patches/ktrace_fault.patch and
> included below.

One immediate detail is that trap() truncates the fault address to the
page address, which arguably loses useful information.

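
To make the point concrete: truncating to a page boundary clears the low bits of the address, so a record that stores only the truncated value cannot tell two faults within the same page apart. The following is a minimal userland sketch, assuming 4 KB pages. The sketch_* names are hypothetical stand-ins for the kernel's trunc_page() and PAGE_MASK, and the exact address in the usage note is made up so that it lands in a page shown in the sample trace.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4 KB pages */
#define SKETCH_PAGE_MASK ((uint64_t)SKETCH_PAGE_SIZE - 1)

/* Userland stand-in for trunc_page(): clear the in-page bits. */
uint64_t
sketch_trunc_page(uint64_t addr)
{
	return (addr & ~SKETCH_PAGE_MASK);
}

/* The part of the address that the truncation throws away. */
uint64_t
sketch_page_offset(uint64_t addr)
{
	return (addr & SKETCH_PAGE_MASK);
}
```

For example, a fault at 0x800723abc would be recorded as 0x800723000, and the offset 0xabc within the page would not appear in the trace.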
--Qcz4xfO8nvR/PV7q Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (FreeBSD) iEYEARECAAYFAk2/DEsACgkQC3+MBN1Mb4jLCQCeOFcMzhtRL8/aHutAqSaKl2wz 7XUAn3O73HqERdXiKwFLqqC+yHJLl2// =Dcqf -----END PGP SIGNATURE----- --Qcz4xfO8nvR/PV7q-- From owner-freebsd-arch@FreeBSD.ORG Mon May 2 21:09:29 2011 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8FE411065674 for ; Mon, 2 May 2011 21:09:29 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 666568FC12 for ; Mon, 2 May 2011 21:09:29 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id 0307446B0D; Mon, 2 May 2011 17:09:29 -0400 (EDT) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 8D0F88A01B; Mon, 2 May 2011 17:09:28 -0400 (EDT) From: John Baldwin To: Kostik Belousov Date: Mon, 2 May 2011 16:25:25 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110325; KDE/4.5.5; amd64; ; ) References: <201105021537.19507.jhb@freebsd.org> <201105021602.02668.jhb@freebsd.org> <20110502201602.GD48734@deviant.kiev.zoral.com.ua> In-Reply-To: <20110502201602.GD48734@deviant.kiev.zoral.com.ua> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-15" Content-Transfer-Encoding: 7bit Message-Id: <201105021625.25473.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6 (bigwig.baldwin.cx); Mon, 02 May 2011 17:09:28 -0400 (EDT) Cc: arch@freebsd.org Subject: Re: [PATCH] Add ktrace records for user page faults X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: 
List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 02 May 2011 21:09:29 -0000 On Monday, May 02, 2011 4:16:02 pm Kostik Belousov wrote: > On Mon, May 02, 2011 at 04:02:02PM -0400, John Baldwin wrote: > > On Monday, May 02, 2011 3:55:55 pm Kostik Belousov wrote: > > > On Mon, May 02, 2011 at 03:37:19PM -0400, John Baldwin wrote: > > > > One thing I have found useful is knowing when processes are in the kernel > > > > instead of in userland. ktrace already provides records for syscall > > > > entry/exit. The other major source of time spent in the kernel that I've seen > > > > is page fault handling. To that end, I have a patch that adds ktrace records > > > > to the beginning and end of VM faults. This gives a pair of records so a user > > > > can see how long a fault took (similar to how one can see how long a syscall > > > > takes now). Sample output from kdump is below: > > > > > > > > 47565 echo CALL mmap(0x800a87000,0x179000,PROT_READ| > > > > PROT_WRITE,MAP_PRIVATE|MAP_ANON,0xffffffff,0) > > > > 47565 echo RET mmap 34370777088/0x800a87000 > > > > 47565 echo PFLT 0x800723000 VM_PROT_EXECUTE > > > > 47565 echo RET KERN_SUCCESS > > > > 47565 echo CALL munmap(0x800887000,0x179000) > > > > 47565 echo RET munmap 0 > > > > 47565 echo PFLT 0x800a00000 VM_PROT_WRITE > > > > 47565 echo RET KERN_SUCCESS > > > > > > > > The patch is available at www.freebsd.org/~jhb/patches/ktrace_fault.patch and > > > > included below. > > > > > > One immediate detail is that trap() truncates the fault address to the > > > page address, that arguably looses useful information. > > > > It is true that it would be nice to have the exact faulting address, though > > having page granularity has been sufficient for the few times I've actually > > used the address itself (e.g. I could figure out which page of libstdc++ a > > fault occurred on and narrow down from there as to which of the routines most > > likely was executed given what the app was doing at the time). 
In my case > > knowing how much time was spent handling a page fault has been useful. > > > > Would we have to push this logic out of vm_fault and into every trap() routine > > to get the original address? That would make the patch quite a bit bigger > > (touching N MD files vs 1 MI file). > > Or do the reverse, making vm_fault() do trunc_page() [if doing this > change at all]. Ok. That sounds sensible. > Also, I want to note another small detail that is relevant if you plan > to MFC the change. In HEAD, vm_fault() is only called from trap()s, and > in-kernel page reads were substituted by vm_fault_hold() calls. This is > not true for stable. > > Hm, was it intended to report faults from uiomove etc.? > I had this change (that relates to OOM handling) for a long time, and > realized that it might be useful there. I don't mind if it reports those faults as they will be bracketed by a system call entry/exit. However, I primarily care about user-initiated faults. > diff --git a/sys/amd64/amd64/trap.c b/sys/amd64/amd64/trap.c > index 4e5f8b8..0d1e68f 100644 > --- a/sys/amd64/amd64/trap.c > +++ b/sys/amd64/amd64/trap.c > @@ -697,7 +697,8 @@ trap_pfault(frame, usermode) > PROC_UNLOCK(p); > > /* Fault in the user page: */ > - rv = vm_fault(map, va, ftype, VM_FAULT_NORMAL); > + rv = vm_fault(map, va, ftype, VM_FAULT_NORMAL | > + (usermode ? VM_FAULT_USERMODE : 0)); > > PROC_LOCK(p); > --p->p_lock; > diff --git a/sys/i386/i386/trap.c b/sys/i386/i386/trap.c > index 5a8016c..236f295 100644 > --- a/sys/i386/i386/trap.c > +++ b/sys/i386/i386/trap.c > @@ -856,7 +856,8 @@ trap_pfault(frame, usermode, eva) > PROC_UNLOCK(p); > > /* Fault in the user page: */ > - rv = vm_fault(map, va, ftype, VM_FAULT_NORMAL); > + rv = vm_fault(map, va, ftype, VM_FAULT_NORMAL | > + (usermode ?
VM_FAULT_USERMODE : 0)); > > PROC_LOCK(p); > --p->p_lock; > diff --git a/sys/vm/vm_fault.c b/sys/vm/vm_fault.c > index d417a84..c1e87ae 100644 > --- a/sys/vm/vm_fault.c > +++ b/sys/vm/vm_fault.c > @@ -408,6 +408,11 @@ RetryFault:; > */ > fs.m = NULL; > if (!vm_page_count_severe() || P_KILLED(curproc)) { > + if (P_KILLED(curproc) && (fault_flags & > + VM_FAULT_USERMODE) != 0) { > + unlock_and_deallocate(&fs); > + return (KERN_RESOURCE_SHORTAGE); > + } > #if VM_NRESERVLEVEL > 0 > if ((fs.object->flags & OBJ_COLORED) == 0) { > fs.object->flags |= OBJ_COLORED; > diff --git a/sys/vm/vm_map.h b/sys/vm/vm_map.h > index 5311e02..7444549 100644 > --- a/sys/vm/vm_map.h > +++ b/sys/vm/vm_map.h > @@ -326,6 +326,7 @@ long vmspace_wired_count(struct vmspace *vmspace); > #define VM_FAULT_NORMAL 0 /* Nothing special */ > #define VM_FAULT_CHANGE_WIRING 1 /* Change the wiring as appropriate */ > #define VM_FAULT_DIRTY 2 /* Dirty the page; use w/VM_PROT_COPY */ > +#define VM_FAULT_USERMODE 4 /* Fault initiated by usermode */ > > /* > * The following "find_space" options are supported by vm_map_find() > -- John Baldwin From owner-freebsd-arch@FreeBSD.ORG Tue May 3 11:39:01 2011 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id F1DEF106566C; Tue, 3 May 2011 11:39:01 +0000 (UTC) (envelope-from alexander@leidinger.net) Received: from mail.ebusiness-leidinger.de (mail.ebusiness-leidinger.de [217.11.53.44]) by mx1.freebsd.org (Postfix) with ESMTP id A336B8FC17; Tue, 3 May 2011 11:39:01 +0000 (UTC) Received: from outgoing.leidinger.net (p5B155A42.dip.t-dialin.net [91.21.90.66]) by mail.ebusiness-leidinger.de (Postfix) with ESMTPSA id E4525844017; Tue, 3 May 2011 13:38:47 +0200 (CEST) Received: from webmail.leidinger.net (webmail.Leidinger.net [IPv6:fd73:10c7:2053:1::2:102]) by outgoing.leidinger.net (Postfix) with ESMTP id 192DC11C4; Tue, 3 May 2011 13:38:45 +0200 (CEST) Received: 
(from www@localhost) by webmail.leidinger.net (8.14.4/8.13.8/Submit) id p43BciNr003835; Tue, 3 May 2011 13:38:44 +0200 (CEST) (envelope-from Alexander@Leidinger.net) Received: from pslux.ec.europa.eu (pslux.ec.europa.eu [158.169.9.14]) by webmail.leidinger.net (Horde Framework) with HTTP; Tue, 03 May 2011 13:38:44 +0200 Message-ID: <20110503133844.184523llr0156o9w@webmail.leidinger.net> Date: Tue, 03 May 2011 13:38:44 +0200 From: Alexander Leidinger To: John Baldwin References: <201105021537.19507.jhb@freebsd.org> <20110502195555.GC48734@deviant.kiev.zoral.com.ua> <201105021602.02668.jhb@freebsd.org> In-Reply-To: <201105021602.02668.jhb@freebsd.org> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; DelSp="Yes"; format="flowed" Content-Disposition: inline Content-Transfer-Encoding: 7bit User-Agent: Dynamic Internet Messaging Program (DIMP) H3 (1.1.6) X-EBL-MailScanner-Information: Please contact the ISP for more information X-EBL-MailScanner-ID: E4525844017.AECFB X-EBL-MailScanner: Found to be clean X-EBL-MailScanner-SpamCheck: not spam, spamhaus-ZEN, SpamAssassin (not cached, score=0, required 6, autolearn=disabled) X-EBL-MailScanner-From: alexander@leidinger.net X-EBL-MailScanner-Watermark: 1305027528.36203@JBEvHQeH/McqmNqpahOeYg X-EBL-Spam-Status: No Cc: Kostik Belousov , arch@freebsd.org Subject: Re: [PATCH] Add ktrace records for user page faults X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 03 May 2011 11:39:02 -0000 Quoting John Baldwin (from Mon, 2 May 2011 16:02:02 -0400): > It is true that it would be nice to have the exact faulting address, though > having page granularity has been sufficient for the few times I've actually > used the address itself (e.g. 
I could figure out which page of libstdc++ a > fault occurred on and narrow down from there as to which of the routines most > likely was executed given what the app was doing at the time). In my case > knowing how much time was spent handling a page fault has been useful. > > Would we have to push this logic out of vm_fault and into every > trap() routine > to get the original address? That would make the patch quite a bit bigger > (touching N MD files vs 1 MI file). dtrace is not a solution here (in general, not to the exact-address problem)? Bye, Alexander. -- I am looking for a honest man. -- Diogenes the Cynic http://www.Leidinger.net Alexander @ Leidinger.net: PGP ID = B0063FE7 http://www.FreeBSD.org netchild @ FreeBSD.org : PGP ID = 72077137 From owner-freebsd-arch@FreeBSD.ORG Tue May 3 12:21:12 2011 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 135F710657A4 for ; Tue, 3 May 2011 12:21:12 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id D98378FC13 for ; Tue, 3 May 2011 12:21:11 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id 92DF246B55; Tue, 3 May 2011 08:21:11 -0400 (EDT) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 318548A027; Tue, 3 May 2011 08:21:11 -0400 (EDT) From: John Baldwin To: Alexander Leidinger Date: Tue, 3 May 2011 08:04:52 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110325; KDE/4.5.5; amd64; ; ) References: <201105021537.19507.jhb@freebsd.org> <201105021602.02668.jhb@freebsd.org> <20110503133844.184523llr0156o9w@webmail.leidinger.net> In-Reply-To: <20110503133844.184523llr0156o9w@webmail.leidinger.net> MIME-Version: 1.0 Content-Type: Text/Plain; charset="utf-8" 
Content-Transfer-Encoding: 7bit Message-Id: <201105030804.52840.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6 (bigwig.baldwin.cx); Tue, 03 May 2011 08:21:11 -0400 (EDT) Cc: Kostik Belousov , arch@freebsd.org Subject: Re: [PATCH] Add ktrace records for user page faults X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 03 May 2011 12:21:12 -0000 On Tuesday, May 03, 2011 7:38:44 am Alexander Leidinger wrote: > Quoting John Baldwin (from Mon, 2 May 2011 16:02:02 -0400): > > > It is true that it would be nice to have the exact faulting address, though > > having page granularity has been sufficient for the few times I've actually > > used the address itself (e.g. I could figure out which page of libstdc++ a > > fault occurred on and narrow down from there as to which of the routines most > > likely was executed given what the app was doing at the time). In my case > > knowing how much time was spent handling a page fault has been useful. > > > > Would we have to push this logic out of vm_fault and into every > > trap() routine > > to get the original address? That would make the patch quite a bit bigger > > (touching N MD files vs 1 MI file). > > dtrace is not a solution here (in general, not to the exact-address problem)? It probably is, but many folks are still quite used to ktrace. At some point I may sit down and spend more time with DTrace, but for now I have other fish to fry. I can just keep the page fault tracing patch private if that is what folks prefer. 
-- John Baldwin From owner-freebsd-arch@FreeBSD.ORG Tue May 3 12:36:10 2011 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DCCCC106566B; Tue, 3 May 2011 12:36:10 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id 4BDC08FC15; Tue, 3 May 2011 12:36:09 +0000 (UTC) Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id p43Ca22I057201 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 3 May 2011 15:36:02 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4) with ESMTP id p43Ca17J080997; Tue, 3 May 2011 15:36:01 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4/Submit) id p43Ca1Vd080996; Tue, 3 May 2011 15:36:01 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Tue, 3 May 2011 15:36:01 +0300 From: Kostik Belousov To: John Baldwin Message-ID: <20110503123601.GG48734@deviant.kiev.zoral.com.ua> References: <201105021537.19507.jhb@freebsd.org> <201105021602.02668.jhb@freebsd.org> <20110503133844.184523llr0156o9w@webmail.leidinger.net> <201105030804.52840.jhb@freebsd.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="/oilIk/RGk+ZuCFG" Content-Disposition: inline In-Reply-To: <201105030804.52840.jhb@freebsd.org> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-3.4 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00, 
DNS_FROM_OPENWHOIS autolearn=no version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: Alexander Leidinger , arch@freebsd.org Subject: Re: [PATCH] Add ktrace records for user page faults X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 03 May 2011 12:36:10 -0000 --/oilIk/RGk+ZuCFG Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, May 03, 2011 at 08:04:52AM -0400, John Baldwin wrote: > It probably is, but many folks are still quite used to ktrace. At some point > I may sit down and spend more time with DTrace, but for now I have other fish > to fry. I can just keep the page fault tracing patch private if that is what > folks prefer. > My responses definitely did not contain any objections to committing the patch; I only nit-picked.
--/oilIk/RGk+ZuCFG Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (FreeBSD) iEYEARECAAYFAk2/9rEACgkQC3+MBN1Mb4iLRwCglvG3Xmc7QztY8EyyHDIKqe8o Op4AnRTiTdnYZHjFDUm4OMwwPS7LR4Pu =CpLi -----END PGP SIGNATURE----- --/oilIk/RGk+ZuCFG-- From owner-freebsd-arch@FreeBSD.ORG Tue May 3 13:59:45 2011 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1BDE11065672 for ; Tue, 3 May 2011 13:59:45 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id D02BF8FC1F for ; Tue, 3 May 2011 13:59:44 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id 76A6B46B4C; Tue, 3 May 2011 09:59:44 -0400 (EDT) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 140498A027; Tue, 3 May 2011 09:59:44 -0400 (EDT) From: John Baldwin To: Kostik Belousov Date: Tue, 3 May 2011 09:59:42 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110325; KDE/4.5.5; amd64; ; ) References: <201105021537.19507.jhb@freebsd.org> <201105021602.02668.jhb@freebsd.org> <20110502201602.GD48734@deviant.kiev.zoral.com.ua> In-Reply-To: <20110502201602.GD48734@deviant.kiev.zoral.com.ua> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-15" Content-Transfer-Encoding: 7bit Message-Id: <201105030959.42948.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6 (bigwig.baldwin.cx); Tue, 03 May 2011 09:59:44 -0400 (EDT) Cc: arch@freebsd.org Subject: Re: [PATCH] Add ktrace records for user page faults X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: 
List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 03 May 2011 13:59:45 -0000 On Monday, May 02, 2011 4:16:02 pm Kostik Belousov wrote: > On Mon, May 02, 2011 at 04:02:02PM -0400, John Baldwin wrote: > > On Monday, May 02, 2011 3:55:55 pm Kostik Belousov wrote: > > > On Mon, May 02, 2011 at 03:37:19PM -0400, John Baldwin wrote: > > > > One thing I have found useful is knowing when processes are in the kernel > > > > instead of in userland. ktrace already provides records for syscall > > > > entry/exit. The other major source of time spent in the kernel that I've seen > > > > is page fault handling. To that end, I have a patch that adds ktrace records > > > > to the beginning and end of VM faults. This gives a pair of records so a user > > > > can see how long a fault took (similar to how one can see how long a syscall > > > > takes now). Sample output from kdump is below: > > > > > > > > 47565 echo CALL mmap(0x800a87000,0x179000,PROT_READ| > > > > PROT_WRITE,MAP_PRIVATE|MAP_ANON,0xffffffff,0) > > > > 47565 echo RET mmap 34370777088/0x800a87000 > > > > 47565 echo PFLT 0x800723000 VM_PROT_EXECUTE > > > > 47565 echo RET KERN_SUCCESS > > > > 47565 echo CALL munmap(0x800887000,0x179000) > > > > 47565 echo RET munmap 0 > > > > 47565 echo PFLT 0x800a00000 VM_PROT_WRITE > > > > 47565 echo RET KERN_SUCCESS > > > > > > > > The patch is available at www.freebsd.org/~jhb/patches/ktrace_fault.patch and > > > > included below. > > > > > > One immediate detail is that trap() truncates the fault address to the > > > page address, which arguably loses useful information. > > > > It is true that it would be nice to have the exact faulting address, though > > having page granularity has been sufficient for the few times I've actually > > used the address itself (e.g. I could figure out which page of libstdc++ a > > fault occurred on and narrow down from there as to which of the routines most > > likely was executed given what the app was doing at the time).
In my case > > knowing how much time was spent handling a page fault has been useful. > > > > Would we have to push this logic out of vm_fault and into every trap() routine > > to get the original address? That would make the patch quite a bit bigger > > (touching N MD files vs 1 MI file). > > Or do the reverse, making vm_fault() do trunc_page() [if doing this > change at all]. Hmm, so I have a new version of the patch that is 1) against 9 rather than 8, and 2) pushes the trunc_page() down into vm_fault(). One caveat here is that faults on sparc64 and sun4v only have the page address, never a sub-page address. I haven't tested this, but it gives you an idea of what such a change would look like if we want to do it: --- //depot/user/jhb/ktrace/amd64/amd64/trap.c 2011-03-07 20:37:47.000000000 0000 +++ /home/jhb/work/p4/ktrace/amd64/amd64/trap.c 2011-03-07 20:37:47.000000000 0000 @@ -641,16 +641,14 @@ struct trapframe *frame; int usermode; { - vm_offset_t va; struct vmspace *vm = NULL; vm_map_t map; int rv = 0; vm_prot_t ftype; struct thread *td = curthread; struct proc *p = td->td_proc; - vm_offset_t eva = frame->tf_addr; + vm_offset_t va = frame->tf_addr; - va = trunc_page(eva); if (va >= VM_MIN_KERNEL_ADDRESS) { /* * Don't allow user-mode faults in kernel address space. 
@@ -716,7 +714,7 @@ frame->tf_rip = (long)PCPU_GET(curpcb)->pcb_onfault; return (0); } - trap_fatal(frame, eva); + trap_fatal(frame, va); return (-1); } --- //depot/user/jhb/ktrace/arm/arm/trap.c 2010-06-01 21:27:15.000000000 0000 +++ /home/jhb/work/p4/ktrace/arm/arm/trap.c 2010-06-01 21:27:15.000000000 0000 @@ -343,7 +343,7 @@ break; } - va = trunc_page((vm_offset_t)far); + va = (vm_offset_t)far; /* * It is only a kernel address space fault iff: @@ -412,8 +412,8 @@ #ifdef DEBUG last_fault_code = fsr; #endif - if (pmap_fault_fixup(vmspace_pmap(td->td_proc->p_vmspace), va, ftype, - user)) { + if (pmap_fault_fixup(vmspace_pmap(td->td_proc->p_vmspace), + trunc_page(va), ftype, user)) { goto out; } @@ -704,7 +704,7 @@ struct thread *td; struct proc * p; struct vm_map *map; - vm_offset_t fault_pc, va; + vm_offset_t fault_pc; int error = 0; struct ksig ksig; @@ -766,7 +766,6 @@ } map = &td->td_proc->p_vmspace->vm_map; - va = trunc_page(fault_pc); /* * See if the pmap can handle this fault on its own... 
@@ -774,7 +773,7 @@ #ifdef DEBUG last_fault_code = -1; #endif - if (pmap_fault_fixup(map->pmap, va, VM_PROT_READ, 1)) + if (pmap_fault_fixup(map->pmap, trunc_page(fault_pc), VM_PROT_READ, 1)) goto out; if (map != kernel_map) { @@ -783,7 +782,7 @@ PROC_UNLOCK(p); } - error = vm_fault(map, va, VM_PROT_READ | VM_PROT_EXECUTE, + error = vm_fault(map, fault_pc, VM_PROT_READ | VM_PROT_EXECUTE, VM_FAULT_NORMAL); if (map != kernel_map) { PROC_LOCK(p); --- //depot/user/jhb/ktrace/i386/i386/trap.c 2011-03-07 20:37:47.000000000 0000 +++ /home/jhb/work/p4/ktrace/i386/i386/trap.c 2011-03-07 20:37:47.000000000 0000 @@ -788,9 +788,8 @@ trap_pfault(frame, usermode, eva) struct trapframe *frame; int usermode; - vm_offset_t eva; + vm_offset_t va; { - vm_offset_t va; struct vmspace *vm = NULL; vm_map_t map; int rv = 0; @@ -798,7 +797,6 @@ struct thread *td = curthread; struct proc *p = td->td_proc; - va = trunc_page(eva); if (va >= KERNBASE) { /* * Don't allow user-mode faults in kernel address space. @@ -809,7 +807,7 @@ * fault. */ #if defined(I586_CPU) && !defined(NO_F00F_HACK) - if ((eva == (unsigned int)&idt[6]) && has_f00f_bug) + if ((va == (unsigned int)&idt[6]) && has_f00f_bug) return -2; #endif if (usermode) @@ -875,7 +873,7 @@ frame->tf_eip = (int)PCPU_GET(curpcb)->pcb_onfault; return (0); } - trap_fatal(frame, eva); + trap_fatal(frame, va); return (-1); } --- //depot/user/jhb/ktrace/ia64/ia64/trap.c 2010-09-22 15:07:20.000000000 0000 +++ /home/jhb/work/p4/ktrace/ia64/ia64/trap.c 2010-09-22 15:07:20.000000000 0000 @@ -530,7 +530,7 @@ int rv; rv = 0; - va = trunc_page(tf->tf_special.ifa); + va = tf->tf_special.ifa; if (va >= VM_MAX_ADDRESS) { /* @@ -592,6 +592,7 @@ } trap_panic(vector, tf); } + /* XXX: ksi_addr should be 'va', 'ucode' should be fault type. */ ucode = va; sig = (rv == KERN_PROTECTION_FAILURE) ? 
SIGBUS : SIGSEGV; break; --- //depot/user/jhb/ktrace/kern/kern_ktrace.c 2011-03-07 20:37:47.000000000 0000 +++ /home/jhb/work/p4/ktrace/kern/kern_ktrace.c 2011-03-07 20:37:47.000000000 0000 @@ -100,6 +100,8 @@ struct ktr_genio ktr_genio; struct ktr_psig ktr_psig; struct ktr_csw ktr_csw; + struct ktr_fault ktr_fault; + struct ktr_faultend ktr_faultend; } ktr_data; STAILQ_ENTRY(ktr_request) ktr_list; }; @@ -117,6 +119,8 @@ 0, /* KTR_SYSCTL */ sizeof(struct ktr_proc_ctor), /* KTR_PROCCTOR */ 0, /* KTR_PROCDTOR */ + sizeof(struct ktr_fault), /* KTR_FAULT */ + sizeof(struct ktr_faultend), /* KTR_FAULTEND */ }; static STAILQ_HEAD(, ktr_request) ktr_free; @@ -768,6 +772,38 @@ req->ktr_header.ktr_len = buflen; ktr_submitrequest(curthread, req); } + +void +ktrfault(vaddr, type) + vm_offset_t vaddr; + int type; +{ + struct ktr_request *req; + struct ktr_fault *kf; + + req = ktr_getrequest(KTR_FAULT); + if (req == NULL) + return; + kf = &req->ktr_data.ktr_fault; + kf->vaddr = vaddr; + kf->type = type; + ktr_enqueuerequest(curthread, req); +} + +void +ktrfaultend(result) + int result; +{ + struct ktr_request *req; + struct ktr_faultend *kf; + + req = ktr_getrequest(KTR_FAULTEND); + if (req == NULL) + return; + kf = &req->ktr_data.ktr_faultend; + kf->result = result; + ktr_enqueuerequest(curthread, req); +} #endif /* KTRACE */ /* Interface and common routines */ --- //depot/user/jhb/ktrace/mips/mips/trap.c 2011-01-13 18:03:58.000000000 0000 +++ /home/jhb/work/p4/ktrace/mips/mips/trap.c 2011-01-13 18:03:58.000000000 0000 @@ -395,12 +395,11 @@ ftype = (type == T_TLB_ST_MISS) ? 
VM_PROT_WRITE : VM_PROT_READ; /* check for kernel address */ if (KERNLAND(trapframe->badvaddr)) { - vm_offset_t va; int rv; kernel_fault: - va = trunc_page((vm_offset_t)trapframe->badvaddr); - rv = vm_fault(kernel_map, va, ftype, VM_FAULT_NORMAL); + rv = vm_fault(kernel_map, trapframe->badvaddr, ftype, + VM_FAULT_NORMAL); if (rv == KERN_SUCCESS) return (trapframe->pc); if (td->td_pcb->pcb_onfault != NULL) { @@ -436,14 +435,12 @@ ftype = VM_PROT_WRITE; dofault: { - vm_offset_t va; struct vmspace *vm; vm_map_t map; int rv = 0; vm = p->p_vmspace; map = &vm->vm_map; - va = trunc_page((vm_offset_t)trapframe->badvaddr); if (KERNLAND(trapframe->badvaddr)) { /* * Don't allow user-mode faults in kernel @@ -460,14 +457,15 @@ ++p->p_lock; PROC_UNLOCK(p); - rv = vm_fault(map, va, ftype, VM_FAULT_NORMAL); + rv = vm_fault(map, trapframe->badvaddr, ftype, + VM_FAULT_NORMAL); PROC_LOCK(p); --p->p_lock; PROC_UNLOCK(p); #ifdef VMFAULT_TRACE - printf("vm_fault(%p (pmap %p), %p (%p), %x, %d) -> %x at pc %p\n", - map, &vm->vm_pmap, (void *)va, (void *)(intptr_t)trapframe->badvaddr, + printf("vm_fault(%p (pmap %p), %p, %x, %d) -> %x at pc %p\n", + map, &vm->vm_pmap, (void *)(intptr_t)trapframe->badvaddr, ftype, VM_FAULT_NORMAL, rv, (void *)(intptr_t)trapframe->pc); #endif @@ -488,6 +486,7 @@ } ucode = ftype; i = ((rv == KERN_PROTECTION_FAILURE) ? 
SIGBUS : SIGSEGV); + /* XXX: Should be badvaddr */ addr = trapframe->pc; msg = "BAD_PAGE_FAULT"; --- //depot/user/jhb/ktrace/powerpc/aim/trap.c 2011-03-07 20:37:47.000000000 0000 +++ /home/jhb/work/p4/ktrace/powerpc/aim/trap.c 2011-03-07 20:37:47.000000000 0000 @@ -511,7 +511,7 @@ static int trap_pfault(struct trapframe *frame, int user) { - vm_offset_t eva, va; + vm_offset_t eva; struct thread *td; struct proc *p; vm_map_t map; @@ -550,7 +550,6 @@ map = kernel_map; } } - va = trunc_page(eva); if (map != kernel_map) { /* @@ -562,7 +561,7 @@ PROC_UNLOCK(p); /* Fault in the user page: */ - rv = vm_fault(map, va, ftype, VM_FAULT_NORMAL); + rv = vm_fault(map, eva, ftype, VM_FAULT_NORMAL); PROC_LOCK(p); --p->p_lock; @@ -572,7 +571,7 @@ * Don't have to worry about process locking or stacks in the * kernel. */ - rv = vm_fault(map, va, ftype, VM_FAULT_NORMAL); + rv = vm_fault(map, eva, ftype, VM_FAULT_NORMAL); } if (rv == KERN_SUCCESS) --- //depot/user/jhb/ktrace/powerpc/booke/trap.c 2010-09-22 15:07:20.000000000 0000 +++ /home/jhb/work/p4/ktrace/powerpc/booke/trap.c 2010-09-22 15:07:20.000000000 0000 @@ -392,7 +392,7 @@ static int trap_pfault(struct trapframe *frame, int user) { - vm_offset_t eva, va; + vm_offset_t eva; struct thread *td; struct proc *p; vm_map_t map; @@ -429,7 +429,6 @@ map = kernel_map; } } - va = trunc_page(eva); if (map != kernel_map) { /* @@ -441,7 +440,7 @@ PROC_UNLOCK(p); /* Fault in the user page: */ - rv = vm_fault(map, va, ftype, VM_FAULT_NORMAL); + rv = vm_fault(map, eva, ftype, VM_FAULT_NORMAL); PROC_LOCK(p); --p->p_lock; @@ -451,7 +450,7 @@ * Don't have to worry about process locking or stacks in the * kernel. 
*/ - rv = vm_fault(map, va, ftype, VM_FAULT_NORMAL); + rv = vm_fault(map, eva, ftype, VM_FAULT_NORMAL); } if (rv == KERN_SUCCESS) --- //depot/user/jhb/ktrace/sys/ktrace.h 2011-03-07 20:37:47.000000000 0000 +++ /home/jhb/work/p4/ktrace/sys/ktrace.h 2011-03-07 20:37:47.000000000 0000 @@ -178,6 +178,23 @@ #define KTR_PROCDTOR 11 /* + * KTR_FAULT - page fault record + */ +#define KTR_FAULT 12 +struct ktr_fault { + vm_offset_t vaddr; + int type; +}; + +/* + * KTR_FAULTEND - end of page fault record + */ +#define KTR_FAULTEND 13 +struct ktr_faultend { + int result; +}; + +/* * KTR_DROP - If this bit is set in ktr_type, then at least one event * between the previous record and this record was dropped. */ @@ -198,6 +215,8 @@ #define KTRFAC_SYSCTL (1< __FBSDID("$FreeBSD: src/sys/vm/vm_fault.c,v 1.288 2011/01/15 19:21:28 alc Exp $"); +#include "opt_ktrace.h" #include "opt_vm.h" #include @@ -86,6 +87,9 @@ #include #include #include +#ifdef KTRACE +#include +#endif #include #include @@ -208,8 +212,23 @@ vm_fault(vm_map_t map, vm_offset_t vaddr, vm_prot_t fault_type, int fault_flags) { +#ifdef KTRACE + struct thread *td; +#endif + int result; - return (vm_fault_hold(map, vaddr, fault_type, fault_flags, NULL)); +#ifdef KTRACE + td = curthread; + if (map != kernel_map && KTRPOINT(td, KTR_FAULT)) + ktrfault(vaddr, fault_type); +#endif + result = vm_fault_hold(map, trunc_page(vaddr), fault_type, fault_flags, + NULL); +#ifdef KTRACE + if (map != kernel_map && KTRPOINT(td, KTR_FAULTEND)) + ktrfaultend(result); +#endif + return (result); } int -- John Baldwin From owner-freebsd-arch@FreeBSD.ORG Tue May 3 14:42:04 2011 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BA9E7106566B; Tue, 3 May 2011 14:42:04 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id 2C8938FC16; 
Tue, 3 May 2011 14:42:03 +0000 (UTC) Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id p43Eg063070232 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 3 May 2011 17:42:00 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4) with ESMTP id p43Eg0lr081639; Tue, 3 May 2011 17:42:00 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4/Submit) id p43Eg0mb081638; Tue, 3 May 2011 17:42:00 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Tue, 3 May 2011 17:42:00 +0300 From: Kostik Belousov To: John Baldwin Message-ID: <20110503144200.GH48734@deviant.kiev.zoral.com.ua> References: <201105021537.19507.jhb@freebsd.org> <201105021602.02668.jhb@freebsd.org> <20110502201602.GD48734@deviant.kiev.zoral.com.ua> <201105030959.42948.jhb@freebsd.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="Sup2ovFryIwBP1d4" Content-Disposition: inline In-Reply-To: <201105030959.42948.jhb@freebsd.org> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-3.4 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00, DNS_FROM_OPENWHOIS autolearn=no version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: arch@freebsd.org Subject: Re: [PATCH] Add ktrace records for user page faults X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 03 May 
2011 14:42:04 -0000 --Sup2ovFryIwBP1d4 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, May 03, 2011 at 09:59:42AM -0400, John Baldwin wrote: > On Monday, May 02, 2011 4:16:02 pm Kostik Belousov wrote: > > Or do the reverse, making vm_fault() do trunc_page() [if doing this > > change at all]. > > Hmm, so I have a new version of the patch that is 1) against 9 rather > than 8, and 2) pushes the trunc_page() down into vm_fault(). One > caveat here is that faults on sparc64 and sun4v only have the page > address, never a sub-page address. I haven't tested this, but it gives > you an idea of what such a change would look like if we want to do it: It looks fine to me. I did a similar change once for x86 only. --Sup2ovFryIwBP1d4 Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (FreeBSD) iEYEARECAAYFAk3AFDcACgkQC3+MBN1Mb4iZ0QCgi82d/CxoXvo3ntizf27c9PDj PQwAoM4+lMxHjv8SGDXTN4nlO5K92Ald =rkwn -----END PGP SIGNATURE----- --Sup2ovFryIwBP1d4-- From owner-freebsd-arch@FreeBSD.ORG Wed May 4 20:39:23 2011 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1C9B31065737 for ; Wed, 4 May 2011 20:39:22 +0000 (UTC) (envelope-from bzeeb-lists@lists.zabbadoz.net) Received: from mx1.sbone.de (mx1.sbone.de [IPv6:2a01:4f8:130:3ffc::401:25]) by mx1.freebsd.org (Postfix) with ESMTP id 1EFE38FC13 for ; Wed, 4 May 2011 20:39:22 +0000 (UTC) Received: from mail.sbone.de (mail.sbone.de [IPv6:fde9:577b:c1a9:31::2013:587]) (using TLSv1 with cipher ADH-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by mx1.sbone.de (Postfix) with ESMTPS id 1997725D388B; Wed, 4 May 2011 20:39:21 +0000 (UTC) Received: from content-filter.sbone.de (content-filter.sbone.de [IPv6:fde9:577b:c1a9:31::2013:2742]) (using TLSv1 with cipher DHE-RSA-AES256-SHA
(256/256 bits)) (No client certificate requested) by mail.sbone.de (Postfix) with ESMTPS id 5D6D8159EABF; Wed, 4 May 2011 20:39:20 +0000 (UTC) X-Virus-Scanned: amavisd-new at sbone.de Received: from mail.sbone.de ([IPv6:fde9:577b:c1a9:31::2013:587]) by content-filter.sbone.de (content-filter.sbone.de [fde9:577b:c1a9:31::2013:2742]) (amavisd-new, port 10024) with ESMTP id ZwFKcuPQb+rv; Wed, 4 May 2011 20:39:19 +0000 (UTC) Received: from orange-en1.sbone.de (orange-en1.sbone.de [IPv6:fde9:577b:c1a9:31:cabc:c8ff:fecf:e8e3]) (using TLSv1 with cipher AES128-SHA (128/128 bits)) (No client certificate requested) by mail.sbone.de (Postfix) with ESMTPSA id D08D5159EA75; Wed, 4 May 2011 20:39:18 +0000 (UTC) Mime-Version: 1.0 (Apple Message framework v1084) Content-Type: text/plain; charset=us-ascii From: "Bjoern A. Zeeb" In-Reply-To: <132388F1-44D9-45C9-AE05-1799A7A2DCD9@neville-neil.com> Date: Wed, 4 May 2011 20:39:17 +0000 Content-Transfer-Encoding: quoted-printable Message-Id: <73F7C4EA-6627-4428-8130-D77443722E15@lists.zabbadoz.net> References: <132388F1-44D9-45C9-AE05-1799A7A2DCD9@neville-neil.com> To: George Neville-Neil X-Mailer: Apple Mail (2.1084) Cc: arch@freebsd.org Subject: Re: Updating our TCP and socket sysctl values... X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 04 May 2011 20:39:23 -0000 On Mar 19, 2011, at 6:37 AM, George Neville-Neil wrote: Hey, > I believe it's time for us to upgrade our sysctl values for TCP sockets so that > they are more in line with the modern world. At the moment we have these limits on > our buffering: > > kern.ipc.maxsockbuf: 262144 > net.inet.tcp.recvbuf_max: 262144 > net.inet.tcp.sendbuf_max: 262144 > > I believe it's time to up these values to something that's in line with higher speed > local networks, such as 10G.
> Perhaps it's time to move these to 2MB instead of 256K.
>
> Thoughts?

Yes, did you ever commit a change?

I would even go further up to 4M.

300ms x 100Mbit/s =~ 3.6M, where 100Mbit/s is about what I can get here
as a residential customer, as you can probably get in Japan, and that's
about 300ms from some parts of Europe.

Equally it would allow me to get Gbit/s throughout most parts of the
continent here and it's still 400Mbit/s East-to-West coast in theory if
I got the maths right.

In addition to Gordon's values:
I think the current OSX maximum send/recvspace values you can set are
around 3720000. The defaults are even more abysmal than FreeBSD's and I
have yet to figure out how to make the changes persistent over a reboot
but ELIST.

So all in all I think the 2M are a safe bet at least.

/bz

--
Bjoern A. Zeeb                  You have to have visions!
Stop bit received. Insert coin for new address family.

From owner-freebsd-arch@FreeBSD.ORG Wed May 4 21:26:06 2011
Return-Path:
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 34D51106564A; Wed, 4 May 2011 21:26:05 +0000 (UTC) (envelope-from jilles@stack.nl)
Received: from mx1.stack.nl (relay04.stack.nl [IPv6:2001:610:1108:5010::107]) by mx1.freebsd.org (Postfix) with ESMTP id 8C5908FC15; Wed, 4 May 2011 21:26:05 +0000 (UTC)
Received: from turtle.stack.nl (turtle.stack.nl [IPv6:2001:610:1108:5010::132]) by mx1.stack.nl (Postfix) with ESMTP id E98691DD82D; Wed, 4 May 2011 23:26:04 +0200 (CEST)
Received: by turtle.stack.nl (Postfix, from userid 1677) id D5B6817395; Wed, 4 May 2011 23:26:04 +0200 (CEST)
Date: Wed, 4 May 2011 23:26:04 +0200
From: Jilles Tjoelker
To: John Baldwin
Message-ID: <20110504212604.GA13717@stack.nl>
References: <201105021537.19507.jhb@freebsd.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <201105021537.19507.jhb@freebsd.org>
User-Agent: Mutt/1.5.21
	(2010-09-15)
Cc: arch@freebsd.org
Subject: Re: [PATCH] Add ktrace records for user page faults
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 04 May 2011 21:26:06 -0000

On Mon, May 02, 2011 at 03:37:19PM -0400, John Baldwin wrote:
> One thing I have found useful is knowing when processes are in the
> kernel instead of in userland. ktrace already provides records for
> syscall entry/exit. The other major source of time spent in the
> kernel that I've seen is page fault handling. To that end, I have a
> patch that adds ktrace records to the beginning and end of VM faults.
> This gives a pair of records so a user can see how long a fault took
> (similar to how one can see how long a syscall takes now). Sample
> output from kdump is below:
> 47565 echo CALL mmap(0x800a87000,0x179000,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,0xffffffff,0)
> 47565 echo RET mmap 34370777088/0x800a87000
> 47565 echo PFLT 0x800723000 VM_PROT_EXECUTE
> 47565 echo RET KERN_SUCCESS
> 47565 echo CALL munmap(0x800887000,0x179000)
> 47565 echo RET munmap 0
> 47565 echo PFLT 0x800a00000 VM_PROT_WRITE
> 47565 echo RET KERN_SUCCESS

Just a small nitpick, I think the return from a page fault should not
use the same "RET" keyword; even though the next word unambiguously
distinguishes it from a return from a syscall, I think it is clearer in
the documentation and possibly useful for automated processing to use a
separate keyword such as "PRET".

--
Jilles Tjoelker
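
To illustrate the automated-processing point, here is a minimal sketch
of a record classifier. It assumes kdump lines have the shape shown in
the sample output above (pid, command, keyword, rest). The function
name classify(), the "KERN_" prefix heuristic, and the "PRET" keyword
are illustrative assumptions, not part of the patch or of kdump itself.

```python
import re

# Assumed line shape, taken from the sample output quoted above:
#   <pid> <command> <KEYWORD> <rest of record>
RECORD = re.compile(r"^\s*(\d+)\s+(\S+)\s+([A-Z]+)\s+(.*)$")


def classify(line, fault_ret_keyword=None):
    """Classify one kdump-style line (hypothetical helper).

    fault_ret_keyword=None models the output as posted, where fault
    returns share "RET" with syscalls and must be told apart by looking
    at the next word. Passing e.g. "PRET" models the suggested separate
    keyword, where the keyword alone is enough.
    """
    m = RECORD.match(line)
    if m is None:
        return None
    _pid, _comm, keyword, rest = m.groups()
    if keyword == "CALL":
        return "syscall_enter"
    if keyword == "PFLT":
        return "fault_enter"
    if fault_ret_keyword is not None and keyword == fault_ret_keyword:
        return "fault_return"
    if keyword == "RET":
        # Heuristic (assumption): fault results look like KERN_SUCCESS,
        # while syscall returns start with the syscall name.
        if fault_ret_keyword is None and rest.startswith("KERN_"):
            return "fault_return"
        return "syscall_return"
    return "other"
```

With a shared keyword the parser has to inspect the word after "RET";
with a separate keyword the third field is sufficient on its own.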