Date: Fri, 17 Jan 2020 11:29:26 -0800
From: Steve Kargl
To: Ed Maste
Cc: FreeBSD Hackers, FreeBSD Current
Subject: Re: Turn off PROFILE option and remove WITH_PROFILE after FreeBSD 13?
Message-ID: <20200117192926.GA68201@troutmask.apl.washington.edu>
Reply-To: sgk@troutmask.apl.washington.edu

On Fri, Jan 17, 2020 at 01:12:32PM -0500, Ed Maste wrote:
> On Fri, 17 Jan 2020 at 12:19, Steve Kargl wrote:
> >
> > Why? Because adding -pg to the gfortran command line is sufficient
> > to get profiling information for long-running, numerically
> > intensive codes. 'gfortran -pg', of course, looks for libc_p.a
> > and libm_p.a.
>
> Have you tried sampling-based profiling (i.e., hwpmc)? I'm curious if
> it provides equal utility for you, or if there's some shortcoming.

Never needed to try hwpmc.

% gfortran9 -o z -pg fortran_file.f90

just works if libc_p.a and libm_p.a are present. There is a
link-time failure if the libraries are missing.

Here's an example of using -pg that found a bottleneck in my
code (which I haven't profiled recently).

Each sample counts as 0.000123062 seconds.
     % cumulative      self                   self     total
  time    seconds   seconds        calls    s/call    s/call  name
 46.80     275.68    275.68   1178817696      0.00      0.00  __lum_MOD_cludet_dble
 11.55     343.73     68.05     19458348      0.00      0.00  __sjnm_MOD_csjn_dble
  7.09     385.47     41.73     19458348      0.00      0.00  __sphere_MOD_sphere_shell_formfcn
  5.97     420.63     35.16     97291740      0.00      0.00  __sjnm_MOD_sjn_dble
  3.84     443.24     22.61  23712564606      0.00      0.00  cabs (w_cabs.c:17 @ 4968f0)

The cludet_dble() routine is the bottleneck, and it makes heavy use
of cabs(). It so happens that cludet_dble() doesn't need cabs() and
can instead look at the squared magnitude. Replacing cabs(z) with
creal(z)**2 + cimag(z)**2 gives

Each sample counts as 0.000123062 seconds.
     % cumulative      self                   self     total
  time    seconds   seconds        calls    s/call    s/call  name
 53.93     232.70    232.70   1178817696      0.00      0.00  __lum_MOD_cludet_dble
 15.84     301.02     68.32     19458348      0.00      0.00  __sjnm_MOD_csjn_dble
 10.63     346.91     45.88     19458348      0.00      0.00  __sphere_MOD_sphere_shell_formfcn
  7.84     380.71     33.81     97291740      0.00      0.00  __sjnm_MOD_sjn_dble

Nominally, a 43 CPU-second decrease. Those 43 seconds accumulate
quickly when the code is executed a few thousand times for
Monte Carlo simulations.

Is there a trivially stupid way of using hwpmc that requires no
changes to fortran_file.f90?

PS: For those snickering about the word Fortran: go read the
Fortran 2018 standard and educate yourselves. You want document
007 from https://j3-fortran.org/doc/standing.

-- 
Steve
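
The cabs() replacement described above can be sketched in C roughly
as follows. This is a minimal illustration, not the actual
cludet_dble() code: the helper names cabs2() and argmax_abs() are
hypothetical, and the largest-magnitude search is only an assumed
use case. The point is that sqrt is monotonic, so comparing squared
magnitudes gives the same ordering as comparing cabs() values.

```c
#include <complex.h>
#include <stddef.h>

/* Hypothetical helper: squared magnitude |z|^2, computed without the
 * square root that cabs() has to take. */
static double cabs2(double complex z)
{
    double re = creal(z);
    double im = cimag(z);
    return re * re + im * im;
}

/* Hypothetical use: index of the largest-magnitude element of
 * v[0..n-1]. Ordering by |z|^2 matches ordering by |z| because sqrt
 * is monotonic. Returns 0 when n == 0. */
static size_t argmax_abs(const double complex *v, size_t n)
{
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (cabs2(v[i]) > cabs2(v[best]))
            best = i;
    }
    return best;
}
```

One caveat: the squared form can overflow or underflow for extreme
values where cabs() would not, so it is best suited to data of
moderate magnitude.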