Date: Fri, 17 Jan 2020 11:29:26 -0800
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Ed Maste <emaste@freebsd.org>
Cc: FreeBSD Hackers <freebsd-hackers@freebsd.org>, FreeBSD Current <freebsd-current@freebsd.org>
Subject: Re: Turn off PROFILE option and remove WITH_PROFILE after FreeBSD 13?
Message-ID: <20200117192926.GA68201@troutmask.apl.washington.edu>
In-Reply-To: <CAPyFy2CN1cDPHwDsUzCvYSd6FeyOb1jAEOWrfXX2N_rZFEqaEw@mail.gmail.com>
References: <CAPyFy2Bk6DTYrDgkhra9xP03bJZXq5vDkD8iXbTZZGpfj3MUZA@mail.gmail.com> <20200117171950.GA79297@troutmask.apl.washington.edu> <CAPyFy2CN1cDPHwDsUzCvYSd6FeyOb1jAEOWrfXX2N_rZFEqaEw@mail.gmail.com>

On Fri, Jan 17, 2020 at 01:12:32PM -0500, Ed Maste wrote:
> On Fri, 17 Jan 2020 at 12:19, Steve Kargl
> <sgk@troutmask.apl.washington.edu> wrote:
> >
> > Why?  Because adding -pg to the gfortran command line is sufficient
> > to get profiling information for long running numerically
> > intensive codes.  'gfortran -pg', of course, looks for libc_p.a
> > and libm_p.a.
>
> Have you tried sampling-based profiling (i.e., hwpmc)? I'm curious if
> it provides equal utility for you, or if there's some shortcoming.

Never needed to try hwpmc.

% gfortran9 -o z -pg fortran_file.f90

just works if libc_p.a and libm_p.a are present.  There is a link-time
failure if the libraries are missing.

Here's an example of using -pg that found a bottleneck in my code
(which I haven't profiled recently).

Each sample counts as 0.000123062 seconds.
     % cumulative     self                self   total
  time    seconds  seconds       calls  s/call  s/call  name
 46.80     275.68   275.68  1178817696    0.00    0.00  __lum_MOD_cludet_dble
 11.55     343.73    68.05    19458348    0.00    0.00  __sjnm_MOD_csjn_dble
  7.09     385.47    41.73    19458348    0.00    0.00  __sphere_MOD_sphere_shell_formfcn
  5.97     420.63    35.16    97291740    0.00    0.00  __sjnm_MOD_sjn_dble
  3.84     443.24    22.61 23712564606    0.00    0.00  cabs (w_cabs.c:17 @ 4968f0)

The cludet_dble() routine, which makes heavy use of cabs(), is a
bottleneck.  It so happens that cludet_dble doesn't need to use cabs,
and can instead look at the squared magnitude.  Replacing cabs(z) with
creal(z)**2 + cimag(z)**2 gives

Each sample counts as 0.000123062 seconds.
     % cumulative     self                self   total
 53.93     232.70   232.70  1178817696    0.00    0.00  __lum_MOD_cludet_dble
 15.84     301.02    68.32    19458348    0.00    0.00  __sjnm_MOD_csjn_dble
 10.63     346.91    45.88    19458348    0.00    0.00  __sphere_MOD_sphere_shell_formfcn
  7.84     380.71    33.81    97291740    0.00    0.00  __sjnm_MOD_sjn_dble

Nominally, a decrease of 43 CPU seconds.  Those 43 seconds accumulate
quickly when the code is executed a few thousand times for Monte Carlo
simulations.

Is there a trivially stupid way of using hwpmc that requires no
changes to fortran_file.f90?

PS: For those snickering about the word Fortran, go read the Fortran
2018 standard and educate yourselves.  You want document 007 from
https://j3-fortran.org/doc/standing.

-- 
Steve