From owner-freebsd-hackers@freebsd.org Sun Feb 2 12:58:33 2020 Return-Path: Delivered-To: freebsd-hackers@mailman.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.nyi.freebsd.org (Postfix) with ESMTP id 587E322DD2E for ; Sun, 2 Feb 2020 12:58:33 +0000 (UTC) (envelope-from slw@zxy.spb.ru) Received: from zxy.spb.ru (zxy.spb.ru [195.70.199.98]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 489WGw1Tc1z477T for ; Sun, 2 Feb 2020 12:58:31 +0000 (UTC) (envelope-from slw@zxy.spb.ru) Received: from slw by zxy.spb.ru with local (Exim 4.86 (FreeBSD)) (envelope-from ) id 1iyEpS-0008xH-SB for freebsd-hackers@freebsd.org; Sun, 02 Feb 2020 15:58:22 +0300 Date: Sun, 2 Feb 2020 15:58:22 +0300 From: Slawa Olhovchenkov To: freebsd-hackers@freebsd.org Subject: HWPMC trouble Message-ID: <20200202125822.GB8028@zxy.spb.ru> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.24 (2015-08-30) X-SA-Exim-Connect-IP: X-SA-Exim-Mail-From: slw@zxy.spb.ru X-SA-Exim-Scanned: No (on zxy.spb.ru); SAEximRunCond expanded to false X-Rspamd-Queue-Id: 489WGw1Tc1z477T X-Spamd-Bar: ++++ Authentication-Results: mx1.freebsd.org; dkim=none; dmarc=none; spf=none (mx1.freebsd.org: domain of slw@zxy.spb.ru has no SPF policy when checking 195.70.199.98) smtp.mailfrom=slw@zxy.spb.ru X-Spamd-Result: default: False [4.89 / 15.00]; ARC_NA(0.00)[]; FROM_HAS_DN(0.00)[]; TO_MATCH_ENVRCPT_ALL(0.00)[]; IP_SCORE(0.08)[asn: 5495(0.40), country: RU(0.01)]; MIME_GOOD(-0.10)[text/plain]; TO_DN_NONE(0.00)[]; DMARC_NA(0.00)[zxy.spb.ru]; AUTH_NA(1.00)[]; RCPT_COUNT_ONE(0.00)[1]; NEURAL_SPAM_MEDIUM(0.90)[0.905,0]; MIME_TRACE(0.00)[0:+]; NEURAL_SPAM_LONG(1.00)[0.999,0]; R_SPF_NA(0.00)[]; FROM_EQ_ENVFROM(0.00)[]; R_DKIM_NA(0.00)[]; INTRODUCTION(2.00)[]; ASN(0.00)[asn:5495, ipnet:195.70.192.0/19, country:RU]; MID_RHS_MATCH_FROM(0.00)[]; RCVD_TLS_LAST(0.00)[]; RCVD_COUNT_TWO(0.00)[2] X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 02 Feb 2020 12:58:33 -0000 I am try use hwpmc on this hardware: CPU: Intel(R) Xeon(R) CPU X5675 @ 3.07GHz (3066.83-MHz K8-class CPU) hwpmc: SOFT/16/64/0x67 TSC/1/64/0x20 IAP/4/48/0x3ff IAF/3/48/0x67 UCP/8/48/0x3f8 UCF/1/48/0x60 I am have some troubles: 1. I am try to discover hotspots in userland code run on 4th CPU: # pmcstat -c 4 -l 4 -S cpu_clk_unhalted.thread -O sample.out # ls -l sample.out -rw-r--r-- 1 root wheel 70480 Feb 2 15:46 sample.out Too small! # pmcstat -R sample.out -z100 -G out.txt CONVERSION STATISTICS: #samples/total 564 #callchain/dubious-frames 551 # head -30 out.txt @ cpu_clk_unhalted.thread [13 samples] 23.08% [3] em_update_stats_counters @ /boot/kernel/kernel 100.0% [3] em_if_update_admin_status 100.0% [3] _task_fn_admin 100.0% [3] gtaskqueue_run_locked 100.0% [3] gtaskqueue_thread_loop 100.0% [3] fork_exit 15.38% [2] spinlock_exit @ /boot/kernel/kernel 50.00% [1] spinlock_exit 100.0% [1] wakeup_any 100.0% [1] grouptaskqueue_enqueue 100.0% [1] iflib_timer 100.0% [1] softclock_call_cc 100.0% [1] softclock 100.0% [1] ithread_loop 100.0% [1] fork_exit 50.00% [1] cpuctl_do_msr @ /boot/kernel/cpuctl.ko 100.0% [1] devfs_ioctl @ /boot/kernel/kernel 100.0% [1] VOP_IOCTL_APV 100.0% [1] vn_ioctl 100.0% [1] devfs_ioctl_f 100.0% [1] kern_ioctl 100.0% [1] sys_ioctl 100.0% [1] amd64_syscall I am don't see any actual code path. In reality `top` show: CPU 4: 100% user, 0.0% nice, 0.0% system, 0.0% interrupt, 0.0% idle Where all samples from userland? 2. I am try to discover mostly memory touch codeptah: # pmcstat -l 4 -S mem_load_retired.llc_miss -O sample.out # ls -l sample.out -rw-r--r-- 1 root wheel 235896 Feb 2 15:51 sample.out # pmcstat -R sample.out initlog 0x9030000 "INTEL_WESTMERE" allocate 0x711 "mem_load_retired.llc_miss" 0x20080 allocate 0x100711 "mem_load_retired.llc_miss" 0x20080 allocate 0x200711 "mem_load_retired.llc_miss" 0x20080 allocate 0x300711 "mem_load_retired.llc_miss" 0x20080 allocate 0x400711 "mem_load_retired.llc_miss" 0x20080 allocate 0x500711 "mem_load_retired.llc_miss" 0x20080 allocate 0x600711 "mem_load_retired.llc_miss" 0x20080 allocate 0x700711 "mem_load_retired.llc_miss" 0x20080 allocate 0x800711 "mem_load_retired.llc_miss" 0x20080 allocate 0x900711 "mem_load_retired.llc_miss" 0x20080 allocate 0xa00711 "mem_load_retired.llc_miss" 0x20080 allocate 0xb00711 "mem_load_retired.llc_miss" 0x20080 unknown event (type 20). unknown event (type 18). unknown event (type 20). unknown event (type 18). unknown event (type 20). .... unknown event (type 18). map-in -1 0xffffffff80200000 "kernel" map-in -1 0xffffffff82471000 "xhci.ko" ... # pmcstat -R sample.out -z100 -G out.txt CONVERSION STATISTICS: #samples/total 27 # less out.txt @ mem_load_retired.llc_miss [27 samples] 81.48% [22] native_lapic_ipi_raw @ /boot/kernel/kernel 100.0% [22] sched_add 100.0% [22] setrunnable 100.0% [22] sleepq_signal 100.0% [22] wakeup_any 100.0% [22] grouptaskqueue_enqueue 100.0% [22] iflib_fast_intr 100.0% [22] intr_event_handle 100.0% [22] intr_execute_handlers 100.0% [22] lapic_handle_intr 14.81% [4] lapic_handle_intr @ /boot/kernel/kernel 03.70% [1] sched_clock @ /boot/kernel/kernel 100.0% [1] statclock 100.0% [1] handleevents 100.0% [1] timercb 100.0% [1] lapic_handle_timer Where all userland? `pcm.x` output: EXEC : instructions per nominal CPU cycle IPC : instructions per CPU cycle FREQ : relation to nominal CPU frequency='unhalted clock ticks'/'invariant timer ticks' (includes Intel Turbo Boost) AFREQ : relation to nominal CPU frequency while in active state (not in power-saving C state)='unhalted clock ticks'/'invariant timer ticks while in C0-state' (includes Intel Turbo Boost) L3MISS: L3 (read) cache misses L2MISS: L2 (read) cache misses (including other core's L2 cache *hits*) L3HIT : L3 (read) cache hit ratio (0.00-1.00) L2HIT : L2 cache hit ratio (0.00-1.00) L3MPI : number of L3 (read) cache misses per instruction L2MPI : number of L2 (read) cache misses per instruction READ : bytes read from main memory controller (in GBytes) WRITE : bytes written to main memory controller (in GBytes) TEMP : Temperature reading in 1 degree Celsius relative to the TjMax temperature (thermal headroom): 0 corresponds to the max temperature energy: Energy in Joules Core (SKT) | EXEC | IPC | FREQ | AFREQ | L3MISS | L2MISS | L3HIT | L2HIT | L3MPI | L2MPI | TEMP 0 0 0.70 0.64 1.09 1.09 14 M 20 M 0.31 0.26 0.01 0.01 24 1 0 0.89 0.81 1.09 1.09 3084 K 17 M 0.83 0.31 0.00 0.01 24 2 0 0.97 0.89 1.09 1.09 17 M 25 M 0.30 0.39 0.01 0.01 28 3 0 0.01 0.20 0.04 1.09 37 K 327 K 0.89 0.73 0.00 0.01 28 4 0 0.68 0.63 1.09 1.09 7268 K 9959 K 0.27 0.13 0.00 0.00 28 5 0 0.68 0.63 1.09 1.09 7275 K 9789 K 0.26 0.12 0.00 0.00 28 6 0 0.68 0.63 1.09 1.09 7271 K 10 M 0.28 0.11 0.00 0.00 24 7 0 0.68 0.62 1.09 1.09 7269 K 9761 K 0.26 0.11 0.00 0.00 24 8 0 0.65 0.60 1.09 1.09 7265 K 10 M 0.29 0.18 0.00 0.01 23 9 0 0.68 0.62 1.09 1.09 7270 K 10 M 0.28 0.17 0.00 0.00 23 10 0 0.65 0.60 1.09 1.09 7272 K 10 M 0.28 0.18 0.00 0.01 26 11 0 0.67 0.62 1.09 1.09 7283 K 9946 K 0.27 0.18 0.00 0.00 26 --------------------------------------------------------------------------------------------------------------- SKT 0 0.66 0.66 1.00 1.09 93 M 143 M 0.35 0.24 0.00 0.01 N/A --------------------------------------------------------------------------------------------------------------- TOTAL * 0.66 0.66 1.00 1.09 93 M 143 M 0.35 0.24 0.00 0.01 N/A Instructions retired: 24 G ; Active cycles: 36 G ; Time (TSC): 3069 Mticks ; C0 (active,non-halted) core residency: 92.01 % C1 core residency: 7.99 %; C3 core residency: 0.00 %; C6 core residency: 0.00 %; C3 package residency: 0.00 %; C6 package residency: 0.00 %; C7 package residency: 0.00 %; PHYSICAL CORE IPC : 1.32 => corresponds to 33.07 % utilization for cores in active state Instructions per nominal CPU cycle: 1.32 => corresponds to 33.07 % core utilization over time interval SMI count: 0 --------------------------------------------------------------------------------------------------------------- MEM (GB)->| READ | WRITE | --------------------------------------------------------------------------------------------------------------- SKT 0 7.36 5.04 --------------------------------------------------------------------------------------------------------------- I.e I am got 7.2M LLC miss in reality and don't collect any. What I am miss?