Date: Sun, 2 Feb 2020 15:58:22 +0300 From: Slawa Olhovchenkov <slw@zxy.spb.ru> To: freebsd-hackers@freebsd.org Subject: HWPMC trouble Message-ID: <20200202125822.GB8028@zxy.spb.ru>
next in thread | raw e-mail | index | archive | help
I am try use hwpmc on this hardware: CPU: Intel(R) Xeon(R) CPU X5675 @ 3.07GHz (3066.83-MHz K8-class CPU) hwpmc: SOFT/16/64/0x67<INT,USR,SYS,REA,WRI> TSC/1/64/0x20<REA> IAP/4/48/0x3ff<INT,USR,SYS,EDG,THR,REA,WRI,INV,QUA,PRC> IAF/3/48/0x67<INT,USR,SYS,REA,WRI> UCP/8/48/0x3f8<EDG,THR,REA,WRI,INV,QUA,PRC> UCF/1/48/0x60<REA,WRI> I am have some troubles: 1. I am try to discover hotspots in userland code run on 4th CPU: # pmcstat -c 4 -l 4 -S cpu_clk_unhalted.thread -O sample.out # ls -l sample.out -rw-r--r-- 1 root wheel 70480 Feb 2 15:46 sample.out Too small! # pmcstat -R sample.out -z100 -G out.txt CONVERSION STATISTICS: #samples/total 564 #callchain/dubious-frames 551 # head -30 out.txt @ cpu_clk_unhalted.thread [13 samples] 23.08% [3] em_update_stats_counters @ /boot/kernel/kernel 100.0% [3] em_if_update_admin_status 100.0% [3] _task_fn_admin 100.0% [3] gtaskqueue_run_locked 100.0% [3] gtaskqueue_thread_loop 100.0% [3] fork_exit 15.38% [2] spinlock_exit @ /boot/kernel/kernel 50.00% [1] spinlock_exit 100.0% [1] wakeup_any 100.0% [1] grouptaskqueue_enqueue 100.0% [1] iflib_timer 100.0% [1] softclock_call_cc 100.0% [1] softclock 100.0% [1] ithread_loop 100.0% [1] fork_exit 50.00% [1] cpuctl_do_msr @ /boot/kernel/cpuctl.ko 100.0% [1] devfs_ioctl @ /boot/kernel/kernel 100.0% [1] VOP_IOCTL_APV 100.0% [1] vn_ioctl 100.0% [1] devfs_ioctl_f 100.0% [1] kern_ioctl 100.0% [1] sys_ioctl 100.0% [1] amd64_syscall I am don't see any actual code path. In reality `top` show: CPU 4: 100% user, 0.0% nice, 0.0% system, 0.0% interrupt, 0.0% idle Where all samples from userland? 2. I am try to discover mostly memory touch codeptah: # pmcstat -l 4 -S mem_load_retired.llc_miss -O sample.out # ls -l sample.out -rw-r--r-- 1 root wheel 235896 Feb 2 15:51 sample.out # pmcstat -R sample.out initlog 0x9030000 "INTEL_WESTMERE" allocate 0x711 "mem_load_retired.llc_miss" 0x20080 allocate 0x100711 "mem_load_retired.llc_miss" 0x20080 allocate 0x200711 "mem_load_retired.llc_miss" 0x20080 allocate 0x300711 "mem_load_retired.llc_miss" 0x20080 allocate 0x400711 "mem_load_retired.llc_miss" 0x20080 allocate 0x500711 "mem_load_retired.llc_miss" 0x20080 allocate 0x600711 "mem_load_retired.llc_miss" 0x20080 allocate 0x700711 "mem_load_retired.llc_miss" 0x20080 allocate 0x800711 "mem_load_retired.llc_miss" 0x20080 allocate 0x900711 "mem_load_retired.llc_miss" 0x20080 allocate 0xa00711 "mem_load_retired.llc_miss" 0x20080 allocate 0xb00711 "mem_load_retired.llc_miss" 0x20080 unknown event (type 20). unknown event (type 18). unknown event (type 20). unknown event (type 18). unknown event (type 20). .... unknown event (type 18). map-in -1 0xffffffff80200000 "kernel" map-in -1 0xffffffff82471000 "xhci.ko" ... # pmcstat -R sample.out -z100 -G out.txt CONVERSION STATISTICS: #samples/total 27 # less out.txt @ mem_load_retired.llc_miss [27 samples] 81.48% [22] native_lapic_ipi_raw @ /boot/kernel/kernel 100.0% [22] sched_add 100.0% [22] setrunnable 100.0% [22] sleepq_signal 100.0% [22] wakeup_any 100.0% [22] grouptaskqueue_enqueue 100.0% [22] iflib_fast_intr 100.0% [22] intr_event_handle 100.0% [22] intr_execute_handlers 100.0% [22] lapic_handle_intr 14.81% [4] lapic_handle_intr @ /boot/kernel/kernel 03.70% [1] sched_clock @ /boot/kernel/kernel 100.0% [1] statclock 100.0% [1] handleevents 100.0% [1] timercb 100.0% [1] lapic_handle_timer Where all userland? `pcm.x` output: EXEC : instructions per nominal CPU cycle IPC : instructions per CPU cycle FREQ : relation to nominal CPU frequency='unhalted clock ticks'/'invariant timer ticks' (includes Intel Turbo Boost) AFREQ : relation to nominal CPU frequency while in active state (not in power-saving C state)='unhalted clock ticks'/'invariant timer ticks while in C0-state' (includes Intel Turbo Boost) L3MISS: L3 (read) cache misses L2MISS: L2 (read) cache misses (including other core's L2 cache *hits*) L3HIT : L3 (read) cache hit ratio (0.00-1.00) L2HIT : L2 cache hit ratio (0.00-1.00) L3MPI : number of L3 (read) cache misses per instruction L2MPI : number of L2 (read) cache misses per instruction READ : bytes read from main memory controller (in GBytes) WRITE : bytes written to main memory controller (in GBytes) TEMP : Temperature reading in 1 degree Celsius relative to the TjMax temperature (thermal headroom): 0 corresponds to the max temperature energy: Energy in Joules Core (SKT) | EXEC | IPC | FREQ | AFREQ | L3MISS | L2MISS | L3HIT | L2HIT | L3MPI | L2MPI | TEMP 0 0 0.70 0.64 1.09 1.09 14 M 20 M 0.31 0.26 0.01 0.01 24 1 0 0.89 0.81 1.09 1.09 3084 K 17 M 0.83 0.31 0.00 0.01 24 2 0 0.97 0.89 1.09 1.09 17 M 25 M 0.30 0.39 0.01 0.01 28 3 0 0.01 0.20 0.04 1.09 37 K 327 K 0.89 0.73 0.00 0.01 28 4 0 0.68 0.63 1.09 1.09 7268 K 9959 K 0.27 0.13 0.00 0.00 28 5 0 0.68 0.63 1.09 1.09 7275 K 9789 K 0.26 0.12 0.00 0.00 28 6 0 0.68 0.63 1.09 1.09 7271 K 10 M 0.28 0.11 0.00 0.00 24 7 0 0.68 0.62 1.09 1.09 7269 K 9761 K 0.26 0.11 0.00 0.00 24 8 0 0.65 0.60 1.09 1.09 7265 K 10 M 0.29 0.18 0.00 0.01 23 9 0 0.68 0.62 1.09 1.09 7270 K 10 M 0.28 0.17 0.00 0.00 23 10 0 0.65 0.60 1.09 1.09 7272 K 10 M 0.28 0.18 0.00 0.01 26 11 0 0.67 0.62 1.09 1.09 7283 K 9946 K 0.27 0.18 0.00 0.00 26 --------------------------------------------------------------------------------------------------------------- SKT 0 0.66 0.66 1.00 1.09 93 M 143 M 0.35 0.24 0.00 0.01 N/A --------------------------------------------------------------------------------------------------------------- TOTAL * 0.66 0.66 1.00 1.09 93 M 143 M 0.35 0.24 0.00 0.01 N/A Instructions retired: 24 G ; Active cycles: 36 G ; Time (TSC): 3069 Mticks ; C0 (active,non-halted) core residency: 92.01 % C1 core residency: 7.99 %; C3 core residency: 0.00 %; C6 core residency: 0.00 %; C3 package residency: 0.00 %; C6 package residency: 0.00 %; C7 package residency: 0.00 %; PHYSICAL CORE IPC : 1.32 => corresponds to 33.07 % utilization for cores in active state Instructions per nominal CPU cycle: 1.32 => corresponds to 33.07 % core utilization over time interval SMI count: 0 --------------------------------------------------------------------------------------------------------------- MEM (GB)->| READ | WRITE | --------------------------------------------------------------------------------------------------------------- SKT 0 7.36 5.04 --------------------------------------------------------------------------------------------------------------- I.e I am got 7.2M LLC miss in reality and don't collect any. What I am miss?
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20200202125822.GB8028>