Date: Fri, 19 Jun 2020 20:15:39 +0000
From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 247432] panic: general protection fault in ucp_start_pmc for uncore on E5504 processor
Message-ID: <bug-247432-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=247432

            Bug ID: 247432
           Summary: panic: general protection fault in ucp_start_pmc for
                    uncore on E5504 processor
           Product: Base System
           Version: 12.1-RELEASE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: dgmorris@earthlink.net

Internal Dell FreeBSD-based product testing includes a pmc test that among
other things does:

    for i in $(pmccontrol -L | grep -v -e "IAF" -e "IAP" -e "TSC" -e "UNC" \
        -e "UCF" -e "UCP" -e "SOFT"); do
        pmcstat -p $i ls
        process_cnt=`echo $?`
        # Error 71 is returned if counter is system specific and
        # not process specific so skip then
        if [ $process_cnt -ne 0 ] && [ $process_cnt -ne 71 ]; then
            atf_fail "PMC counter not working"
        fi
    done

This produces a panic on E5504 processor systems. Reproducing locally to
narrow it down, it became apparent that the uncore options are triggering the
panic:

    mem_uncore_retired.local_dram
    mem_uncore_retired.other_core_l2_hitm
    mem_uncore_retired.remote_cache_local_home_hit
    mem_uncore_retired.remote_dram
    mem_uncore_retired.uncacheable

Panic information:

Fatal trap 9: general protection fault while in kernel mode
cpuid = 0; apic id = 00
instruction pointer     = 0x20:0xffffffff82c30604
stack pointer           = 0x28:0xfffffe0044204640
frame pointer           = 0x28:0xfffffe0044204640
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = resume, IOPL = 0
current process         = 1115 (pmcstat)
trap number             = 9
panic: general protection fault
cpuid = 0
time = 1592596633
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0044204350
vpanic() at vpanic+0x19d/frame 0xfffffe00442043a0
panic() at panic+0x43/frame 0xfffffe0044204400
trap_fatal() at trap_fatal+0x39c/frame 0xfffffe0044204460
trap() at trap+0x6c/frame 0xfffffe0044204570
calltrap() at calltrap+0x8/frame 0xfffffe0044204570
--- trap 0x9, rip = 0xffffffff82c30604, rsp = 0xfffffe0044204640, rbp = 0xfffffe0044204640 ---
ucp_start_pmc() at ucp_start_pmc+0xa4/frame 0xfffffe0044204640
pmc_hook_handler() at pmc_hook_handler+0xfda/frame 0xfffffe0044204700
sched_switch() at sched_switch+0x691/frame 0xfffffe00442047d0
mi_switch() at mi_switch+0xe2/frame 0xfffffe0044204800
sleepq_catch_signals() at sleepq_catch_signals+0x425/frame 0xfffffe0044204850
sleepq_wait_sig() at sleepq_wait_sig+0xf/frame 0xfffffe0044204880
_sleep() at _sleep+0x23a/frame 0xfffffe00442048f0
sbwait() at sbwait+0x4c/frame 0xfffffe0044204910
soreceive_generic() at soreceive_generic+0x286/frame 0xfffffe00442049e0
soreceive() at soreceive+0x44/frame 0xfffffe0044204a00
dofileread() at dofileread+0x95/frame 0xfffffe0044204a40
sys_read() at sys_read+0xc1/frame 0xfffffe0044204ab0
amd64_syscall() at amd64_syscall+0x364/frame 0xfffffe0044204bf0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe0044204bf0
--- syscall (3, FreeBSD ELF64, sys_read), rip = 0x80095adfa, rsp = 0x7fffffffe3f8, rbp = 0x7fffffffe470 ---
Uptime: 1m4s
Dumping 435 out of 6085 MB:..4%..12%..23%..34%..41%..52%..63%..74%..81%..92%

__curthread () at /usr/src/sys/amd64/include/pcpu.h:234
234             __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (OFFSETOF_CURTHREAD));
(kgdb) bt
#0  __curthread () at /usr/src/sys/amd64/include/pcpu.h:234
#1  doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:371
#2  0xffffffff80bdf95d in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:451
#3  0xffffffff80bdfde9 in vpanic (fmt=<optimized out>, ap=<optimized out>)
    at /usr/src/sys/kern/kern_shutdown.c:877
#4  0xffffffff80bdfbe3 in panic (fmt=<unavailable>)
    at /usr/src/sys/kern/kern_shutdown.c:804
#5  0xffffffff810c93cc in trap_fatal (frame=0xfffffe0044204580, eva=0)
    at /usr/src/sys/amd64/amd64/trap.c:943
#6  0xffffffff810c87dc in trap (frame=0xfffffe0044204580)
    at /usr/src/sys/amd64/amd64/trap.c:221
#7  <signal handler called>
#8  0xffffffff82c30604 in wrmsr (msr=960, newval=<optimized out>)
    at /usr/src/sys/amd64/include/cpufunc.h:433
#9  ucp_start_pmc (cpu=<optimized out>, ri=0)
    at /usr/src/sys/dev/hwpmc/hwpmc_uncore.c:707
#10 0xffffffff82c2556a in pmc_process_csw_in (td=<optimized out>)
    at /usr/src/sys/dev/hwpmc/hwpmc_mod.c:1492
#11 pmc_hook_handler (td=0xfffff80009bf75e0, function=<optimized out>,
    arg=<optimized out>) at /usr/src/sys/dev/hwpmc/hwpmc_mod.c:2210
#12 0xffffffff80c119f1 in sched_switch (td=0xfffff80009bf75e0,
    newtd=<optimized out>, flags=<optimized out>)
    at /usr/src/sys/kern/sched_ule.c:2120
#13 0xffffffff80beb922 in mi_switch (flags=260, newtd=0x0)
    at /usr/src/sys/kern/kern_synch.c:452
#14 0xffffffff80c3c265 in sleepq_catch_signals (wchan=0xfffff800097c053c,
    pri=-1) at /usr/src/sys/kern/subr_sleepqueue.c:528
#15 0xffffffff80c3bd9f in sleepq_wait_sig (wchan=0xfffff8000fbaf500, pri=0)
    at /usr/src/sys/kern/subr_sleepqueue.c:719
#16 0xffffffff80beb34a in _sleep (ident=0xfffff800097c053c,
    lock=0xfffff800097c04c0, priority=360, wmesg=0xffffffff81258462 "sbwait",
    sbt=0, pr=0, flags=0) at /usr/src/sys/kern/kern_synch.c:215
#17 0xffffffff80c77cec in sbwait (sb=0x100000000)
    at /usr/src/sys/kern/uipc_sockbuf.c:267
#18 0xffffffff80c7d176 in soreceive_generic (so=<optimized out>, psa=0x0,
    uio=0xfffffe0044204a50, mp0=0x0, controlp=0x0, flagsp=0x0)
    at /usr/src/sys/kern/uipc_socket.c:1813
#19 0xffffffff80c7ef94 in soreceive (so=0xfffff8000fbaf500, psa=0x100000000,
    uio=0x0, mp0=0x3c0, controlp=0x43200f, flagsp=0x0)
    at /usr/src/sys/kern/uipc_socket.c:2563
#20 0xffffffff80c4c505 in fo_read (fp=<optimized out>, uio=<optimized out>,
    active_cred=0x0, flags=<optimized out>, td=<optimized out>)
    at /usr/src/sys/sys/file.h:313
#21 dofileread (td=<optimized out>, fd=5, fp=<optimized out>,
    auio=0xfffffe0044204a50, offset=5, flags=<optimized out>)
    at /usr/src/sys/kern/sys_generic.c:368
#22 0xffffffff80c4c081 in kern_readv (td=<optimized out>, fd=5,
    auio=<optimized out>) at /usr/src/sys/kern/sys_generic.c:289
#23 sys_read (td=0xfffff80009bf75e0, uap=<optimized out>)
    at /usr/src/sys/kern/sys_generic.c:205
#24 0xffffffff810c9f84 in syscallenter (td=0xfffff80009bf75e0)
    at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:135
#25 amd64_syscall (td=0xfffff80009bf75e0, traced=0)
    at /usr/src/sys/amd64/amd64/trap.c:1186
#26 <signal handler called>
#27 0x000000080095adfa in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffffffe3f8
(kgdb) frame 9
#9  ucp_start_pmc (cpu=<optimized out>, ri=0)
    at /usr/src/sys/dev/hwpmc/hwpmc_uncore.c:707
707                     wrmsr(SELECTSEL(uncore_cputype) + ri, evsel);
(kgdb) p ri
$1 = 0
(kgdb) p uncore_cputype
$2 = PMC_CPU_INTEL_COREI7
(kgdb) p evsel
$3 = 4399119
(kgdb) p/x evsel
$4 = 0x43200f

Note that the 960 passed to wrmsr does properly correspond to 0x3c0
(UCP_EVSEL0) as SELECTSEL(PMC_CPU_INTEL_COREI7) should be returning.

This reproduces 100% for me on a Z600 Workstation with:

CPU: Intel(R) Xeon(R) CPU E5504 @ 2.00GHz (1995.04-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x106a5  Family=0x6  Model=0x1a  Stepping=5
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x9ce3bd<SSE3,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,DCA,SSE4.1,SSE4.2,POPCNT>
  AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
  AMD Features2=0x1<LAHF>
  VT-x: PAT,HLT,MTF,PAUSE,EPT,VPID
  TSC: P-state invariant, performance statistics

I suspect it does for any other E5504 system as well. This is a dual socket
motherboard with a single socket populated, but based on the Intel Software
Manuals, the uncore stuff should be within the package - so I don't think that
should matter (just reporting it in case it rings a bell).

Older hardware, I know - but figured it was worth reporting.
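
As a cross-check of the kgdb values above, the following sketch (not part of the original test) converts the decimal MSR number and the evsel value to hex and pulls out the two low bytes of evsel. The helper names to_hex and field are hypothetical, and reading the low byte as the event code and the next byte as the unit mask assumes the usual Intel event-select layout (bits 7:0 and bits 15:8), which this report does not itself state.

```shell
#!/bin/sh
# Hypothetical helpers for sanity-checking the numbers shown by kgdb.

# Print a decimal value in hex, like kgdb's p/x.
to_hex() {
    printf '0x%x\n' "$1"
}

# Print one bit field of a value: field VALUE SHIFT MASK
field() {
    printf '0x%02x\n' $(( ($1 >> $2) & $3 ))
}

to_hex 960            # MSR number passed to wrmsr
to_hex 4399119        # evsel as printed by "p evsel"
field 4399119 0 255   # ASSUMPTION: bits 7:0 hold the event code
field 4399119 8 255   # ASSUMPTION: bits 15:8 hold the unit mask
```

The first two conversions only restate what the report already shows (960 is 0x3c0, and 4399119 is 0x43200f); the field split is illustrative and says nothing about which bits the MSR at 0x3c0 actually accepts.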