Date:      Fri, 19 Jun 2020 20:15:39 +0000
From:      bugzilla-noreply@freebsd.org
To:        bugs@FreeBSD.org
Subject:   [Bug 247432] panic: general protection fault in ucp_start_pmc for uncore on E5504 processor
Message-ID:  <bug-247432-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=247432

            Bug ID: 247432
           Summary: panic: general protection fault in ucp_start_pmc for
                    uncore on E5504 processor
           Product: Base System
           Version: 12.1-RELEASE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: dgmorris@earthlink.net

Internal Dell testing of a FreeBSD-based product includes a pmc test that,
among other things, does the following:

        for i in $(pmccontrol -L | grep -v -e "IAF" -e "IAP" -e "TSC" -e "UNC" \
                 -e "UCF" -e "UCP" -e "SOFT"); do

                pmcstat -p $i ls
                process_cnt=`echo $?`

                # Error 71 is returned if counter is system specific and
                # not process specific so skip then
                if [ $process_cnt -ne 0 ] && [ $process_cnt -ne 71 ]; then
                        atf_fail "PMC counter not working"
                fi
        done

This produces a panic on E5504 processor systems.

Reproducing it locally to narrow it down showed that the uncore events are
what trigger the panic:
        mem_uncore_retired.local_dram
        mem_uncore_retired.other_core_l2_hitm
        mem_uncore_retired.remote_cache_local_home_hit
        mem_uncore_retired.remote_dram
        mem_uncore_retired.uncacheable

Panic information:

Fatal trap 9: general protection fault while in kernel mode
cpuid = 0; apic id = 00
instruction pointer     = 0x20:0xffffffff82c30604
stack pointer           = 0x28:0xfffffe0044204640
frame pointer           = 0x28:0xfffffe0044204640
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = resume, IOPL = 0
current process         = 1115 (pmcstat)
trap number             = 9
panic: general protection fault
cpuid = 0
time = 1592596633
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0044204350
vpanic() at vpanic+0x19d/frame 0xfffffe00442043a0
panic() at panic+0x43/frame 0xfffffe0044204400
trap_fatal() at trap_fatal+0x39c/frame 0xfffffe0044204460
trap() at trap+0x6c/frame 0xfffffe0044204570
calltrap() at calltrap+0x8/frame 0xfffffe0044204570
--- trap 0x9, rip = 0xffffffff82c30604, rsp = 0xfffffe0044204640, rbp = 0xfffffe0044204640 ---
ucp_start_pmc() at ucp_start_pmc+0xa4/frame 0xfffffe0044204640
pmc_hook_handler() at pmc_hook_handler+0xfda/frame 0xfffffe0044204700
sched_switch() at sched_switch+0x691/frame 0xfffffe00442047d0
mi_switch() at mi_switch+0xe2/frame 0xfffffe0044204800
sleepq_catch_signals() at sleepq_catch_signals+0x425/frame 0xfffffe0044204850
sleepq_wait_sig() at sleepq_wait_sig+0xf/frame 0xfffffe0044204880
_sleep() at _sleep+0x23a/frame 0xfffffe00442048f0
sbwait() at sbwait+0x4c/frame 0xfffffe0044204910
soreceive_generic() at soreceive_generic+0x286/frame 0xfffffe00442049e0
soreceive() at soreceive+0x44/frame 0xfffffe0044204a00
dofileread() at dofileread+0x95/frame 0xfffffe0044204a40
sys_read() at sys_read+0xc1/frame 0xfffffe0044204ab0
amd64_syscall() at amd64_syscall+0x364/frame 0xfffffe0044204bf0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe0044204bf0
--- syscall (3, FreeBSD ELF64, sys_read), rip = 0x80095adfa, rsp = 0x7fffffffe3f8, rbp = 0x7fffffffe470 ---
Uptime: 1m4s
Dumping 435 out of 6085 MB:..4%..12%..23%..34%..41%..52%..63%..74%..81%..92%

__curthread () at /usr/src/sys/amd64/include/pcpu.h:234
234             __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (OFFSETOF_CURTHREAD));
(kgdb) bt
#0  __curthread () at /usr/src/sys/amd64/include/pcpu.h:234
#1  doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:371
#2  0xffffffff80bdf95d in kern_reboot (howto=260) at
/usr/src/sys/kern/kern_shutdown.c:451
#3  0xffffffff80bdfde9 in vpanic (fmt=<optimized out>, ap=<optimized out>) at
/usr/src/sys/kern/kern_shutdown.c:877
#4  0xffffffff80bdfbe3 in panic (fmt=<unavailable>) at
/usr/src/sys/kern/kern_shutdown.c:804
#5  0xffffffff810c93cc in trap_fatal (frame=0xfffffe0044204580, eva=0) at
/usr/src/sys/amd64/amd64/trap.c:943
#6  0xffffffff810c87dc in trap (frame=0xfffffe0044204580) at
/usr/src/sys/amd64/amd64/trap.c:221
#7  <signal handler called>
#8  0xffffffff82c30604 in wrmsr (msr=960, newval=<optimized out>) at
/usr/src/sys/amd64/include/cpufunc.h:433
#9  ucp_start_pmc (cpu=<optimized out>, ri=0) at
/usr/src/sys/dev/hwpmc/hwpmc_uncore.c:707
#10 0xffffffff82c2556a in pmc_process_csw_in (td=<optimized out>) at
/usr/src/sys/dev/hwpmc/hwpmc_mod.c:1492
#11 pmc_hook_handler (td=0xfffff80009bf75e0, function=<optimized out>,
arg=<optimized out>) at /usr/src/sys/dev/hwpmc/hwpmc_mod.c:2210
#12 0xffffffff80c119f1 in sched_switch (td=0xfffff80009bf75e0, newtd=<optimized out>,
flags=<optimized out>) at /usr/src/sys/kern/sched_ule.c:2120
#13 0xffffffff80beb922 in mi_switch (flags=260, newtd=0x0) at
/usr/src/sys/kern/kern_synch.c:452
#14 0xffffffff80c3c265 in sleepq_catch_signals (wchan=0xfffff800097c053c,
pri=-1) at /usr/src/sys/kern/subr_sleepqueue.c:528
#15 0xffffffff80c3bd9f in sleepq_wait_sig (wchan=0xfffff8000fbaf500, pri=0) at
/usr/src/sys/kern/subr_sleepqueue.c:719
#16 0xffffffff80beb34a in _sleep (ident=0xfffff800097c053c,
lock=0xfffff800097c04c0, priority=360, wmesg=0xffffffff81258462 "sbwait",
sbt=0, pr=0, flags=0)
    at /usr/src/sys/kern/kern_synch.c:215
#17 0xffffffff80c77cec in sbwait (sb=0x100000000) at
/usr/src/sys/kern/uipc_sockbuf.c:267
#18 0xffffffff80c7d176 in soreceive_generic (so=<optimized out>, psa=0x0,
uio=0xfffffe0044204a50, mp0=0x0, controlp=0x0, flagsp=0x0)
    at /usr/src/sys/kern/uipc_socket.c:1813
#19 0xffffffff80c7ef94 in soreceive (so=0xfffff8000fbaf500, psa=0x100000000,
uio=0x0, mp0=0x3c0, controlp=0x43200f, flagsp=0x0)
    at /usr/src/sys/kern/uipc_socket.c:2563
#20 0xffffffff80c4c505 in fo_read (fp=<optimized out>, uio=<optimized out>,
active_cred=0x0, flags=<optimized out>, td=<optimized out>)
    at /usr/src/sys/sys/file.h:313
#21 dofileread (td=<optimized out>, fd=5, fp=<optimized out>,
auio=0xfffffe0044204a50, offset=5, flags=<optimized out>) at
/usr/src/sys/kern/sys_generic.c:368
#22 0xffffffff80c4c081 in kern_readv (td=<optimized out>, fd=5, auio=<optimized out>) at
/usr/src/sys/kern/sys_generic.c:289
#23 sys_read (td=0xfffff80009bf75e0, uap=<optimized out>) at
/usr/src/sys/kern/sys_generic.c:205
#24 0xffffffff810c9f84 in syscallenter (td=0xfffff80009bf75e0) at
/usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:135
#25 amd64_syscall (td=0xfffff80009bf75e0, traced=0) at
/usr/src/sys/amd64/amd64/trap.c:1186
#26 <signal handler called>
#27 0x000000080095adfa in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffffffe3f8
(kgdb) frame 9
#9  ucp_start_pmc (cpu=<optimized out>, ri=0) at
/usr/src/sys/dev/hwpmc/hwpmc_uncore.c:707
707             wrmsr(SELECTSEL(uncore_cputype) + ri, evsel);
(kgdb) p ri
$1 = 0
(kgdb) p uncore_cputype
$2 = PMC_CPU_INTEL_COREI7
(kgdb) p evsel
$3 = 4399119
(kgdb) p/x evsel
$4 = 0x43200f

Note that the msr=960 argument passed to wrmsr correctly corresponds to 0x3c0
(UCP_EVSEL0), which is what SELECTSEL(PMC_CPU_INTEL_COREI7) should return.

This reproduces 100% for me on a Z600 Workstation with:
CPU: Intel(R) Xeon(R) CPU           E5504  @ 2.00GHz (1995.04-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x106a5  Family=0x6  Model=0x1a  Stepping=5
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x9ce3bd<SSE3,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,DCA,SSE4.1,SSE4.2,POPCNT>
  AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
  AMD Features2=0x1<LAHF>
  VT-x: PAT,HLT,MTF,PAUSE,EPT,VPID
  TSC: P-state invariant, performance statistics

I suspect it reproduces on other E5504 systems as well. This is a dual-socket
motherboard with only one socket populated, but according to the Intel Software
Manuals the uncore is part of the processor package, so I don't think that
should matter (I'm mentioning it in case it rings a bell).

It's older hardware, I know, but I figured it was worth reporting.

-- 
You are receiving this mail because:
You are the assignee for the bug.


