Date: Tue, 26 Jan 2016 10:45:31 +0100
From: Wojciech Macek <wma@semihalf.com>
To: Zbigniew Bodek <zbb@semihalf.com>
Cc: Ruslan Bukin <br@freebsd.org>, Andrew Turner <andrew@freebsd.org>,
    "freebsd-arm@freebsd.org" <freebsd-arm@freebsd.org>
Subject: Re: Various panics while using HWPMC on ARM64
Message-ID: <CANsEV8fJsbLkzOhT_ikycW7+KO1OJf83eVWMC84Q3yfvvWySvw@mail.gmail.com>
In-Reply-To: <CAG7dG+yqCQBAgmzAqn_2NT4_b6FGxhMJ3Q+DX5c5KapmC_7Rbw@mail.gmail.com>
References: <CAG7dG+xBF_9bmQkwNsO77b2eOZ1jfOFafey2_Yw=wuJ45TwyMQ@mail.gmail.com>
 <CAPyFy2APpyUcFTjaZS9j4ZbRo-t=ta6ws-btGLZOqDK1ySPWgw@mail.gmail.com>
 <CAG7dG+wcbb=+2Nvb1iWV=KNUdczXaWr_tWQketHS8XL5757+AQ@mail.gmail.com>
 <CAG7dG+yCJ=_OzBG-Pskt522W1ra2BqOxWY4X--vOkLx-qmMt8w@mail.gmail.com>
 <CAG7dG+yqCQBAgmzAqn_2NT4_b6FGxhMJ3Q+DX5c5KapmC_7Rbw@mail.gmail.com>
I started to look into this and there is a race condition during pmcstat
thread migration - both CPUX and CPU0 (the one that pmcstat is migrated
to) touch the same structures at the same time. I don't have a fix for
that yet, but one can use a simple workaround with cpuset to force
pmcstat to execute on CPU0 (a small C sketch of the same pinning follows
the session below).

Wojtek

root@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc

Setting pmcstat to any other CPU causes an immediate crash when exiting
the process.
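For reference, the same CPU0 pinning can also be done from a small C
wrapper via cpuset_setaffinity(2) before exec'ing pmcstat. This is only
an illustrative sketch of the workaround (the pmcstat arguments are just
the ones used in the session above; it is not part of any hwpmc fix):

#include <sys/param.h>
#include <sys/cpuset.h>

#include <err.h>
#include <unistd.h>

int
main(void)
{
    cpuset_t mask;

    /* Allow the current process to run on CPU 0 only. */
    CPU_ZERO(&mask);
    CPU_SET(0, &mask);

    /* An id of -1 with CPU_WHICH_PID means the calling process. */
    if (cpuset_setaffinity(CPU_LEVEL_WHICH, CPU_WHICH_PID, -1,
        sizeof(mask), &mask) != 0)
        err(1, "cpuset_setaffinity");

    /* Same invocation as above; pmcstat inherits the CPU 0 mask. */
    execlp("pmcstat", "pmcstat", "-N", "-S", "CPU_CYCLES",
        "-O", "cpu_cycles.pmc", (char *)NULL);
    err(1, "execlp pmcstat");
}

Note that cpuset -c additionally creates a new cpuset for the command;
plain affinity as above should be enough for this workaround, since the
exec'ed pmcstat inherits the CPU 0 mask.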
^Croot@thunder_crb4:~ # cpuset -c -l 47 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc
^C
  x0: 0
  x1: ffffffc00a3c99e0
  x2: ffffff800065c00d
  x3: ffffff80007fabf0
  x4: ffffff876b7e1cd0
  x5: ffffff876b7e1480
  x6: 40761000
  x7: 2fe
  x8: 10000284
  x9: ffffff800074e550
 x10: 7ff96938
 x11: 2710
 x12: 2af8
 x13: 7ff93eec
 x14: b
 x15: 2a4c
 x16: 7ff93fc1
 x17: b
 x18: ffffff876b7e14f0
 x19: ffffff850dc92d00
 x20: ffffff850dc92d38
 x21: ffffffc004103a00
 x22: 0
 x23: 1
 x24: 0
 x25: ffffff80005eb790
 x26: ffffff876b7e14f0
 x27: 1
 x28: ffffff850dc92d00
 x29: ffffffc004116848
 x30: ffffffc004116848
  sp: ffffff876b7e14f0
  lr: aa59bd6cae
 elr: aa59bd6cae
spsr: 20000085
panic: Unknown kernel exception 22 esr_el1 8a000000
cpuid = 0
KDB: stack backtrace:
db_trace_self() at db_trace_self_wrapper+0x28
         pc = 0xffffff80005c43dc  lr = 0xffffff8000074798
         sp = 0xffffff876b7e0ff0  fp = 0xffffff876b7e1200

db_trace_self_wrapper() at vpanic+0x144
         pc = 0xffffff8000074798  lr = 0xffffff80002d0f68
         sp = 0xffffff876b7e1210  fp = 0xffffff876b7e1280

vpanic() at panic+0x4c
         pc = 0xffffff80002d0f68  lr = 0xffffff80002d0e20
         sp = 0xffffff876b7e1290  fp = 0xffffff876b7e1310

panic() at do_el1h_sync+0x128
         pc = 0xffffff80002d0e20  lr = 0xffffff80005d762c
         sp = 0xffffff876b7e1320  fp = 0xffffff876b7e1340

do_el1h_sync() at handle_el1h_sync+0x68
         pc = 0xffffff80005d762c  lr = 0xffffff80005c5868
         sp = 0xffffff876b7e1350  fp = 0xffffff876b7e1460

handle_el1h_sync() at 0xaa59bd6caa
         pc = 0xffffff80005c5868  lr = 0x000000aa59bd6caa
         sp = 0xffffff876b7e1470  fp = 0xffffffc004116848

KDB: enter: panic
[ thread pid 766 tid 100065 ]
Stopped at kdb_enter+0x40:
db>

2015-12-23 17:53 GMT+01:00 Zbigniew Bodek <zbb@semihalf.com>:
> Hi,
>
> I ran into new panics, quite similar to those previously mentioned, in
> both SMP and UP, but this time I tried to use DTrace.
> Can anyone check this on another ARM64 platform? I used ThunderX and
> FreeBSD-CURRENT r292654.
>
> Best regards
> zbb
>
> Please see below for the panic outputs:
>
> SMP:
> --------
> root@thunder_crb4:~ # procsystime
> Tracing... Hit Ctrl-C to end...
> ^C
>
> Elapsed Times for all processes,
>
> x0: 1
> x1: 0
> x2: ffffff800052c911
> x3: deadc0d8
> x4: 1a2
> x5: ffffff854a976860
> x6: 0
> x7: 100
> x8: deadc0de
> x9: 0
> x10: c4d4ae68
> x11: 6c4d4ae68
> x12: 80000000
> x13: 0
> x14: b
> x15: 274f
> x16: ffffff80481a97f8
> x17: ffffff8048196814
> x18: ffffff854a9764b0
> x19: 989680
> x20: ffffff8000708000
> x21: ffffff8000702038
> x22: ffffffc004219000
> x23: ffffff8000702050
> x24: 3938700
> x25: 3938700
> x26: ffffff80006e2000
> x27: ffffffc0041129a0
> x28: 39386ff
> x29: ffffff854a976530
> x30: ffffff854a976530
> sp: ffffff854a9764b0
> lr: ffffff8000244760
> elr: ffffff80002448e8
> spsr: 600003c5
> far: deadc174
> esr: 96000007
> panic: data abort in critical section or under mutex
> cpuid = 0
> KDB: enter: panic
> [ thread pid 11 tid 100004 ]
> Stopped at kdb_enter+0x40:
> db>
>
> UP:
> --------
> root@thunder_crb4:~ # kldload dtraceall
> IMPLEMENT ME: dtrace_toxic_ranges
> x0: ffffff8000299740
> x1: ffffffc0040cd100
> x2: ffffff8000299740
> x3: ffffff8043655a70
> x4: ffffff8000299740
> x5: 0
> x6: ffffff87cba9c0a8
> x7: 0
> x8: ffffffc0129484d0
> x9: 1
> x10: 1
> x11: 0
> x12: ffffffc003e7ed88
> x13: ffffffc007ab0280
> x14: 400000
> x15: ffffff800068c000
> x16: ffffff80436687b8
> x17: ffffff8000299690
> x18: ffffff87cba9c010
> x19: ffffff8043655a70
> x20: ffffff8000299740
> x21: ffffff8000299740
> x22: ffffffc0040cd100
> x23: ffffffc003e7ed40
> x24: 8
> x25: 1
> x26: ffffff8043669000
> x27: ffffff8043669000
> x28: ffffff8043669000
> x29: ffffff87cba9c030
> x30: ffffff87cba9c030
> sp: ffffff87cba9c010
> lr: ffffff80002996c0
> elr: ffffffc0040cd100
> spsr: 800003c5
> panic: Unknown kernel exception 0 esr_el1 2000000
>
> KDB: enter: panic
> [ thread pid 719 tid 100062 ]
> Stopped at kdb_enter+0x40:
> db>
>
> 2015-12-14 16:24 GMT+01:00 Zbigniew Bodek <zbb@semihalf.com>:
> > Hello,
> >
> > Did you have time to look into that? Do you have any clues what could
> > be wrong here?
> > We would like to use hwpmc for profiling so your help will be very
> > much appreciated.
> >
> > Best regards
> > zbb
> >
> > 2015-12-09 13:06 GMT+01:00 Zbigniew Bodek <zbb@semihalf.com>:
> >> Hello Ed,
> >>
> >> Done.
> >> I also checked what happens when SMP is disabled and the KASSERT
> >> is triggered:
> >>
> >> root@thunderx_crb4:~ # pmcstat -S CPU_CYCLES -O cpu_cycles.pmc
> >> ^Cpanic: [pmc,4256] cpu 0 didn't find a sample to collect
> >> KDB: stack backtrace:
> >> db_trace_self() at db_trace_self_wrapper+0x28
> >>          pc = 0xffffff80004e9aac  lr = 0xffffff800006d8b4
> >>          sp = 0xffffff87cba976e0  fp = 0xffffff87cba97800
> >>
> >> db_trace_self_wrapper() at vpanic+0x9c
> >>          pc = 0xffffff800006d8b4  lr = 0xffffff800027136c
> >>          sp = 0xffffff87cba97810  fp = 0xffffff87cba97880
> >>
> >> vpanic() at kassert_panic+0x160
> >>          pc = 0xffffff800027136c  lr = 0xffffff80002712cc
> >>          sp = 0xffffff87cba97890  fp = 0xffffff87cba97950
> >>
> >> kassert_panic() at pmc_capture_user_callchain+0x1a4
> >>          pc = 0xffffff80002712cc  lr = 0xffffff80000e1444
> >>          sp = 0xffffff87cba97960  fp = 0xffffff87cba979c0
> >>
> >> pmc_capture_user_callchain() at pmc_hook_handler+0x7c0
> >>          pc = 0xffffff80000e1444  lr = 0xffffff80000dfb78
> >>          sp = 0xffffff87cba979d0  fp = 0xffffff87cba97a50
> >>
> >> pmc_hook_handler() at ast+0x14c
> >>          pc = 0xffffff80000dfb78  lr = 0xffffff80002b976c
> >>          sp = 0xffffff87cba97a60  fp = 0xffffff87cba97a90
> >>
> >> ast() at handle_el0_sync+0x90
> >>          pc = 0xffffff80002b976c  lr = 0xffffff80004eb224
> >>          sp = 0xffffff87cba97aa0  fp = 0xffffff87cba97bb0
> >>
> >> handle_el0_sync() at 0x406d60
> >>          pc = 0xffffff80004eb224  lr = 0x0000000000406d60
> >>          sp = 0xffffff87cba97bc0  fp = 0x0000007ffffff540
> >>
> >> KDB: enter: panic
> >> [ thread pid 679 tid 100061 ]
> >> Stopped at kdb_enter+0x40:
> >> db>
> >>
> >> When the INVARIANTS option is disabled I only get:
> >>
> >> root@thunderx_crb4:~ # pmcstat -S CPU_CYCLES -O cpu_cycles.pmc
> >> ^Cpmcstat: WARNING: sampling was paused at least 1 time.
> >> Please consider tuning the "kern.hwpmc.nsamples" tunable.
> >>
> >> Best regards
> >> zbb
> >>
> >> 2015-12-08 20:59 GMT+01:00 Ed Maste <emaste@freebsd.org>:
> >>> On 8 December 2015 at 14:34, Zbigniew Bodek <zbb@semihalf.com> wrote:
> >>>> Hello,
> >>>>
> >>>> I encountered some problems with FreeBSD on ARM64 while using hwpmc.
> >>>> Some of the errors that I found are listed below:
> >>>>
> >>>> * panic: Unknown kernel exception 0 esr_el1 2000000
> >>>> * panic: data abort in critical section or under mutex
> >>>> * panic: VFP exception in the kernel
> >>>> * panic: Unknown kernel exception 21 esr_el1 86000006
> >>>
> >>> Can you add these notes to PR 204686? I think there are SMP issues in
> >>> arm64 hwpmc that need to be resolved.
> _______________________________________________
> freebsd-arm@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-arm
> To unsubscribe, send any mail to "freebsd-arm-unsubscribe@freebsd.org"
>