Date: Tue, 26 Jan 2016 10:45:31 +0100
From: Wojciech Macek <wma@semihalf.com>
To: Zbigniew Bodek <zbb@semihalf.com>
Cc: Ruslan Bukin <br@freebsd.org>, Andrew Turner <andrew@freebsd.org>,
    "freebsd-arm@freebsd.org" <freebsd-arm@freebsd.org>
Subject: Re: Various panics while using HWPMC on ARM64
Message-ID: <CANsEV8fJsbLkzOhT_ikycW7+KO1OJf83eVWMC84Q3yfvvWySvw@mail.gmail.com>
In-Reply-To: <CAG7dG+yqCQBAgmzAqn_2NT4_b6FGxhMJ3Q+DX5c5KapmC_7Rbw@mail.gmail.com>
References: <CAG7dG+xBF_9bmQkwNsO77b2eOZ1jfOFafey2_Yw=wuJ45TwyMQ@mail.gmail.com>
 <CAPyFy2APpyUcFTjaZS9j4ZbRo-t=ta6ws-btGLZOqDK1ySPWgw@mail.gmail.com>
 <CAG7dG+wcbb=+2Nvb1iWV=KNUdczXaWr_tWQketHS8XL5757+AQ@mail.gmail.com>
 <CAG7dG+yCJ=_OzBG-Pskt522W1ra2BqOxWY4X--vOkLx-qmMt8w@mail.gmail.com>
 <CAG7dG+yqCQBAgmzAqn_2NT4_b6FGxhMJ3Q+DX5c5KapmC_7Rbw@mail.gmail.com>
I started to look into this and there is a race condition during pmcstat
thread migration - both CPUX and CPU0 (the one that pmcstat is migrated
to) touch the same structures at the same time. I don't have a fix for
that yet, but one can use a simple workaround with cpuset to force
pmcstat to execute on CPU0 (a small C sketch of the same pinning follows
the session below).

Wojtek

root@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc

Setting pmcstat to any other CPU causes an immediate crash when exiting
the process.
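For reference, the same CPU0 pinning can also be done from a small C
wrapper via cpuset_setaffinity(2) before exec'ing pmcstat. This is only
an illustrative sketch of the workaround (the pmcstat arguments are just
the ones used in the session above; it is not part of any hwpmc fix):

#include <sys/param.h>
#include <sys/cpuset.h>

#include <err.h>
#include <unistd.h>

int
main(void)
{
    cpuset_t mask;

    /* Allow the current process to run on CPU 0 only. */
    CPU_ZERO(&mask);
    CPU_SET(0, &mask);

    /* An id of -1 with CPU_WHICH_PID means the calling process. */
    if (cpuset_setaffinity(CPU_LEVEL_WHICH, CPU_WHICH_PID, -1,
        sizeof(mask), &mask) != 0)
        err(1, "cpuset_setaffinity");

    /* Same invocation as above; pmcstat inherits the CPU 0 mask. */
    execlp("pmcstat", "pmcstat", "-N", "-S", "CPU_CYCLES",
        "-O", "cpu_cycles.pmc", (char *)NULL);
    err(1, "execlp pmcstat");
}

Note that cpuset -c additionally creates a new cpuset for the command;
plain affinity as above should be enough for this workaround, since the
exec'ed pmcstat inherits the CPU 0 mask.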
^Croot@thunder_crb4:~ # cpuset -c -l 47 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc
^C
  x0: 0
  x1: ffffffc00a3c99e0
  x2: ffffff800065c00d
  x3: ffffff80007fabf0
  x4: ffffff876b7e1cd0
  x5: ffffff876b7e1480
  x6: 40761000
  x7: 2fe
  x8: 10000284
  x9: ffffff800074e550
 x10: 7ff96938
 x11: 2710
 x12: 2af8
 x13: 7ff93eec
 x14: b
 x15: 2a4c
 x16: 7ff93fc1
 x17: b
 x18: ffffff876b7e14f0
 x19: ffffff850dc92d00
 x20: ffffff850dc92d38
 x21: ffffffc004103a00
 x22: 0
 x23: 1
 x24: 0
 x25: ffffff80005eb790
 x26: ffffff876b7e14f0
 x27: 1
 x28: ffffff850dc92d00
 x29: ffffffc004116848
 x30: ffffffc004116848
  sp: ffffff876b7e14f0
  lr: aa59bd6cae
 elr: aa59bd6cae
spsr: 20000085
panic: Unknown kernel exception 22 esr_el1 8a000000
cpuid = 0
KDB: stack backtrace:
db_trace_self() at db_trace_self_wrapper+0x28
         pc = 0xffffff80005c43dc  lr = 0xffffff8000074798
         sp = 0xffffff876b7e0ff0  fp = 0xffffff876b7e1200

db_trace_self_wrapper() at vpanic+0x144
         pc = 0xffffff8000074798  lr = 0xffffff80002d0f68
         sp = 0xffffff876b7e1210  fp = 0xffffff876b7e1280

vpanic() at panic+0x4c
         pc = 0xffffff80002d0f68  lr = 0xffffff80002d0e20
         sp = 0xffffff876b7e1290  fp = 0xffffff876b7e1310

panic() at do_el1h_sync+0x128
         pc = 0xffffff80002d0e20  lr = 0xffffff80005d762c
         sp = 0xffffff876b7e1320  fp = 0xffffff876b7e1340

do_el1h_sync() at handle_el1h_sync+0x68
         pc = 0xffffff80005d762c  lr = 0xffffff80005c5868
         sp = 0xffffff876b7e1350  fp = 0xffffff876b7e1460

handle_el1h_sync() at 0xaa59bd6caa
         pc = 0xffffff80005c5868  lr = 0x000000aa59bd6caa
         sp = 0xffffff876b7e1470  fp = 0xffffffc004116848

KDB: enter: panic
[ thread pid 766 tid 100065 ]
Stopped at kdb_enter+0x40:
db>

2015-12-23 17:53 GMT+01:00 Zbigniew Bodek <zbb@semihalf.com>:
> Hi,
>
> I ran into new panics, quite similar to those previously mentioned, in
> both SMP and UP, but this time I tried to use DTrace.
> Can anyone check this on another ARM64 platform? I used ThunderX and
> FreeBSD-CURRENT r292654.
>
> Best regards
> zbb
>
> Please see below for the panic outputs:
>
> SMP:
> --------
> root@thunder_crb4:~ # procsystime
> Tracing... Hit Ctrl-C to end...
> ^C
>
> Elapsed Times for all processes,
>
> x0: 1
> x1: 0
> x2: ffffff800052c911
> x3: deadc0d8
> x4: 1a2
> x5: ffffff854a976860
> x6: 0
> x7: 100
> x8: deadc0de
> x9: 0
> x10: c4d4ae68
> x11: 6c4d4ae68
> x12: 80000000
> x13: 0
> x14: b
> x15: 274f
> x16: ffffff80481a97f8
> x17: ffffff8048196814
> x18: ffffff854a9764b0
> x19: 989680
> x20: ffffff8000708000
> x21: ffffff8000702038
> x22: ffffffc004219000
> x23: ffffff8000702050
> x24: 3938700
> x25: 3938700
> x26: ffffff80006e2000
> x27: ffffffc0041129a0
> x28: 39386ff
> x29: ffffff854a976530
> x30: ffffff854a976530
> sp: ffffff854a9764b0
> lr: ffffff8000244760
> elr: ffffff80002448e8
> spsr: 600003c5
> far: deadc174
> esr: 96000007
> panic: data abort in critical section or under mutex
> cpuid = 0
> KDB: enter: panic
> [ thread pid 11 tid 100004 ]
> Stopped at kdb_enter+0x40:
> db>
>
> UP:
> --------
> root@thunder_crb4:~ # kldload dtraceall
> IMPLEMENT ME: dtrace_toxic_ranges
> x0: ffffff8000299740
> x1: ffffffc0040cd100
> x2: ffffff8000299740
> x3: ffffff8043655a70
> x4: ffffff8000299740
> x5: 0
> x6: ffffff87cba9c0a8
> x7: 0
> x8: ffffffc0129484d0
> x9: 1
> x10: 1
> x11: 0
> x12: ffffffc003e7ed88
> x13: ffffffc007ab0280
> x14: 400000
> x15: ffffff800068c000
> x16: ffffff80436687b8
> x17: ffffff8000299690
> x18: ffffff87cba9c010
> x19: ffffff8043655a70
> x20: ffffff8000299740
> x21: ffffff8000299740
> x22: ffffffc0040cd100
> x23: ffffffc003e7ed40
> x24: 8
> x25: 1
> x26: ffffff8043669000
> x27: ffffff8043669000
> x28: ffffff8043669000
> x29: ffffff87cba9c030
> x30: ffffff87cba9c030
> sp: ffffff87cba9c010
> lr: ffffff80002996c0
> elr: ffffffc0040cd100
> spsr: 800003c5
> panic: Unknown kernel exception 0 esr_el1 2000000
>
> KDB: enter: panic
> [ thread pid 719 tid 100062 ]
> Stopped at kdb_enter+0x40:
> db>
>
> 2015-12-14 16:24 GMT+01:00 Zbigniew Bodek <zbb@semihalf.com>:
> > Hello,
> >
> > Did you have time to look into that? Do you have any clues what could
> > be wrong here?
> > We would like to use hwpmc for profiling so your help will be very
> > much appreciated.
> >
> > Best regards
> > zbb
> >
> > 2015-12-09 13:06 GMT+01:00 Zbigniew Bodek <zbb@semihalf.com>:
> >> Hello Ed,
> >>
> >> Done.
> >> I also checked what happens when SMP is disabled and the KASSERT
> >> is triggered:
> >>
> >> root@thunderx_crb4:~ # pmcstat -S CPU_CYCLES -O cpu_cycles.pmc
> >> ^Cpanic: [pmc,4256] cpu 0 didn't find a sample to collect
> >> KDB: stack backtrace:
> >> db_trace_self() at db_trace_self_wrapper+0x28
> >>          pc = 0xffffff80004e9aac  lr = 0xffffff800006d8b4
> >>          sp = 0xffffff87cba976e0  fp = 0xffffff87cba97800
> >>
> >> db_trace_self_wrapper() at vpanic+0x9c
> >>          pc = 0xffffff800006d8b4  lr = 0xffffff800027136c
> >>          sp = 0xffffff87cba97810  fp = 0xffffff87cba97880
> >>
> >> vpanic() at kassert_panic+0x160
> >>          pc = 0xffffff800027136c  lr = 0xffffff80002712cc
> >>          sp = 0xffffff87cba97890  fp = 0xffffff87cba97950
> >>
> >> kassert_panic() at pmc_capture_user_callchain+0x1a4
> >>          pc = 0xffffff80002712cc  lr = 0xffffff80000e1444
> >>          sp = 0xffffff87cba97960  fp = 0xffffff87cba979c0
> >>
> >> pmc_capture_user_callchain() at pmc_hook_handler+0x7c0
> >>          pc = 0xffffff80000e1444  lr = 0xffffff80000dfb78
> >>          sp = 0xffffff87cba979d0  fp = 0xffffff87cba97a50
> >>
> >> pmc_hook_handler() at ast+0x14c
> >>          pc = 0xffffff80000dfb78  lr = 0xffffff80002b976c
> >>          sp = 0xffffff87cba97a60  fp = 0xffffff87cba97a90
> >>
> >> ast() at handle_el0_sync+0x90
> >>          pc = 0xffffff80002b976c  lr = 0xffffff80004eb224
> >>          sp = 0xffffff87cba97aa0  fp = 0xffffff87cba97bb0
> >>
> >> handle_el0_sync() at 0x406d60
> >>          pc = 0xffffff80004eb224  lr = 0x0000000000406d60
> >>          sp = 0xffffff87cba97bc0  fp = 0x0000007ffffff540
> >>
> >> KDB: enter: panic
> >> [ thread pid 679 tid 100061 ]
> >> Stopped at kdb_enter+0x40:
> >> db>
> >>
> >> When the INVARIANTS option is disabled I only get:
> >>
> >> root@thunderx_crb4:~ # pmcstat -S CPU_CYCLES -O cpu_cycles.pmc
> >> ^Cpmcstat: WARNING: sampling was paused at least 1 time.
> >> Please consider tuning the "kern.hwpmc.nsamples" tunable.
> >>
> >> Best regards
> >> zbb
> >>
> >> 2015-12-08 20:59 GMT+01:00 Ed Maste <emaste@freebsd.org>:
> >>> On 8 December 2015 at 14:34, Zbigniew Bodek <zbb@semihalf.com> wrote:
> >>>> Hello,
> >>>>
> >>>> I encountered some problems with FreeBSD on ARM64 while using hwpmc.
> >>>> Some of the errors that I found are listed below:
> >>>>
> >>>> * panic: Unknown kernel exception 0 esr_el1 2000000
> >>>> * panic: data abort in critical section or under mutex
> >>>> * panic: VFP exception in the kernel
> >>>> * panic: Unknown kernel exception 21 esr_el1 86000006
> >>>
> >>> Can you add these notes to PR 204686? I think there are SMP issues in
> >>> arm64 hwpmc that need to be resolved.
> _______________________________________________
> freebsd-arm@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-arm
> To unsubscribe, send any mail to "freebsd-arm-unsubscribe@freebsd.org"
>