From owner-freebsd-arm@freebsd.org  Tue Jan 26 09:45:52 2016
Return-Path: <owner-freebsd-arm@freebsd.org>
Delivered-To: freebsd-arm@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id E03D6A46211
 for <freebsd-arm@mailman.ysv.freebsd.org>;
 Tue, 26 Jan 2016 09:45:51 +0000 (UTC)
 (envelope-from wma@semihalf.com)
Received: from mail-io0-x22e.google.com (mail-io0-x22e.google.com
 [IPv6:2607:f8b0:4001:c06::22e])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id B1CFEECE
 for <freebsd-arm@freebsd.org>; Tue, 26 Jan 2016 09:45:51 +0000 (UTC)
 (envelope-from wma@semihalf.com)
Received: by mail-io0-x22e.google.com with SMTP id q21so180215162iod.0
 for <freebsd-arm@freebsd.org>; Tue, 26 Jan 2016 01:45:51 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=semihalf-com.20150623.gappssmtp.com; s=20150623;
 h=mime-version:in-reply-to:references:from:date:message-id:subject:to
 :cc:content-type;
 bh=FpPnaqhMGVJvpuwiN8V3FZwBcP2dOGa4RIQOKBYhEqw=;
 b=zVqz5kO61QpTU/KN9ds2FArtbFTevMeYMY38VIHtKtJeFtVeLeq4DT9nBy3ColPiRR
 nvOfY71vKsaiY4pM8bjyeAsTh0/D6fkbvTNt6C5ICjpwybCn7hzpydlhucGAPxsi7S3M
 pF+2Y6egzlG6yryBSeB7b8XHnkVZiaCpebst2dDejfkFUCJaIhNtOpyctCVAJxj0R/n5
 mzfk8gppymW9899Yq+cZYunuJNnjLLuzoWjlfw9entiUPjREoNiXWbiQlcR56S7z2p3A
 6C3W5aXhy6GxJOj5Y+/kqgjOdpYKCRcb4gvLEbmvr7lAGNlX3kDuPBwWUrEFR9uBeEk/
 PImQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:mime-version:in-reply-to:references:from:date
 :message-id:subject:to:cc:content-type;
 bh=FpPnaqhMGVJvpuwiN8V3FZwBcP2dOGa4RIQOKBYhEqw=;
 b=mOvVFbxQjqbGooyYXGJtFXzqZPzqjksGvEBbu/D4o2qjCW1dN9m5hsp2XquBsdooym
 pBsXPsB7wztCBbvLSpv/9Ty7bEa0IrhCPIDTGQq2xpM+VLuuhz8bXlqXmpTCPTZD9onJ
 07bm1uP5sVVLYWy20nMuaNEc9jeyXDmme57N6vIEsc6ytz6K2Kyyd1FaoYzY6bBObZGF
 6zUoFwHTKWguymlgvSEYqymfilBHy4CaYSyFNw65fb71L9WYWKh+VM+trzfta+ym/ayt
 O44CTbF3fDWaG7gzSVDyeAPys4vvxrd4FPKC0KaXUm1ll8kTvaKB6Fp7Ujzlh/uIJmW9
 spyg==
X-Gm-Message-State: AG10YOTmNQiGJWdklL/jHwnsTh18BediwwrG1P8+7wFXv8miu1ibAI8HZWXWEJhe4I/6MyltAIAHSaxWkBEmVQ==
X-Received: by 10.107.34.201 with SMTP id i192mr21690321ioi.161.1453801550660; 
 Tue, 26 Jan 2016 01:45:50 -0800 (PST)
MIME-Version: 1.0
Received: by 10.107.135.82 with HTTP; Tue, 26 Jan 2016 01:45:31 -0800 (PST)
In-Reply-To: <CAG7dG+yqCQBAgmzAqn_2NT4_b6FGxhMJ3Q+DX5c5KapmC_7Rbw@mail.gmail.com>
References: <CAG7dG+xBF_9bmQkwNsO77b2eOZ1jfOFafey2_Yw=wuJ45TwyMQ@mail.gmail.com>
 <CAPyFy2APpyUcFTjaZS9j4ZbRo-t=ta6ws-btGLZOqDK1ySPWgw@mail.gmail.com>
 <CAG7dG+wcbb=+2Nvb1iWV=KNUdczXaWr_tWQketHS8XL5757+AQ@mail.gmail.com>
 <CAG7dG+yCJ=_OzBG-Pskt522W1ra2BqOxWY4X--vOkLx-qmMt8w@mail.gmail.com>
 <CAG7dG+yqCQBAgmzAqn_2NT4_b6FGxhMJ3Q+DX5c5KapmC_7Rbw@mail.gmail.com>
From: Wojciech Macek <wma@semihalf.com>
Date: Tue, 26 Jan 2016 10:45:31 +0100
Message-ID: <CANsEV8fJsbLkzOhT_ikycW7+KO1OJf83eVWMC84Q3yfvvWySvw@mail.gmail.com>
Subject: Re: Various panics while using HWPMC on ARM64
To: Zbigniew Bodek <zbb@semihalf.com>
Cc: Ruslan Bukin <br@freebsd.org>, Andrew Turner <andrew@freebsd.org>, 
 "freebsd-arm@freebsd.org" <freebsd-arm@freebsd.org>
Content-Type: text/plain; charset=UTF-8
X-Content-Filtered-By: Mailman/MimeDel 2.1.20
X-BeenThere: freebsd-arm@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: "Porting FreeBSD to ARM processors." <freebsd-arm.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-arm>,
 <mailto:freebsd-arm-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arm/>
List-Post: <mailto:freebsd-arm@freebsd.org>
List-Help: <mailto:freebsd-arm-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-arm>,
 <mailto:freebsd-arm-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 26 Jan 2016 09:45:52 -0000

I started to look intu this and there is a race condition during pmcstat
thread migration - both CPUX and CPU0 (the one that the pmcstat is migrated
on) are touching the same structures at the same time. I don't have any fix
for that yet, but one might use a simple workaround with cpuset to force
pmcstat to execute on CPU0.

Wojtek

root@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O
cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O
cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O
cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O
cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O
cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O
cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O
cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O
cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O
cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O
cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O
cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O
cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O
cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O
cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O
cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O
cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O
cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O
cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O
cpu_cycles.pmc
^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O
cpu_cycles.pmc

Setting pmcstat to any other CPU causes an immediate crash when exiting the
process.


^Croot@thunder_crb4:~ # cpuset -c -l 47 pmcstat -N -S CPU_CYCLES -O
cpu_cycles.pmc
^C  x0:                0
  x1: ffffffc00a3c99e0
  x2: ffffff800065c00d
  x3: ffffff80007fabf0
  x4: ffffff876b7e1cd0
  x5: ffffff876b7e1480
  x6:         40761000
  x7:              2fe
  x8:         10000284
  x9: ffffff800074e550
 x10:         7ff96938
 x11:             2710
 x12:             2af8
 x13:         7ff93eec
 x14:                b
 x15:             2a4c
 x16:         7ff93fc1
 x17:                b
 x18: ffffff876b7e14f0
 x19: ffffff850dc92d00
 x20: ffffff850dc92d38
 x21: ffffffc004103a00
 x22:                0
 x23:                1
 x24:                0
 x25: ffffff80005eb790
 x26: ffffff876b7e14f0
 x27:                1
 x28: ffffff850dc92d00
 x29: ffffffc004116848
 x30: ffffffc004116848
  sp: ffffff876b7e14f0
  lr:       aa59bd6cae
 elr:       aa59bd6cae
spsr:         20000085
panic: Unknown kernel exception 22 esr_el1 8a000000

cpuid = 0
KDB: stack backtrace:
db_trace_self() at db_trace_self_wrapper+0x28
         pc = 0xffffff80005c43dc  lr = 0xffffff8000074798
         sp = 0xffffff876b7e0ff0  fp = 0xffffff876b7e1200

db_trace_self_wrapper() at vpanic+0x144
         pc = 0xffffff8000074798  lr = 0xffffff80002d0f68
         sp = 0xffffff876b7e1210  fp = 0xffffff876b7e1280

vpanic() at panic+0x4c
         pc = 0xffffff80002d0f68  lr = 0xffffff80002d0e20
         sp = 0xffffff876b7e1290  fp = 0xffffff876b7e1310

panic() at do_el1h_sync+0x128
         pc = 0xffffff80002d0e20  lr = 0xffffff80005d762c
         sp = 0xffffff876b7e1320  fp = 0xffffff876b7e1340

do_el1h_sync() at handle_el1h_sync+0x68
         pc = 0xffffff80005d762c  lr = 0xffffff80005c5868
         sp = 0xffffff876b7e1350  fp = 0xffffff876b7e1460

handle_el1h_sync() at 0xaa59bd6caa
         pc = 0xffffff80005c5868  lr = 0x000000aa59bd6caa
         sp = 0xffffff876b7e1470  fp = 0xffffffc004116848

KDB: enter: panic
[ thread pid 766 tid 100065 ]
Stopped at      kdb_enter+0x40:
db>


2015-12-23 17:53 GMT+01:00 Zbigniew Bodek <zbb@semihalf.com>:

> Hi,
>
> I go into new panics although quite similar to the previously
> mentioned in both SMP and UP but this time I tried to use DTrace.
> Can anyone check this on another ARM64 platform? I used ThunderX and
> FreeBSD-CURRENT r292654.
>
> Best regards
> zbb
>
> Please check out below for panic outputs:
>
> SMP:
> --------
> root@thunder_crb4:~ # procsystime
> Tracing... Hit Ctrl-C to end...
> ^C
>
> Elapsed Times for all processes,
>
>       x0:                1
>   x1:                0
>   x2: ffffff800052c911
>   x3:         deadc0d8
>   x4:              1a2
>   x5: ffffff854a976860
>   x6:                0
>   x7:              100
>   x8:         deadc0de
>   x9:                0
>  x10:         c4d4ae68
>  x11:        6c4d4ae68
>  x12:         80000000
>  x13:                0
>  x14:                b
>  x15:             274f
>  x16: ffffff80481a97f8
>  x17: ffffff8048196814
>  x18: ffffff854a9764b0
>  x19:           989680
>  x20: ffffff8000708000
>  x21: ffffff8000702038
>  x22: ffffffc004219000
>  x23: ffffff8000702050
>  x24:          3938700
>  x25:          3938700
>  x26: ffffff80006e2000
>  x27: ffffffc0041129a0
>  x28:          39386ff
>  x29: ffffff854a976530
>  x30: ffffff854a976530
>   sp: ffffff854a9764b0
>   lr: ffffff8000244760
>  elr: ffffff80002448e8
> spsr:         600003c5
>  far:         deadc174
>  esr:         96000007
> panic: data abort in critical section or under mutex
> cpuid = 0
> KDB: enter: panic
> [ thread pid 11 tid 100004 ]
> Stopped at      kdb_enter+0x40:
> db>
>
> UP:
> --------
> root@thunder_crb4:~ # kldload dtraceall
> IMPLEMENT ME: dtrace_toxic_ranges
>   x0: ffffff8000299740
>   x1: ffffffc0040cd100
>   x2: ffffff8000299740
>   x3: ffffff8043655a70
>   x4: ffffff8000299740
>   x5:                0
>   x6: ffffff87cba9c0a8
>   x7:                0
>   x8: ffffffc0129484d0
>   x9:                1
>  x10:                1
>  x11:                0
>  x12: ffffffc003e7ed88
>  x13: ffffffc007ab0280
>  x14:           400000
>  x15: ffffff800068c000
>  x16: ffffff80436687b8
>  x17: ffffff8000299690
>  x18: ffffff87cba9c010
>  x19: ffffff8043655a70
>  x20: ffffff8000299740
>  x21: ffffff8000299740
>  x22: ffffffc0040cd100
>  x23: ffffffc003e7ed40
>  x24:                8
>  x25:                1
>  x26: ffffff8043669000
>  x27: ffffff8043669000
>  x28: ffffff8043669000
>  x29: ffffff87cba9c030
>  x30: ffffff87cba9c030
>   sp: ffffff87cba9c010
>   lr: ffffff80002996c0
>  elr: ffffffc0040cd100
> spsr:         800003c5
> panic: Unknown kernel exception 0 esr_el1 2000000
>
> KDB: enter: panic
> [ thread pid 719 tid 100062 ]
> Stopped at      kdb_enter+0x40:
> db>
>
> 2015-12-14 16:24 GMT+01:00 Zbigniew Bodek <zbb@semihalf.com>:
> > Hello,
> >
> > Did you have time to look into that? Do you have any clues what could
> > be wrong here?
> > We would like to use hwpmc for profiling so your help will be very
> > much appreciated.
> >
> > Best regards
> > zbb
> >
> > 2015-12-09 13:06 GMT+01:00 Zbigniew Bodek <zbb@semihalf.com>:
> >> Hello Ed,
> >>
> >> Done. I also check what happens when SMP is disabled and the kassert
> >> is triggered:
> >>
> >> root@thunderx_crb4:~ # pmcstat -S CPU_CYCLES -O cpu_cycles.pmc
> >> ^Cpanic: [pmc,4256] cpu 0 didn't find a sample to collect
> >> KDB: stack backtrace:
> >> db_trace_self() at db_trace_self_wrapper+0x28
> >> pc = 0xffffff80004e9aac  lr = 0xffffff800006d8b4
> >> sp = 0xffffff87cba976e0  fp = 0xffffff87cba97800
> >>
> >> db_trace_self_wrapper() at vpanic+0x9c
> >> pc = 0xffffff800006d8b4  lr = 0xffffff800027136c
> >> sp = 0xffffff87cba97810  fp = 0xffffff87cba97880
> >>
> >> vpanic() at kassert_panic+0x160
> >> pc = 0xffffff800027136c  lr = 0xffffff80002712cc
> >> sp = 0xffffff87cba97890  fp = 0xffffff87cba97950
> >>
> >> kassert_panic() at pmc_capture_user_callchain+0x1a4
> >> pc = 0xffffff80002712cc  lr = 0xffffff80000e1444
> >> sp = 0xffffff87cba97960  fp = 0xffffff87cba979c0
> >>
> >> pmc_capture_user_callchain() at pmc_hook_handler+0x7c0
> >> pc = 0xffffff80000e1444  lr = 0xffffff80000dfb78
> >> sp = 0xffffff87cba979d0  fp = 0xffffff87cba97a50
> >>
> >> pmc_hook_handler() at ast+0x14c
> >> pc = 0xffffff80000dfb78  lr = 0xffffff80002b976c
> >> sp = 0xffffff87cba97a60  fp = 0xffffff87cba97a90
> >>
> >> ast() at handle_el0_sync+0x90
> >> pc = 0xffffff80002b976c  lr = 0xffffff80004eb224
> >> sp = 0xffffff87cba97aa0  fp = 0xffffff87cba97bb0
> >>
> >> handle_el0_sync() at 0x406d60
> >> pc = 0xffffff80004eb224  lr = 0x0000000000406d60
> >> sp = 0xffffff87cba97bc0  fp = 0x0000007ffffff540
> >>
> >> KDB: enter: panic
> >> [ thread pid 679 tid 100061 ]
> >> Stopped at      kdb_enter+0x40:
> >> db>
> >>
> >>
> >> when invariants options is disabled I only get:
> >>
> >> root@thunderx_crb4:~ # pmcstat -S CPU_CYCLES -O cpu_cycles.pmc
> >> ^Cpmcstat: WARNING: sampling was paused at least 1 time.
> >> Please consider tuning the "kern.hwpmc.nsamples" tunable.
> >>
> >>
> >> Best regards
> >> zbb
> >>
> >> 2015-12-08 20:59 GMT+01:00 Ed Maste <emaste@freebsd.org>:
> >>> On 8 December 2015 at 14:34, Zbigniew Bodek <zbb@semihalf.com> wrote:
> >>>> Hello,
> >>>>
> >>>> I encountered some problems with FreeBSD on ARM64 while using hwpmc.
> >>>> Some of the errors that I found are listed below:
> >>>>
> >>>> * panic: Unknown kernel exception 0 esr_el1 2000000
> >>>> * panic: data abort in critical section or under mutex
> >>>> * panic: VFP exception in the kernel
> >>>> * panic: Unknown kernel exception 21 esr_el1 86000006
> >>>
> >>> Can you add these notes to PR 204686? I think there are SMP issues in
> >>> arm64 hwpmc that need to be resolved.
> _______________________________________________
> freebsd-arm@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-arm
> To unsubscribe, send any mail to "freebsd-arm-unsubscribe@freebsd.org"
>