From owner-freebsd-arm@freebsd.org Tue Jan 26 09:45:52 2016 Return-Path: Delivered-To: freebsd-arm@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id E03D6A46211 for ; Tue, 26 Jan 2016 09:45:51 +0000 (UTC) (envelope-from wma@semihalf.com) Received: from mail-io0-x22e.google.com (mail-io0-x22e.google.com [IPv6:2607:f8b0:4001:c06::22e]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id B1CFEECE for ; Tue, 26 Jan 2016 09:45:51 +0000 (UTC) (envelope-from wma@semihalf.com) Received: by mail-io0-x22e.google.com with SMTP id q21so180215162iod.0 for ; Tue, 26 Jan 2016 01:45:51 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=semihalf-com.20150623.gappssmtp.com; s=20150623; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type; bh=FpPnaqhMGVJvpuwiN8V3FZwBcP2dOGa4RIQOKBYhEqw=; b=zVqz5kO61QpTU/KN9ds2FArtbFTevMeYMY38VIHtKtJeFtVeLeq4DT9nBy3ColPiRR nvOfY71vKsaiY4pM8bjyeAsTh0/D6fkbvTNt6C5ICjpwybCn7hzpydlhucGAPxsi7S3M pF+2Y6egzlG6yryBSeB7b8XHnkVZiaCpebst2dDejfkFUCJaIhNtOpyctCVAJxj0R/n5 mzfk8gppymW9899Yq+cZYunuJNnjLLuzoWjlfw9entiUPjREoNiXWbiQlcR56S7z2p3A 6C3W5aXhy6GxJOj5Y+/kqgjOdpYKCRcb4gvLEbmvr7lAGNlX3kDuPBwWUrEFR9uBeEk/ PImQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc:content-type; bh=FpPnaqhMGVJvpuwiN8V3FZwBcP2dOGa4RIQOKBYhEqw=; b=mOvVFbxQjqbGooyYXGJtFXzqZPzqjksGvEBbu/D4o2qjCW1dN9m5hsp2XquBsdooym pBsXPsB7wztCBbvLSpv/9Ty7bEa0IrhCPIDTGQq2xpM+VLuuhz8bXlqXmpTCPTZD9onJ 07bm1uP5sVVLYWy20nMuaNEc9jeyXDmme57N6vIEsc6ytz6K2Kyyd1FaoYzY6bBObZGF 6zUoFwHTKWguymlgvSEYqymfilBHy4CaYSyFNw65fb71L9WYWKh+VM+trzfta+ym/ayt O44CTbF3fDWaG7gzSVDyeAPys4vvxrd4FPKC0KaXUm1ll8kTvaKB6Fp7Ujzlh/uIJmW9 spyg== X-Gm-Message-State: AG10YOTmNQiGJWdklL/jHwnsTh18BediwwrG1P8+7wFXv8miu1ibAI8HZWXWEJhe4I/6MyltAIAHSaxWkBEmVQ== X-Received: by 10.107.34.201 with SMTP id i192mr21690321ioi.161.1453801550660; Tue, 26 Jan 2016 01:45:50 -0800 (PST) MIME-Version: 1.0 Received: by 10.107.135.82 with HTTP; Tue, 26 Jan 2016 01:45:31 -0800 (PST) In-Reply-To: References: From: Wojciech Macek Date: Tue, 26 Jan 2016 10:45:31 +0100 Message-ID: Subject: Re: Various panics while using HWPMC on ARM64 To: Zbigniew Bodek Cc: Ruslan Bukin , Andrew Turner , "freebsd-arm@freebsd.org" Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-arm@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: "Porting FreeBSD to ARM processors." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 26 Jan 2016 09:45:52 -0000 I started to look intu this and there is a race condition during pmcstat thread migration - both CPUX and CPU0 (the one that the pmcstat is migrated on) are touching the same structures at the same time. I don't have any fix for that yet, but one might use a simple workaround with cpuset to force pmcstat to execute on CPU0. Wojtek root@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc ^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc ^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc ^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc ^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc ^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc ^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc ^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc ^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc ^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc ^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc ^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc ^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc ^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc ^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc ^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc ^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc ^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc ^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc ^Croot@thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc Setting pmcstat to any other CPU causes an immediate crash when exiting the process. ^Croot@thunder_crb4:~ # cpuset -c -l 47 pmcstat -N -S CPU_CYCLES -O cpu_cycles.pmc ^C x0: 0 x1: ffffffc00a3c99e0 x2: ffffff800065c00d x3: ffffff80007fabf0 x4: ffffff876b7e1cd0 x5: ffffff876b7e1480 x6: 40761000 x7: 2fe x8: 10000284 x9: ffffff800074e550 x10: 7ff96938 x11: 2710 x12: 2af8 x13: 7ff93eec x14: b x15: 2a4c x16: 7ff93fc1 x17: b x18: ffffff876b7e14f0 x19: ffffff850dc92d00 x20: ffffff850dc92d38 x21: ffffffc004103a00 x22: 0 x23: 1 x24: 0 x25: ffffff80005eb790 x26: ffffff876b7e14f0 x27: 1 x28: ffffff850dc92d00 x29: ffffffc004116848 x30: ffffffc004116848 sp: ffffff876b7e14f0 lr: aa59bd6cae elr: aa59bd6cae spsr: 20000085 panic: Unknown kernel exception 22 esr_el1 8a000000 cpuid = 0 KDB: stack backtrace: db_trace_self() at db_trace_self_wrapper+0x28 pc = 0xffffff80005c43dc lr = 0xffffff8000074798 sp = 0xffffff876b7e0ff0 fp = 0xffffff876b7e1200 db_trace_self_wrapper() at vpanic+0x144 pc = 0xffffff8000074798 lr = 0xffffff80002d0f68 sp = 0xffffff876b7e1210 fp = 0xffffff876b7e1280 vpanic() at panic+0x4c pc = 0xffffff80002d0f68 lr = 0xffffff80002d0e20 sp = 0xffffff876b7e1290 fp = 0xffffff876b7e1310 panic() at do_el1h_sync+0x128 pc = 0xffffff80002d0e20 lr = 0xffffff80005d762c sp = 0xffffff876b7e1320 fp = 0xffffff876b7e1340 do_el1h_sync() at handle_el1h_sync+0x68 pc = 0xffffff80005d762c lr = 0xffffff80005c5868 sp = 0xffffff876b7e1350 fp = 0xffffff876b7e1460 handle_el1h_sync() at 0xaa59bd6caa pc = 0xffffff80005c5868 lr = 0x000000aa59bd6caa sp = 0xffffff876b7e1470 fp = 0xffffffc004116848 KDB: enter: panic [ thread pid 766 tid 100065 ] Stopped at kdb_enter+0x40: db> 2015-12-23 17:53 GMT+01:00 Zbigniew Bodek : > Hi, > > I go into new panics although quite similar to the previously > mentioned in both SMP and UP but this time I tried to use DTrace. > Can anyone check this on another ARM64 platform? I used ThunderX and > FreeBSD-CURRENT r292654. > > Best regards > zbb > > Please check out below for panic outputs: > > SMP: > -------- > root@thunder_crb4:~ # procsystime > Tracing... Hit Ctrl-C to end... > ^C > > Elapsed Times for all processes, > > x0: 1 > x1: 0 > x2: ffffff800052c911 > x3: deadc0d8 > x4: 1a2 > x5: ffffff854a976860 > x6: 0 > x7: 100 > x8: deadc0de > x9: 0 > x10: c4d4ae68 > x11: 6c4d4ae68 > x12: 80000000 > x13: 0 > x14: b > x15: 274f > x16: ffffff80481a97f8 > x17: ffffff8048196814 > x18: ffffff854a9764b0 > x19: 989680 > x20: ffffff8000708000 > x21: ffffff8000702038 > x22: ffffffc004219000 > x23: ffffff8000702050 > x24: 3938700 > x25: 3938700 > x26: ffffff80006e2000 > x27: ffffffc0041129a0 > x28: 39386ff > x29: ffffff854a976530 > x30: ffffff854a976530 > sp: ffffff854a9764b0 > lr: ffffff8000244760 > elr: ffffff80002448e8 > spsr: 600003c5 > far: deadc174 > esr: 96000007 > panic: data abort in critical section or under mutex > cpuid = 0 > KDB: enter: panic > [ thread pid 11 tid 100004 ] > Stopped at kdb_enter+0x40: > db> > > UP: > -------- > root@thunder_crb4:~ # kldload dtraceall > IMPLEMENT ME: dtrace_toxic_ranges > x0: ffffff8000299740 > x1: ffffffc0040cd100 > x2: ffffff8000299740 > x3: ffffff8043655a70 > x4: ffffff8000299740 > x5: 0 > x6: ffffff87cba9c0a8 > x7: 0 > x8: ffffffc0129484d0 > x9: 1 > x10: 1 > x11: 0 > x12: ffffffc003e7ed88 > x13: ffffffc007ab0280 > x14: 400000 > x15: ffffff800068c000 > x16: ffffff80436687b8 > x17: ffffff8000299690 > x18: ffffff87cba9c010 > x19: ffffff8043655a70 > x20: ffffff8000299740 > x21: ffffff8000299740 > x22: ffffffc0040cd100 > x23: ffffffc003e7ed40 > x24: 8 > x25: 1 > x26: ffffff8043669000 > x27: ffffff8043669000 > x28: ffffff8043669000 > x29: ffffff87cba9c030 > x30: ffffff87cba9c030 > sp: ffffff87cba9c010 > lr: ffffff80002996c0 > elr: ffffffc0040cd100 > spsr: 800003c5 > panic: Unknown kernel exception 0 esr_el1 2000000 > > KDB: enter: panic > [ thread pid 719 tid 100062 ] > Stopped at kdb_enter+0x40: > db> > > 2015-12-14 16:24 GMT+01:00 Zbigniew Bodek : > > Hello, > > > > Did you have time to look into that? Do you have any clues what could > > be wrong here? > > We would like to use hwpmc for profiling so your help will be very > > much appreciated. > > > > Best regards > > zbb > > > > 2015-12-09 13:06 GMT+01:00 Zbigniew Bodek : > >> Hello Ed, > >> > >> Done. I also check what happens when SMP is disabled and the kassert > >> is triggered: > >> > >> root@thunderx_crb4:~ # pmcstat -S CPU_CYCLES -O cpu_cycles.pmc > >> ^Cpanic: [pmc,4256] cpu 0 didn't find a sample to collect > >> KDB: stack backtrace: > >> db_trace_self() at db_trace_self_wrapper+0x28 > >> pc = 0xffffff80004e9aac lr = 0xffffff800006d8b4 > >> sp = 0xffffff87cba976e0 fp = 0xffffff87cba97800 > >> > >> db_trace_self_wrapper() at vpanic+0x9c > >> pc = 0xffffff800006d8b4 lr = 0xffffff800027136c > >> sp = 0xffffff87cba97810 fp = 0xffffff87cba97880 > >> > >> vpanic() at kassert_panic+0x160 > >> pc = 0xffffff800027136c lr = 0xffffff80002712cc > >> sp = 0xffffff87cba97890 fp = 0xffffff87cba97950 > >> > >> kassert_panic() at pmc_capture_user_callchain+0x1a4 > >> pc = 0xffffff80002712cc lr = 0xffffff80000e1444 > >> sp = 0xffffff87cba97960 fp = 0xffffff87cba979c0 > >> > >> pmc_capture_user_callchain() at pmc_hook_handler+0x7c0 > >> pc = 0xffffff80000e1444 lr = 0xffffff80000dfb78 > >> sp = 0xffffff87cba979d0 fp = 0xffffff87cba97a50 > >> > >> pmc_hook_handler() at ast+0x14c > >> pc = 0xffffff80000dfb78 lr = 0xffffff80002b976c > >> sp = 0xffffff87cba97a60 fp = 0xffffff87cba97a90 > >> > >> ast() at handle_el0_sync+0x90 > >> pc = 0xffffff80002b976c lr = 0xffffff80004eb224 > >> sp = 0xffffff87cba97aa0 fp = 0xffffff87cba97bb0 > >> > >> handle_el0_sync() at 0x406d60 > >> pc = 0xffffff80004eb224 lr = 0x0000000000406d60 > >> sp = 0xffffff87cba97bc0 fp = 0x0000007ffffff540 > >> > >> KDB: enter: panic > >> [ thread pid 679 tid 100061 ] > >> Stopped at kdb_enter+0x40: > >> db> > >> > >> > >> when invariants options is disabled I only get: > >> > >> root@thunderx_crb4:~ # pmcstat -S CPU_CYCLES -O cpu_cycles.pmc > >> ^Cpmcstat: WARNING: sampling was paused at least 1 time. > >> Please consider tuning the "kern.hwpmc.nsamples" tunable. > >> > >> > >> Best regards > >> zbb > >> > >> 2015-12-08 20:59 GMT+01:00 Ed Maste : > >>> On 8 December 2015 at 14:34, Zbigniew Bodek wrote: > >>>> Hello, > >>>> > >>>> I encountered some problems with FreeBSD on ARM64 while using hwpmc. > >>>> Some of the errors that I found are listed below: > >>>> > >>>> * panic: Unknown kernel exception 0 esr_el1 2000000 > >>>> * panic: data abort in critical section or under mutex > >>>> * panic: VFP exception in the kernel > >>>> * panic: Unknown kernel exception 21 esr_el1 86000006 > >>> > >>> Can you add these notes to PR 204686? I think there are SMP issues in > >>> arm64 hwpmc that need to be resolved. > _______________________________________________ > freebsd-arm@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-arm > To unsubscribe, send any mail to "freebsd-arm-unsubscribe@freebsd.org" >