Date: Mon, 16 Dec 2019 20:39:52 +0100 From: Mateusz Guzik <mjguzik@gmail.com> To: Jeff Roberson <jroberson@jroberson.net> Cc: Ryan Libby <rlibby@freebsd.org>, Ed Maste <emaste@freebsd.org>, Jeff Roberson <jeff@freebsd.org>, src-committers <src-committers@freebsd.org>, svn-src-all <svn-src-all@freebsd.org>, svn-src-head <svn-src-head@freebsd.org>, Mark Johnston <markj@freebsd.org>, Konstantin Belousov <kib@freebsd.org> Subject: Re: svn commit: r355784 - in head/sys: compat/linuxkpi/common/src dev/dpaa kern mips/nlm sys Message-ID: <CAGudoHGo1bRpzmGuRB38%2BOZPpdNU2c2ksbZFM7mhaQF1PeDg=g@mail.gmail.com> In-Reply-To: <alpine.BSF.2.21.9999.1912160748510.1198@desktop> References: <201912152126.xBFLQoJ7037440@repo.freebsd.org> <CAPyFy2BcUzxaOkpkf9M7ZPOENSHuBsLm6CRgV2WSybCNCbSTXQ@mail.gmail.com> <CAHgpiFxNTa%2Byz4YdvCqzE-GfxTNOB%2BNZOc%2BYy61zVxVF-4P-7Q@mail.gmail.com> <alpine.BSF.2.21.9999.1912160748510.1198@desktop>
next in thread | previous in thread | raw e-mail | index | archive | help
The entire thing reads like a bug -- for each platform critical_exit can venture into mi_switch with interrupts disabled. On amd64 it just happens to do it after the count was updated, but still before they got-reenabled. Other platform also do critical enter/exit dance on each call instead of just on the 0<->1 transition of md_spinlock_count. iow each platform should be probably doing something like this instead: diff --git a/sys/amd64/amd64/machdep.c b/sys/amd64/amd64/machdep.c index 7aadd86e3cd6..c90fe8ef961d 100644 --- a/sys/amd64/amd64/machdep.c +++ b/sys/amd64/amd64/machdep.c @@ -1991,10 +1991,10 @@ spinlock_enter(void) td = curthread; if (td->td_md.md_spinlock_count == 0) { + critical_enter(); flags = intr_disable(); td->td_md.md_spinlock_count = 1; td->td_md.md_saved_flags = flags; - critical_enter(); } else td->td_md.md_spinlock_count++; } @@ -2009,8 +2009,8 @@ spinlock_exit(void) flags = td->td_md.md_saved_flags; td->td_md.md_spinlock_count--; if (td->td_md.md_spinlock_count == 0) { - critical_exit(); intr_restore(flags); + critical_exit(); } } On 12/16/19, Jeff Roberson <jroberson@jroberson.net> wrote: > On Mon, 16 Dec 2019, Ryan Libby wrote: > >> On Mon, Dec 16, 2019 at 7:30 AM Ed Maste <emaste@freebsd.org> wrote: >>> >>> On Sun, 15 Dec 2019 at 16:27, Jeff Roberson <jeff@freebsd.org> wrote: >>>> >>>> Author: jeff >>>> Date: Sun Dec 15 21:26:50 2019 >>>> New Revision: 355784 >>>> URL: https://svnweb.freebsd.org/changeset/base/355784 >>>> >>>> Log: >>>> schedlock 4/4 >>> >>> FYI i386, arm, arm64, riscv fail to boot now, with "panic: invalid count >>> 2" >>> >>> Boot logs: >>> i386: https://ci.freebsd.org/job/FreeBSD-head-i386-test/7797/console >>> arm: >>> https://ci.freebsd.org/hwlab/job/FreeBSD-device-head-beaglebone-test/1317/artifact/device_tests/beaglebone.boot.log >>> arm64: >>> https://ci.freebsd.org/hwlab/job/FreeBSD-device-head-pinea64-test/1194/artifact/device_tests/pinea64.boot.log >>> riscv: >>> https://ci.freebsd.org/hwlab/job/FreeBSD-device-head-pinea64-test/1194/artifact/device_tests/pinea64.boot.log >>> >>> arm64 is: >>> >>> panic: invalid count 2 >>> cpuid = 0 >>> time = 1 >>> KDB: stack backtrace: >>> db_trace_self() at db_trace_self_wrapper+0x28 >>> pc = 0xffff0000007359ec lr = 0xffff000000106744 >>> sp = 0xffff000056b063c0 fp = 0xffff000056b065d0 >>> >>> db_trace_self_wrapper() at vpanic+0x18c >>> pc = 0xffff000000106744 lr = 0xffff000000408128 >>> sp = 0xffff000056b065e0 fp = 0xffff000056b06690 >>> >>> vpanic() at panic+0x44 >>> pc = 0xffff000000408128 lr = 0xffff000000407ed8 >>> sp = 0xffff000056b066a0 fp = 0xffff000056b06720 >>> >>> panic() at sched_switch+0x81c >>> pc = 0xffff000000407ed8 lr = 0xffff000000434264 >>> sp = 0xffff000056b06730 fp = 0xffff000056b06810 >>> >>> sched_switch() at mi_switch+0x170 >>> pc = 0xffff000000434264 lr = 0xffff000000413690 >>> sp = 0xffff000056b06820 fp = 0xffff000056b06840 >>> >>> mi_switch() at cpu_idle+0xc8 >>> pc = 0xffff000000413690 lr = 0xffff0000007400a0 >>> sp = 0xffff000056b06850 fp = 0xffff000056b06860 >>> >>> cpu_idle() at sched_idletd+0x380 >>> pc = 0xffff0000007400a0 lr = 0xffff000000436a90 >>> sp = 0xffff000056b06870 fp = 0xffff000056b06940 >>> >>> sched_idletd() at fork_exit+0x7c >>> pc = 0xffff000000436a90 lr = 0xffff0000003c7ba4 >>> sp = 0xffff000056b06950 fp = 0xffff000056b06980 >>> >>> fork_exit() at fork_trampoline+0x10 >>> pc = 0xffff0000003c7ba4 lr = 0xffff0000007521ac >>> sp = 0xffff000056b06990 fp = 0x0000000000000000 >>> >>> KDB: enter: panic >>> [ thread pid 11 tid 100003 ] >>> Stopped at 0 >>> db> >> >> It looks like amd64 vs i386, riscv, etc are using different motifs in >> spinlock_exit(). Perhaps we just need to rearrange them to drop the >> spinlock count before critical_exit(), like in amd64. > > It took me a moment to see why but I believe you are right. Interrupts > being disabled would prevent a local preemption with the flags out of sync > but critical_exit() might have owepreempt set so we will switch before > updating the count. > > Jeff > >> >> Ryan >> > -- Mateusz Guzik <mjguzik gmail.com>
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAGudoHGo1bRpzmGuRB38%2BOZPpdNU2c2ksbZFM7mhaQF1PeDg=g>