Date: Mon, 16 Dec 2019 20:39:52 +0100 From: Mateusz Guzik <mjguzik@gmail.com> To: Jeff Roberson <jroberson@jroberson.net> Cc: Ryan Libby <rlibby@freebsd.org>, Ed Maste <emaste@freebsd.org>, Jeff Roberson <jeff@freebsd.org>, src-committers <src-committers@freebsd.org>, svn-src-all <svn-src-all@freebsd.org>, svn-src-head <svn-src-head@freebsd.org>, Mark Johnston <markj@freebsd.org>, Konstantin Belousov <kib@freebsd.org> Subject: Re: svn commit: r355784 - in head/sys: compat/linuxkpi/common/src dev/dpaa kern mips/nlm sys Message-ID: <CAGudoHGo1bRpzmGuRB38%2BOZPpdNU2c2ksbZFM7mhaQF1PeDg=g@mail.gmail.com> In-Reply-To: <alpine.BSF.2.21.9999.1912160748510.1198@desktop> References: <201912152126.xBFLQoJ7037440@repo.freebsd.org> <CAPyFy2BcUzxaOkpkf9M7ZPOENSHuBsLm6CRgV2WSybCNCbSTXQ@mail.gmail.com> <CAHgpiFxNTa%2Byz4YdvCqzE-GfxTNOB%2BNZOc%2BYy61zVxVF-4P-7Q@mail.gmail.com> <alpine.BSF.2.21.9999.1912160748510.1198@desktop>
next in thread | previous in thread | raw e-mail | index | archive | help
The entire thing reads like a bug -- for each platform critical_exit
can venture into mi_switch with interrupts disabled. On amd64 it just
happens to do it after the count was updated, but still before they
got-reenabled. Other platform also do critical enter/exit dance on
each call instead of just on the 0<->1 transition of
md_spinlock_count.
iow each platform should be probably doing something like this instead:
diff --git a/sys/amd64/amd64/machdep.c b/sys/amd64/amd64/machdep.c
index 7aadd86e3cd6..c90fe8ef961d 100644
--- a/sys/amd64/amd64/machdep.c
+++ b/sys/amd64/amd64/machdep.c
@@ -1991,10 +1991,10 @@ spinlock_enter(void)
td = curthread;
if (td->td_md.md_spinlock_count == 0) {
+ critical_enter();
flags = intr_disable();
td->td_md.md_spinlock_count = 1;
td->td_md.md_saved_flags = flags;
- critical_enter();
} else
td->td_md.md_spinlock_count++;
}
@@ -2009,8 +2009,8 @@ spinlock_exit(void)
flags = td->td_md.md_saved_flags;
td->td_md.md_spinlock_count--;
if (td->td_md.md_spinlock_count == 0) {
- critical_exit();
intr_restore(flags);
+ critical_exit();
}
}
On 12/16/19, Jeff Roberson <jroberson@jroberson.net> wrote:
> On Mon, 16 Dec 2019, Ryan Libby wrote:
>
>> On Mon, Dec 16, 2019 at 7:30 AM Ed Maste <emaste@freebsd.org> wrote:
>>>
>>> On Sun, 15 Dec 2019 at 16:27, Jeff Roberson <jeff@freebsd.org> wrote:
>>>>
>>>> Author: jeff
>>>> Date: Sun Dec 15 21:26:50 2019
>>>> New Revision: 355784
>>>> URL: https://svnweb.freebsd.org/changeset/base/355784
>>>>
>>>> Log:
>>>> schedlock 4/4
>>>
>>> FYI i386, arm, arm64, riscv fail to boot now, with "panic: invalid count
>>> 2"
>>>
>>> Boot logs:
>>> i386: https://ci.freebsd.org/job/FreeBSD-head-i386-test/7797/console
>>> arm:
>>> https://ci.freebsd.org/hwlab/job/FreeBSD-device-head-beaglebone-test/1317/artifact/device_tests/beaglebone.boot.log
>>> arm64:
>>> https://ci.freebsd.org/hwlab/job/FreeBSD-device-head-pinea64-test/1194/artifact/device_tests/pinea64.boot.log
>>> riscv:
>>> https://ci.freebsd.org/hwlab/job/FreeBSD-device-head-pinea64-test/1194/artifact/device_tests/pinea64.boot.log
>>>
>>> arm64 is:
>>>
>>> panic: invalid count 2
>>> cpuid = 0
>>> time = 1
>>> KDB: stack backtrace:
>>> db_trace_self() at db_trace_self_wrapper+0x28
>>> pc = 0xffff0000007359ec lr = 0xffff000000106744
>>> sp = 0xffff000056b063c0 fp = 0xffff000056b065d0
>>>
>>> db_trace_self_wrapper() at vpanic+0x18c
>>> pc = 0xffff000000106744 lr = 0xffff000000408128
>>> sp = 0xffff000056b065e0 fp = 0xffff000056b06690
>>>
>>> vpanic() at panic+0x44
>>> pc = 0xffff000000408128 lr = 0xffff000000407ed8
>>> sp = 0xffff000056b066a0 fp = 0xffff000056b06720
>>>
>>> panic() at sched_switch+0x81c
>>> pc = 0xffff000000407ed8 lr = 0xffff000000434264
>>> sp = 0xffff000056b06730 fp = 0xffff000056b06810
>>>
>>> sched_switch() at mi_switch+0x170
>>> pc = 0xffff000000434264 lr = 0xffff000000413690
>>> sp = 0xffff000056b06820 fp = 0xffff000056b06840
>>>
>>> mi_switch() at cpu_idle+0xc8
>>> pc = 0xffff000000413690 lr = 0xffff0000007400a0
>>> sp = 0xffff000056b06850 fp = 0xffff000056b06860
>>>
>>> cpu_idle() at sched_idletd+0x380
>>> pc = 0xffff0000007400a0 lr = 0xffff000000436a90
>>> sp = 0xffff000056b06870 fp = 0xffff000056b06940
>>>
>>> sched_idletd() at fork_exit+0x7c
>>> pc = 0xffff000000436a90 lr = 0xffff0000003c7ba4
>>> sp = 0xffff000056b06950 fp = 0xffff000056b06980
>>>
>>> fork_exit() at fork_trampoline+0x10
>>> pc = 0xffff0000003c7ba4 lr = 0xffff0000007521ac
>>> sp = 0xffff000056b06990 fp = 0x0000000000000000
>>>
>>> KDB: enter: panic
>>> [ thread pid 11 tid 100003 ]
>>> Stopped at 0
>>> db>
>>
>> It looks like amd64 vs i386, riscv, etc are using different motifs in
>> spinlock_exit(). Perhaps we just need to rearrange them to drop the
>> spinlock count before critical_exit(), like in amd64.
>
> It took me a moment to see why but I believe you are right. Interrupts
> being disabled would prevent a local preemption with the flags out of sync
> but critical_exit() might have owepreempt set so we will switch before
> updating the count.
>
> Jeff
>
>>
>> Ryan
>>
>
--
Mateusz Guzik <mjguzik gmail.com>
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAGudoHGo1bRpzmGuRB38%2BOZPpdNU2c2ksbZFM7mhaQF1PeDg=g>
