Date: Thu, 22 Sep 2022 21:07:08 +0200
From: Mateusz Guzik <mjguzik@gmail.com>
To: sgk@troutmask.apl.washington.edu
Cc: Mark Johnston <markj@freebsd.org>, freebsd-current@freebsd.org
Subject: Re: A panic a day
Message-ID: <CAGudoHGhsj2__OFwSrm4=8_f0FKirRaj%2BjtfYEx5LMPLaJkMwQ@mail.gmail.com>
In-Reply-To: <YyyyCh32i8LfGhqS@troutmask.apl.washington.edu>
References: <YyyqDEPL3X3esFYl@troutmask.apl.washington.edu> <Yyyw5bnWO1y6veYl@nuc> <YyyyCh32i8LfGhqS@troutmask.apl.washington.edu>
On 9/22/22, Steve Kargl <sgk@troutmask.apl.washington.edu> wrote:
> On Thu, Sep 22, 2022 at 03:00:53PM -0400, Mark Johnston wrote:
>> On Thu, Sep 22, 2022 at 11:31:40AM -0700, Steve Kargl wrote:
>> > All,
>> >
>> > I updated my kernel/world/all ports on Sept 19 2022.
>> > Since then, I have had daily panics and hard lock-up
>> > (no panic, keyboard, mouse, network, ...).  The one
>> > panic I did witness sent text scrolling off the screen.
>> > There is no dump, or at least, I haven't figured out
>> > a way to get a dump.
>> >
>> > Using ports/graphics/tesseract and then hand editing
>> > the OCR result, the last visible portion is
>> >
>
> (panic messages removed).
>
>> It looks like you use the 4BSD scheduler?  I think there's a bug in
>> kick_other_cpu() in that it doesn't make sure that the remote CPU's
>> curthread lock is held when modifying thread state.  Because 4BSD has a
>> global scheduler lock, this is often true in practice, but doesn't have
>> to be.
>
> Yes, I use 4BSD.  ULE has very poor performance for HPC type work with
> OpenMPI.
>

Is there an easy way to set it up for testing purposes?

>> I think this untested patch will address the panics.  The bug was there
>> for a long time but some recent restructuring added an assertion which
>> caught it.
>
> I'll give it a try, and report back.  Thanks!
>
> --
> steve
>
>> diff --git a/sys/kern/sched_4bsd.c b/sys/kern/sched_4bsd.c
>> index 9d48aa746f6d..484864b66c1c 100644
>> --- a/sys/kern/sched_4bsd.c
>> +++ b/sys/kern/sched_4bsd.c
>> @@ -1282,9 +1282,10 @@ kick_other_cpu(int pri, int cpuid)
>>  	}
>>  #endif /* defined(IPI_PREEMPTION) && defined(PREEMPTION) */
>>
>> -	ast_sched_locked(pcpu->pc_curthread, TDA_SCHED);
>> -	ipi_cpu(cpuid, IPI_AST);
>> -	return;
>> +	if (pcpu->pc_curthread->td_lock == &sched_lock) {
>> +		ast_sched_locked(pcpu->pc_curthread, TDA_SCHED);
>> +		ipi_cpu(cpuid, IPI_AST);
>> +	}
>>  }
>>  #endif /* SMP */
>>
>> @@ -1397,7 +1398,7 @@ sched_add(struct thread *td, int flags)
>>
>>  	cpuid = PCPU_GET(cpuid);
>>  	if (single_cpu && cpu != cpuid) {
>> -		kick_other_cpu(td->td_priority, cpu);
>> +		kick_other_cpu(td->td_priority, cpu);
>>  	} else {
>>  		if (!single_cpu) {
>>  			tidlemsk = idle_cpus_mask;
>
> --
> Steve
>
>

-- 
Mateusz Guzik <mjguzik gmail.com>
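
For readers following the quoted patch, here is a minimal userland sketch of the idea behind it, not the kernel code itself. It assumes a simplified model in which each thread carries a pointer to the lock that currently protects it, and the setter asserts that this lock is held, much as the assertion mentioned above does. The names struct lock, other_lock, mark_resched_locked() and kick_if_safe() are hypothetical stand-ins; only td_lock and sched_lock mirror identifiers from the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel lock: we only track "held or not". */
struct lock {
	bool held;
};

/* Simplified thread: td_lock names the lock that currently protects it. */
struct thread {
	struct lock *td_lock;
	bool resched;		/* stand-in for the "please reschedule" request */
};

struct lock sched_lock;		/* models the global 4BSD scheduler lock */
struct lock other_lock;		/* models any other lock a thread may sit behind */

/* Models the asserting setter: the thread's own lock must be held. */
void
mark_resched_locked(struct thread *td)
{
	assert(td->td_lock->held);
	td->resched = true;
}

/*
 * Models the patched tail of kick_other_cpu(): the caller holds sched_lock,
 * so the remote thread may only be touched if sched_lock is also the lock
 * protecting that thread.  Returns true if the thread was marked.
 */
bool
kick_if_safe(struct thread *remote)
{
	assert(sched_lock.held);
	if (remote->td_lock != &sched_lock)
		return (false);	/* not ours to modify; skip the kick */
	mark_resched_locked(remote);
	return (true);
}
```

In this model a thread whose td_lock points at sched_lock gets marked, while a thread sitting behind other_lock is left alone instead of tripping the assertion. Whether skipping the kick in the second case is the right long-term behaviour is a question for the real scheduler and is not answered by this sketch.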