Date: Thu, 22 Sep 2022 15:00:53 -0400
From: Mark Johnston <markj@freebsd.org>
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
Cc: freebsd-current@freebsd.org
Subject: Re: A panic a day
Message-ID: <Yyyw5bnWO1y6veYl@nuc>
In-Reply-To: <YyyqDEPL3X3esFYl@troutmask.apl.washington.edu>
References: <YyyqDEPL3X3esFYl@troutmask.apl.washington.edu>

On Thu, Sep 22, 2022 at 11:31:40AM -0700, Steve Kargl wrote:
> All,
>
> I updated my kernel/world/all ports on Sept 19 2022.
> Since then, I have had daily panics and hard lock-up
> (no panic, keyboard, mouse, network, ...).  The one
> panic I did witness sent text scrolling off the screen.
> There is no dump, or at least, I haven't figured out
> a way to get a dump.
>
> Using ports/graphics/tesseract and then hand editing
> the OCR result, the last visible portion is
>
>
> panic() at panic+0x43/frame 0xfffffe00daf65550
> __mtx_lock_spin_flags() at __mtx_lock_spin_flags+0xc6/frame 0xfffffe00daf655e0
> sched_add() at sched_add+0x98/frame 0xfffffe00daf656a0
> setrunnable() at setrunnable+0x73/frame 0xfffffe00daf656d0
> wakeup_any() at wakeup_any+0x1f/frame 0xfffffe00daf656f0
> taskqueue_enqueue_locked() at taskqueue_enqueue_locked+0x13e/frame 0xfffffe00daf65720
> taskqueue_enqueue_timeout_sbt() at taskqueue_enqueue_timeout_sbt+0xe5/frame 0xfffffe00daf65770
> resettodr() at resettodr+0x7a/frame 0xfffffe00daf657b0
> kern_reboot() at kern_reboot+0x2ae/frame 0xfffffe00daf657f0
> vpanic() at vpanic+0x1be/frame 0xfffffe00daf65840
> panic() at panic+0x43/frame 0xfffffe00daf658a0
> __mtx_lock_spin_flags() at __mtx_lock_spin_flags+0xc6/frame 0xfffffe00daf65ab0
> sched_add() at sched_add+0x98/frame 0xfffffe00daf65990
> setrunnable() at setrunnable+0x73/frame 0xfffffe008daf659c0
> wakeup_any() at wakeup_any+0x1f/frame 0xfffffe00daf659e0
> taskqueue_enqueue_locked() at taskqueue_enqueue_locked+0x13e/frame 0xfffffe00daf65a11
> drm_crtc_helper_set_config() at drm_crtc_helper_set_config+0x971/frame 0xfffffe00daf65abl
> radeon_crtc_set_config() at radeon_crtc_set_config+0x22/frame 0xfffffe00daf65ad0
> __drm_mode_set_config_internal() at __drm_mode_set_config_internal+0xdd/frame 0xfffffe00daf65b10
> drm_client_modeset_commit_locked() at drm_client_modeset_commit_locked+0x160/frame 0xfffffe00daf65b50
> drm_client_modeset_commit() at drm_client_modeset_commit+0x21/frame 0xfffffe00daf65b70
> drm_fb_helper_restore_fbdev_mode_unlocked() at drm_fb_helper_restore_fbdev_mode_unlocked+0x81/frame
> vt_kms_postswitch() at vt_kms_postswitch+0x166/frame 0xfffffe00daf65bd0
> vt_window_switch() at vt_window_switch+0x119/frame 0xfffffe00daf65c1d
> vtterm_cngrab() at vtterm_cngrab+0x4f/frame 0xfffffe00daf65c30
> cngrab() at cngrab+0x26/frame 0xfffffe00daf65ca0
> vpanic() at vpanic+0xf0/frame 0xfffffe00daf65ca0
> panic() at panic+0x43/frame 0xfffffe00daf65d00
> __mtx_assert() at __mtx_assert+0x9d/frame 0xfffffe00daf65d10
> ast_sched_locked() at ast_sched_locked+0x29/frame 0xfffffe00daf65d30
> sched_add() at sched_add+0x4c5/frame 0xfffffe00daf65df0
> sched_switch() at sched_switch+0x9f/frame 0xfffffe00daf65e20
> mi_switch() at mi_switch+0x14b/frame 0xfffffe00daf65e40
> sched_bind() at sched_bind+0x73/frame 0xfffffe00daf65e60
> pcpu_cache_drain_safe() at pcpu_cache_drain_safe+0x25a/frame 0xfffffe00daf65e90
> uma_reclaim_domain() at uma_reclaim_domain+0x279/frame Buf ffffe00dafohech
> uma_reclaim_worker() at uma_reclaim_worker+0x35/frame 0xfffffe00daf65ef0
> fork_exit() at fork_exit+0x80/frame 0xfffffe00daf65f30
> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00daf65f30
> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---

It looks like you use the 4BSD scheduler?  I think there's a bug in
kick_other_cpu(): it doesn't make sure that the remote CPU's curthread
lock is held when modifying thread state.  Because 4BSD has a global
scheduler lock, this is often true in practice, but it doesn't have to
be.

I think this untested patch will address the panics.  The bug has been
there for a long time, but some recent restructuring added an assertion
which caught it.

diff --git a/sys/kern/sched_4bsd.c b/sys/kern/sched_4bsd.c
index 9d48aa746f6d..484864b66c1c 100644
--- a/sys/kern/sched_4bsd.c
+++ b/sys/kern/sched_4bsd.c
@@ -1282,9 +1282,10 @@ kick_other_cpu(int pri, int cpuid)
 	}
 #endif /* defined(IPI_PREEMPTION) && defined(PREEMPTION) */
 
-	ast_sched_locked(pcpu->pc_curthread, TDA_SCHED);
-	ipi_cpu(cpuid, IPI_AST);
-	return;
+	if (pcpu->pc_curthread->td_lock == &sched_lock) {
+		ast_sched_locked(pcpu->pc_curthread, TDA_SCHED);
+		ipi_cpu(cpuid, IPI_AST);
+	}
 }
 #endif /* SMP */
 
@@ -1397,7 +1398,7 @@ sched_add(struct thread *td, int flags)
 
 	cpuid = PCPU_GET(cpuid);
 	if (single_cpu && cpu != cpuid) {
-		kick_other_cpu(td->td_priority, cpu);
+		kick_other_cpu(td->td_priority, cpu);
 	} else {
 		if (!single_cpu) {
 			tidlemsk = idle_cpus_mask;
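
To illustrate the idea behind the guard in kick_other_cpu(), here is a
small userland sketch.  It is not kernel code: toy_lock, toy_thread,
other_lock, toy_mark_resched_locked() and toy_kick() are hypothetical
names made up for this example, and the "lock" only records whether the
caller holds it.  The assumption being modelled is that a thread's
td_lock pointer can refer to some lock other than the global sched_lock,
in which case holding sched_lock is not enough to touch that thread.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userland stand-ins, not the FreeBSD definitions: a lock
 * that only records whether the caller holds it, and a thread that
 * points at whichever lock currently protects its state.
 */
struct toy_lock {
	bool held;
};

struct toy_thread {
	struct toy_lock *td_lock;	/* lock protecting this thread right now */
	bool resched_pending;		/* may only change while td_lock is held */
};

static struct toy_lock sched_lock;	/* models the single global scheduler lock */
static struct toy_lock other_lock;	/* assumed: any lock other than sched_lock */

/* Plays the role of the assertion: the thread's own lock must be held. */
static void
toy_mark_resched_locked(struct toy_thread *td)
{
	assert(td->td_lock->held);
	td->resched_pending = true;
}

/*
 * Models the patched logic: the caller holds sched_lock, so the remote
 * thread is touched only when sched_lock is the lock protecting it.
 * Returns true when a reschedule was requested.
 */
static bool
toy_kick(struct toy_thread *remote)
{
	assert(sched_lock.held);
	if (remote->td_lock != &sched_lock)
		return (false);
	toy_mark_resched_locked(remote);
	return (true);
}
```

With sched_lock marked as held, toy_kick() on a thread whose td_lock is
&sched_lock sets resched_pending, while a thread whose td_lock is
&other_lock is left alone.  Without the pointer comparison, the second
case would trip the assertion in toy_mark_resched_locked(), which is the
same shape as the __mtx_assert() frame in the backtrace.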