Date: Thu, 2 Jun 2022 12:49:45 +0200
From: Jan Mikkelsen <janm@transactionware.com>
To: Paul Floyd <paulf2718@gmail.com>
Cc: FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject: Re: Hang ast / pipelk / piperd
Message-ID: <6DBE2C6F-7F8B-457A-AB10-1912965C3376@transactionware.com>
In-Reply-To: <bf595064-e03c-e69d-9d93-3f7de52360c0@gmail.com>
References: <84015bf9-8504-1c3c-0ba5-58d0d7824843@gmail.com> <dca6a5b4-6f0c-98c0-2f2d-6e5da7405af4@gmail.com> <YpTRj7jVE0jfbxPO@nuc> <b598c89e-f11a-eed1-3d74-c3ef37bc400a@gmail.com> <Ypd0ziZKcZ2pgm0P@nuc> <bf595064-e03c-e69d-9d93-3f7de52360c0@gmail.com>
> On 1 Jun 2022, at 23:27, Paul Floyd <paulf2718@gmail.com> wrote:
>
>
> On 6/1/22 14:16, Mark Johnston wrote:
>> On Mon, May 30, 2022 at 09:35:05PM +0200, Paul Floyd wrote:
>>>
>>>
>>> On 5/30/22 14:15, Mark Johnston wrote:
>>>
>>>> "procstat -kk <valgrind PID>" might help to reveal what's going on,
>>>> since it sounds like the hang/livelock is happening somewhere in the
>>>> kernel.
>>>
>>> Not knowing much about the kernel, my guess is that this is related to
>>>
>>> commit 4808bab7fa6c3ec49b49476b8326d7a0250a03fa
>>> Author: Alexander Motin <mav@FreeBSD.org>
>>> Date:   Tue Sep 21 18:14:22 2021 -0400
>>>
>>>     sched_ule(4): Improve long-term load balancer.
>>>
>>> and this bit of ast code
>>>
>>> doreti_ast:
>>> 	/*
>>> 	 * Check for ASTs atomically with returning.  Disabling CPU
>>> 	 * interrupts provides sufficient locking even in the SMP case,
>>> 	 * since we will be informed of any new ASTs by an IPI.
>>> 	 */
>>> 	cli
>>> 	movq	PCPU(CURTHREAD),%rax
>>> 	testl	$TDF_ASTPENDING | TDF_NEEDRESCHED,TD_FLAGS(%rax)
>>> 	je	doreti_exit
>>> 	sti
>>> 	movq	%rsp,%rdi	/* pass a pointer to the trapframe */
>>> 	call	ast
>>> 	jmp	doreti_ast
>>>
>>>
>>> The above commit seems to be migrating loaded threads to another CPU.
>>
>> How did you infer that?  The long-term load balancer should be running
>> fairly infrequently.
>
>
> Well, one thread is hung in
> mi_switch+0xc2 ast+0x1e6 doreti_ast+0x1f
>
> I found the above code referring to doreti_ast with grep and thought that
> if the test always fails then it will loop forever. So I looked for git
> commits containing TDF_ASTPENDING and TDF_NEEDRESCHED, found a couple of
> commits mentioning these flags and scheduling, and found the above and
> the followup
>
> commit 11f14b33629e552a451fdbfe653ebb0addd27700
> Author: Alexander Motin <mav@FreeBSD.org>
> Date:   Sun Sep 26 12:03:05 2021 -0400
>
>     sched_ule(4): Fix hang with steal_thresh < 2.
>
>     e745d729be60 caused infinite loop with interrupts disabled in load
>     stealing code if steal_thresh set below 2.  Such configuration should
>     not generally be used, but appeared some people are using it to
>     workaround some problems.
>
> and I guessed they might be related.
>
>> As a side note, I think we are missing ktrcsw() calls in some places,
>> e.g., in turnstile_wait().
>
>>> My test system is a VirtualBox amd64 FreeBSD 13.1 with one CPU running
>>> on a 13.0 host.
>>>
>>> I just tried restarting the VM with 2 CPUs and the testcase seems to be
>>> a lot better - it's been running in a loop for 10 minutes whereas
>>> previously it would hang at least 1 in 5 times.
>>
>> Hmm.  Could you, please, show the ktrace output with -H -T passed to
>> kdump(1), together with fresh "procstat -kk" output?
>> The fact that the problem apparently only occurs with 1 CPU suggests a
>> scheduler bug, indeed.
>
>
> Here is the fresh procstat output
>
> paulf@freebsd:~ $ procstat -kk 12339
>   PID    TID COMM                TDNAME              KSTACK
> 12339 100089 none-amd64-freebsd  -                   mi_switch+0xc2 sleepq_catch_signals+0x2e6 sleepq_wait_sig+0x9 _sleep+0x1f2 umtxq_sleep+0x143 do_wait+0x3e5 __umtx_op_wait+0x53 sys__umtx_op+0x7e amd64_syscall+0x10c fast_syscall_common+0xf8
>
> 12339 100582 none-amd64-freebsd  -                   mi_switch+0xc2 sleepq_catch_signals+0x2e6 sleepq_timedwait_sig+0x12 _sleep+0x1d1 kern_clock_nanosleep+0x1c1 sys_nanosleep+0x3b amd64_syscall+0x10c fast_syscall_common+0xf8
>
> 12339 100583 none-amd64-freebsd  -                   mi_switch+0xc2 ast+0x1e6 doreti_ast+0x1f
>
> 12339 100584 none-amd64-freebsd  -                   mi_switch+0xc2 sleepq_catch_signals+0x2e6 sleepq_wait_sig+0x9 _sleep+0x1f2 pipe_read+0x3d6 dofileread+0x81 sys_read+0xbc amd64_syscall+0x10c fast_syscall_common+0xf8
>
> 12339 100585 none-amd64-freebsd  -                   mi_switch+0xc2 intr_event_handle+0x167 intr_execute_handlers+0x4b Xapic_isr1+0xdc sleepq_catch_signals+0x2e6 sleepq_wait_sig+0x9 _sleep+0x1f2 pipe_read+0x3d6 dofileread+0x81 sys_read+0xbc amd64_syscall+0x10c fast_syscall_common+0xf8
> [ … ]

All these mi_switch+0xc2 hangs reminded me of something I saw once on
13.1-RC2 back in April. The machine was running five concurrent
"make -j32 installworld" processes.

The machine hung and disk activity stopped. Results of ^T on various
running commands:

^T on a "tail -F" command:

load: 1.93  cmd: tail 27541 [zfs teardown inactive] 393.65r 0.06u 0.10s 0% 2548k
mi_switch+0xc2 _sleep+0x1fc rms_rlock_fallback+0x90 zfs_freebsd_reclaim+0x26 VOP_RECLAIM_APV+0x1f vgonel+0x342 vnlru_free_impl+0x2f7 vn_alloc_hard+0xc8 getnewvnode_reserve+0x93 zfs_zget+0x22 zfs_dirent_lookup+0x16b zfs_dirlook+0x7a zfs_lookup+0x3d0 zfs_cache_lookup+0xa9 VOP_LOOKUP+0x30 cache_fplookup_noentry+0x1a3 cache_fplookup+0x366 namei+0x12a

^T on a zsh doing a cd to a UFS directory:

load: 0.48  cmd: zsh 86937 [zfs teardown inactive] 84663.01r 0.06u 0.01s 0% 6412k
mi_switch+0xc2 _sleep+0x1fc rms_rlock_fallback+0x90 zfs_freebsd_reclaim+0x26 VOP_RECLAIM_APV+0x1f vgonel+0x342 vnlru_free_impl+0x2f7 vn_alloc_hard+0xc8 getnewvnode_reserve+0x93 zfs_zget+0x22 zfs_dirent_lookup+0x16b zfs_dirlook+0x7a zfs_lookup+0x3d0 zfs_cache_lookup+0xa9 lookup+0x45c namei+0x259 kern_statat+0xf3 sys_fstatat+0x2f

^T on an attempt to start gstat:

load: 0.17  cmd: gstat 63307 [ufs] 298.29r 0.00u 0.00s 0% 228k
mi_switch+0xc2 sleeplk+0xf6 lockmgr_slock_hard+0x3e7 ffs_lock+0x6c _vn_lock+0x48 vget_finish+0x21 cache_lookup+0x26c vfs_cache_lookup+0x7b lookup+0x45c namei+0x259 vn_open_cred+0x533 kern_openat+0x283 amd64_syscall+0x10c fast_syscall_common+0xf8

A short press of the system power button did nothing.

The installworld target directories were on a ZFS filesystem with a
single mirror of two SATA SSDs.

I'm unsure whether it's related, because the rest of the stack traces are
different. However, all the mi_switch+0xc2 entries triggered a memory of
that hang.

Regards,

Jan M.
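P.S. Since not everyone reads amd64 assembly, here is a rough user-space C
sketch of the doreti_ast retry loop Paul quoted above, as I read it. This is
only an illustration, not the kernel code: the flag names are taken from the
quoted assembly, but their values and the simulated ast()/scheduler behaviour
below are invented purely to show why the return path would keep calling
ast() forever if TDF_NEEDRESCHED is always set again by the time the flags
are re-tested.

#include <stdbool.h>
#include <stdio.h>

/* Same-named kernel flags; the values here are arbitrary. */
#define TDF_ASTPENDING   0x01
#define TDF_NEEDRESCHED  0x02

/* Pretend the scheduler already wants a reschedule. */
static int td_flags = TDF_NEEDRESCHED;

/*
 * Stand-in for the kernel's ast(), which would normally end up in
 * mi_switch() to satisfy TDF_NEEDRESCHED.  Hypothetical failure mode:
 * nothing else is runnable (or a scheduler bug picks the same thread
 * again), so NEEDRESCHED is immediately set again and never goes away.
 */
static void
ast(void)
{
	td_flags &= ~TDF_ASTPENDING;
	td_flags |= TDF_NEEDRESCHED;
}

int
main(void)
{
	int iter;

	/* doreti_ast: check for ASTs atomically with returning. */
	for (iter = 0; iter < 10; iter++) {
		/* cli: the real code disables interrupts around this test. */
		if ((td_flags & (TDF_ASTPENDING | TDF_NEEDRESCHED)) == 0) {
			/* je doreti_exit: return to user mode. */
			printf("flags cleared after %d iterations\n", iter);
			return (0);
		}
		/* sti; movq %rsp,%rdi; call ast; jmp doreti_ast */
		ast();
	}
	printf("still looping after %d iterations; the flags never clear\n",
	    iter);
	return (1);
}

With the pessimistic ast() above it prints the "still looping" line; drop the
line that re-sets TDF_NEEDRESCHED and it exits after one pass, which is all
this sketch is meant to show.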