Date:      Thu, 2 Jun 2022 12:49:45 +0200
From:      Jan Mikkelsen <janm@transactionware.com>
To:        Paul Floyd <paulf2718@gmail.com>
Cc:        FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject:   Re: Hang ast / pipelk / piperd
Message-ID:  <6DBE2C6F-7F8B-457A-AB10-1912965C3376@transactionware.com>
In-Reply-To: <bf595064-e03c-e69d-9d93-3f7de52360c0@gmail.com>
References:  <84015bf9-8504-1c3c-0ba5-58d0d7824843@gmail.com> <dca6a5b4-6f0c-98c0-2f2d-6e5da7405af4@gmail.com> <YpTRj7jVE0jfbxPO@nuc> <b598c89e-f11a-eed1-3d74-c3ef37bc400a@gmail.com> <Ypd0ziZKcZ2pgm0P@nuc> <bf595064-e03c-e69d-9d93-3f7de52360c0@gmail.com>



> On 1 Jun 2022, at 23:27, Paul Floyd <paulf2718@gmail.com> wrote:
>
>
>
> On 6/1/22 14:16, Mark Johnston wrote:
>> On Mon, May 30, 2022 at 09:35:05PM +0200, Paul Floyd wrote:
>>>
>>>
>>> On 5/30/22 14:15, Mark Johnston wrote:
>>>
>>>> "procstat -kk <valgrind PID>" might help to reveal what's going on,
>>>> since it sounds like the hang/livelock is happening somewhere in the
>>>> kernel.
>>>
>>> Not knowing much about the kernel, my guess is that this is related to
>>>
>>> commit 4808bab7fa6c3ec49b49476b8326d7a0250a03fa
>>> Author: Alexander Motin <mav@FreeBSD.org>
>>> Date:   Tue Sep 21 18:14:22 2021 -0400
>>>
>>>      sched_ule(4): Improve long-term load balancer.
>>>=20
>>> and this bit of ast code
>>>=20
>>> doreti_ast:
>>> 	/*
>>> 	 * Check for ASTs atomically with returning.  Disabling CPU
>>> 	 * interrupts provides sufficient locking even in the SMP case,
>>> 	 * since we will be informed of any new ASTs by an IPI.
>>> 	 */
>>> 	cli
>>> 	movq	PCPU(CURTHREAD),%rax
>>> 	testl	$TDF_ASTPENDING | TDF_NEEDRESCHED,TD_FLAGS(%rax)
>>> 	je	doreti_exit
>>> 	sti
>>> 	movq	%rsp,%rdi	/* pass a pointer to the trapframe */
>>> 	call	ast
>>> 	jmp	doreti_ast
>>>
>>>
>>> The above commit seems to be migrating loaded threads to another CPU.
>> How did you infer that?  The long-term load balancer should be running
>> fairly infrequently.
>
>
> Well one thread is hung in
> mi_switch+0xc2 ast+0x1e6 doreti_ast+0x1f
>
> I found the above code referring to doreti_ast with grep and thought
> that if the test always fails then it will loop forever. So I looked for
> git commits containing TDF_ASTPENDING and TDF_NEEDRESCHED, found a couple
> of commits mentioning these flags and scheduling, and among them the
> above and the followup
>
> commit 11f14b33629e552a451fdbfe653ebb0addd27700
> Author: Alexander Motin <mav@FreeBSD.org>
> Date:   Sun Sep 26 12:03:05 2021 -0400
>=20
>    sched_ule(4): Fix hang with steal_thresh < 2.
>=20
>    e745d729be60 caused infinite loop with interrupts disabled in load
>    stealing code if steal_thresh set below 2.  Such configuration should
>    not generally be used, but appeared some people are using it to
>    workaround some problems.
>
> and I guessed they might be related.
>
>> As a side note, I think we are missing ktrcsw() calls in some places,
>> e.g., in turnstile_wait().
>>> My test system is a VirtualBox amd64 FreeBSD 13.1 with one CPU running
>>> on a 13.0 host.
>>>
>>> I just tried restarting the VM with 2 CPUs and the testcase seems to be
>>> a lot better - it's been running in a loop for 10 minutes whereas
>>> previously it would hang at least 1 in 5 times.
>> Hmm.  Could you, please, show the ktrace output with -H -T passed to
>> kdump(1), together with fresh "procstat -kk" output?
>> The fact that the problem apparently only occurs with 1 CPU suggests a
>> scheduler bug, indeed.
>
>
> Here is the fresh procstat output
>=20
> paulf@freebsd:~ $ procstat -kk 12339
>  PID    TID COMM                TDNAME              KSTACK
> 12339 100089 none-amd64-freebsd  -                   mi_switch+0xc2
> sleepq_catch_signals+0x2e6 sleepq_wait_sig+0x9 _sleep+0x1f2
> umtxq_sleep+0x143 do_wait+0x3e5 __umtx_op_wait+0x53 sys__umtx_op+0x7e
> amd64_syscall+0x10c fast_syscall_common+0xf8
>
> 12339 100582 none-amd64-freebsd  -                   mi_switch+0xc2
> sleepq_catch_signals+0x2e6 sleepq_timedwait_sig+0x12 _sleep+0x1d1
> kern_clock_nanosleep+0x1c1 sys_nanosleep+0x3b amd64_syscall+0x10c
> fast_syscall_common+0xf8
>
> 12339 100583 none-amd64-freebsd  -                   mi_switch+0xc2
> ast+0x1e6 doreti_ast+0x1f
>
> 12339 100584 none-amd64-freebsd  -                   mi_switch+0xc2
> sleepq_catch_signals+0x2e6 sleepq_wait_sig+0x9 _sleep+0x1f2
> pipe_read+0x3d6 dofileread+0x81 sys_read+0xbc amd64_syscall+0x10c
> fast_syscall_common+0xf8
>
> 12339 100585 none-amd64-freebsd  -                   mi_switch+0xc2
> intr_event_handle+0x167 intr_execute_handlers+0x4b Xapic_isr1+0xdc
> sleepq_catch_signals+0x2e6 sleepq_wait_sig+0x9 _sleep+0x1f2
> pipe_read+0x3d6 dofileread+0x81 sys_read+0xbc amd64_syscall+0x10c
> fast_syscall_common+0xf8
> [ … ]
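
Reading the doreti_ast assembly Paul quoted, my C-like rendering of that
loop (just a sketch to make the control flow explicit, not actual kernel
source; disable_interrupts()/enable_interrupts() stand in for cli/sti and
framep for the trapframe pointer passed in %rdi) would be:

	for (;;) {
		disable_interrupts();			/* cli */
		if ((curthread->td_flags &
		    (TDF_ASTPENDING | TDF_NEEDRESCHED)) == 0)
			break;				/* je doreti_exit */
		enable_interrupts();			/* sti */
		ast(framep);				/* may reschedule via mi_switch() */
	}

i.e. the thread only reaches doreti_exit once both flags are clear, which
I take to be the loop Paul means when he says it will run forever if the
test always finds a flag set.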

All these mi_switch+0xc2 hangs reminded me of something I saw once on
13.1-RC2 back in April. The machine was running five concurrent
“make -j32 installworld” processes.

The machine hung and disk activity stopped. Results of ^T on various
running commands:

^T on a “tail -F” command:

load: 1.93  cmd: tail 27541 [zfs teardown inactive] 393.65r 0.06u 0.10s 0% 2548k
mi_switch+0xc2 _sleep+0x1fc rms_rlock_fallback+0x90
zfs_freebsd_reclaim+0x26 VOP_RECLAIM_APV+0x1f vgonel+0x342
vnlru_free_impl+0x2f7 vn_alloc_hard+0xc8 getnewvnode_reserve+0x93
zfs_zget+0x22 zfs_dirent_lookup+0x16b zfs_dirlook+0x7a zfs_lookup+0x3d0
zfs_cache_lookup+0xa9 VOP_LOOKUP+0x30 cache_fplookup_noentry+0x1a3
cache_fplookup+0x366 namei+0x12a

^T on a zsh doing a cd to a UFS directory:

load: 0.48  cmd: zsh 86937 [zfs teardown inactive] 84663.01r 0.06u 0.01s 0% 6412k
mi_switch+0xc2 _sleep+0x1fc rms_rlock_fallback+0x90
zfs_freebsd_reclaim+0x26 VOP_RECLAIM_APV+0x1f vgonel+0x342
vnlru_free_impl+0x2f7 vn_alloc_hard+0xc8 getnewvnode_reserve+0x93
zfs_zget+0x22 zfs_dirent_lookup+0x16b zfs_dirlook+0x7a zfs_lookup+0x3d0
zfs_cache_lookup+0xa9 lookup+0x45c namei+0x259 kern_statat+0xf3
sys_fstatat+0x2f


^T on an attempt to start gstat:

load: 0.17  cmd: gstat 63307 [ufs] 298.29r 0.00u 0.00s 0% 228k
mi_switch+0xc2 sleeplk+0xf6 lockmgr_slock_hard+0x3e7 ffs_lock+0x6c
_vn_lock+0x48 vget_finish+0x21 cache_lookup+0x26c vfs_cache_lookup+0x7b
lookup+0x45c namei+0x259 vn_open_cred+0x533 kern_openat+0x283
amd64_syscall+0x10c fast_syscall_common+0xf8

A short press of the system power button did nothing.

The installworld target directories were on a ZFS filesystem with a
single mirror of two SATA SSDs.

Unsure if it’s related because the rest of the stack traces are
different. However, the mi_switch+0xc2 is what jogged my memory.

Regards,

Jan M.



