Date:      Mon, 10 Jul 2023 09:39:35 +0000
From:      John F Carr <jfc@mit.edu>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        Current FreeBSD <freebsd-current@freebsd.org>
Subject:   Re: shell hung in fork system call
Message-ID:  <1684E4FD-8C43-4D2C-BC34-659A263BBBAB@mit.edu>
In-Reply-To: <ZKtJ51ZHZWCixlk9@kib.kiev.ua>
References:  <909E2C96-3BFA-41AD-8EE7-0902231C2B95@mit.edu> <ZKtB8B_Fs0GwVCcP@kib.kiev.ua> <52A8F775-17D9-4240-A444-98AD5339622F@mit.edu> <ZKtJ51ZHZWCixlk9@kib.kiev.ua>

> On Jul 9, 2023, at 19:59, Konstantin Belousov <kostikbel@gmail.com> wrote:
>
> On Sun, Jul 09, 2023 at 11:36:03PM +0000, John F Carr wrote:
>>
>>
>>> On Jul 9, 2023, at 19:25, Konstantin Belousov <kostikbel@gmail.com> wrote:
>>>
>>> On Sun, Jul 09, 2023 at 10:41:27PM +0000, John F Carr wrote:
>>>> Kernel and system at a146207d66f320ed239c1059de9df854b66b55b7 plus some irrelevant local changes, four 64 bit ARM processors, make.conf sets CPUTYPE?=cortex-a57.
>>>>
>>>> I typed ^C while /bin/sh was starting a pipeline and my shell got hung in the middle of fork().
>>>>
>>>> From the terminal:
>>>>
>>>> # git log --oneline --|more
>>>> ^C^C^C
>>>> load: 3.26  cmd: sh 95505 [fork] 5308.67r 0.00u 0.03s 0% 2860k
>>>> mi_switch+0x198 sleepq_switch+0xfc sleepq_timedwait+0x40 _sleep+0x264 fork1+0x67c sys_fork+0x34 do_el0_sync+0x4c8 handle_el0_sync+0x44
>>>> load: 3.16  cmd: sh 95505 [fork] 5311.75r 0.00u 0.03s 0% 2860k
>>>> mi_switch+0x198 sleepq_switch+0xfc sleepq_timedwait+0x40 _sleep+0x264 fork1+0x67c sys_fork+0x34 do_el0_sync+0x4c8 handle_el0_sync+0x44
>>>>
>>>> According to ps -d on another terminal the shell has no children:
>>>>
>>>>   PID TT  STAT       TIME COMMAND
>>>> [...]
>>>>   873 u0  IWs     0:00.00 `-- login [pam] (login)
>>>>   874 u0  I       0:00.17   `-- -sh (sh)
>>>> 95504 u0  I       0:00.01     `-- su -
>>>> 95505 u0  D+      0:00.05       `-- -su (sh)
>>>> [...]
>>>>
>>>> Nothing on the (115200 bps serial) console.  No change in system performance.
>>>>
>>>> The system is busy copying a large amount of data from the network to a ZFS pool on spinning disks.  The git|more pipeline could have taken some time to get going while I/O requests worked their way through the queue.  It would not have touched the busy pool, only the zroot pool on an SSD.
>>>>
>>>> Has anything changed recently that might cause this?
>>>
>>> There was some change around fork, but your sleep seems to be not from
>>> that change.  Can you show the wait channel for the process?  Do something
>>> like
>>> $ ps alxww
>>>
>>
>> UID   PID  PPID  C PRI NI   VSZ   RSS MWCHAN   STAT TT        TIME COMMAND
>>   0 95505 95504  2  20  0 13508  2876 fork     D+   u0     0:00.13 -su (sh)
>>
>> This is probably the same information displayed as [fork] in the output from ^T.
>>
>> Does it correspond to the source line
>>
>> pause("fork", hz / 2);
>>
>> ?
>
> Yes, it is rate-limiting code.  Still it is interesting to see the whole
> ps output.
>
> Do you have 7a70f17ac4bd64dc1a5020f in your source?

No, I do not have that commit.

The comment mentions livelock.  CPU use as reported by iostat did not change after the process hung.





