Date: Mon, 10 Jul 2023 09:39:35 +0000
From: John F Carr <jfc@mit.edu>
To: Konstantin Belousov <kostikbel@gmail.com>
Cc: Current FreeBSD <freebsd-current@freebsd.org>
Subject: Re: shell hung in fork system call
Message-ID: <1684E4FD-8C43-4D2C-BC34-659A263BBBAB@mit.edu>
In-Reply-To: <ZKtJ51ZHZWCixlk9@kib.kiev.ua>
References: <909E2C96-3BFA-41AD-8EE7-0902231C2B95@mit.edu> <ZKtB8B_Fs0GwVCcP@kib.kiev.ua> <52A8F775-17D9-4240-A444-98AD5339622F@mit.edu> <ZKtJ51ZHZWCixlk9@kib.kiev.ua>

> On Jul 9, 2023, at 19:59, Konstantin Belousov <kostikbel@gmail.com> wrote:
>
> On Sun, Jul 09, 2023 at 11:36:03PM +0000, John F Carr wrote:
>>
>>
>>> On Jul 9, 2023, at 19:25, Konstantin Belousov <kostikbel@gmail.com> wrote:
>>>
>>> On Sun, Jul 09, 2023 at 10:41:27PM +0000, John F Carr wrote:
>>>> Kernel and system at a146207d66f320ed239c1059de9df854b66b55b7 plus some irrelevant local changes, four 64 bit ARM processors, make.conf sets CPUTYPE?=cortex-a57.
>>>>
>>>> I typed ^C while /bin/sh was starting a pipeline and my shell got hung in the middle of fork().
>>>>
>>>> From the terminal:
>>>>
>>>> # git log --oneline --|more
>>>> ^C^C^C
>>>> load: 3.26 cmd: sh 95505 [fork] 5308.67r 0.00u 0.03s 0% 2860k
>>>> mi_switch+0x198 sleepq_switch+0xfc sleepq_timedwait+0x40 _sleep+0x264 fork1+0x67c sys_fork+0x34 do_el0_sync+0x4c8 handle_el0_sync+0x44
>>>> load: 3.16 cmd: sh 95505 [fork] 5311.75r 0.00u 0.03s 0% 2860k
>>>> mi_switch+0x198 sleepq_switch+0xfc sleepq_timedwait+0x40 _sleep+0x264 fork1+0x67c sys_fork+0x34 do_el0_sync+0x4c8 handle_el0_sync+0x44
>>>>
>>>> According to ps -d on another terminal the shell has no children:
>>>>
>>>> PID TT STAT TIME COMMAND
>>>> [...]
>>>> 873 u0 IWs 0:00.00 `-- login [pam] (login)
>>>> 874 u0 I 0:00.17 `-- -sh (sh)
>>>> 95504 u0 I 0:00.01 `-- su -
>>>> 95505 u0 D+ 0:00.05 `-- -su (sh)
>>>> [...]
>>>>
>>>> Nothing on the (115200 bps serial) console. No change in system performance.
>>>>
>>>> The system is busy copying a large amount of data from the network to a ZFS pool on spinning disks. The git|more pipeline could have taken some time to get going while I/O requests worked their way through the queue. It would not have touched the busy pool, only the zroot pool on an SSD.
>>>>
>>>> Has anything changed recently that might cause this?
>>>
>>> There was some change around fork, but your sleep seems to be not from
>>> that change. Can you show the wait channel for the process? Do something
>>> like
>>> $ ps alxww
>>>
>>
>> UID PID PPID C PRI NI VSZ RSS MWCHAN STAT TT TIME COMMAND
>> 0 95505 95504 2 20 0 13508 2876 fork D+ u0 0:00.13 -su (sh)
>>
>> This is probably the same information displayed as [fork] in the output from ^T.
>>
>> Does it correspond to the source line
>>
>> pause("fork", hz / 2);
>>
>> ?
>
> Yes, it is rate-limiting code. Still it is interesting to see the whole
> ps output.
>
> Do you have 7a70f17ac4bd64dc1a5020f in your source?

No, I do not have that commit.

The comment mentions livelock. CPU use as reported by iostat did not change after the process hung.