Date:      Mon, 27 Dec 2021 16:13:50 +0100
From:      Jan Mikkelsen <janm@transactionware.com>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        freebsd-hackers@freebsd.org
Subject:   Re: closefrom blocking, wchan urdlck
Message-ID:  <9CB0803A-E15B-47F9-97A9-03597D41C01E@transactionware.com>
In-Reply-To: <YcnVwe/Yb0YU5PVe@kib.kiev.ua>
References:  <2B3BA665-D42A-4B5F-AD2F-ED10E64A7276@transactionware.com> <YcnFDUxa/m1xeaUS@kib.kiev.ua> <BE544C61-86CB-48B3-92C7-39F7FFDE64DE@transactionware.com> <YcnVwe/Yb0YU5PVe@kib.kiev.ua>



> On 27 Dec 2021, at 16:03, Konstantin Belousov <kostikbel@gmail.com> wrote:
>
> On Mon, Dec 27, 2021 at 03:54:57PM +0100, Jan Mikkelsen wrote:
>>
>>> On 27 Dec 2021, at 14:52, Konstantin Belousov <kostikbel@gmail.com> wrote:
>>>
>>> On Mon, Dec 27, 2021 at 01:39:11PM +0100, Jan Mikkelsen wrote:
>>>> Hi,
>>>>
>>>> (On 11.2)
>>>>
>>>> I am occasionally seeing closefrom() block in a child process created by a call to pdfork().
>>>>
>>>> When this does happen, it is very early after the process has started, while other threads are being created elsewhere in the process. I cannot reproduce it after the thread creation is complete. According to the sigaction man page, closefrom() should be async-signal safe.
>>>>
>>>> Stack trace from the call to closefrom():
>>>>
>>>> * frame #0: 0x000000080090276c libthr.so.3`_umtx_op_err at _umtx_op_err.S:37
>>>>   frame #1: 0x00000008008f6121 libthr.so.3`__thr_rwlock_rdlock(rwlock=<unavailable>, flags=<unavailable>, tsp=<unavailable>) at thr_umtx.c:307:10
>>>>   frame #2: 0x00000008008ff1ac libthr.so.3`_thr_rtld_rlock_acquire [inlined] _thr_rwlock_rdlock(rwlock=0x0000000800911600, flags=0, tsp=0x0000000000000000) at thr_umtx.h:232:10
>>>>   frame #3: 0x00000008008ff19b libthr.so.3`_thr_rtld_rlock_acquire(lock=0x0000000800911600) at thr_rtld.c:125
>>>>   frame #4: 0x000000080075332b ld-elf.so.1`rlock_acquire(lock=0x0000000800765270, lockstate=0x00007fffdfbfb8d0) at rtld_lock.c:208:2
>>>>   frame #5: 0x000000080074ba20 ld-elf.so.1`_rtld_bind(obj=0x0000000800769000, reloff=6072) at rtld.c:861:5
>>>>   frame #6: 0x0000000800747c7d ld-elf.so.1`_rtld_bind_start at rtld_start.S:121
>>>>   frame #7: 0x00000000006562d3 prog`Twio::ProcHandle::spawn(this=<unavailable>, command="/bin/echo", args=0x0000000800d7e000, descriptor_mapping=<unavailable>, descriptor_end=3) at prochandle_pdfork.cpp:308:2
>>>
>>> And where is the closefrom() call in the demonstrated trace?
>>>
>>> What version of the system do you use?
>>> You need at least cbdec8db18b533f6d7be (on HEAD) or a5659943e37a74c96e
>>> (stable/13) for pdfork() to behave sanely.  But you are still not allowed to
>>> call non-async-signal-safe functions in the child before exec.
>>
>> This is 12.2-p11. I just noticed that I wrote 11.2 above; that is incorrect.
>>
>> Frame 7 is a call to closefrom(). The child process calls dup2(), closefrom(), signal() and then execv(). No other calls are made, and I believe closefrom() is meant to be async-signal safe.
>>
> Frame 7 cannot be a call to closefrom(); if it were, it would be resolved
> to the closefrom() symbol.

From lldb, attached to the hung process:

(lldb)
frame #6: 0x0000000800748c7d ld-elf.so.1`_rtld_bind_start at rtld_start.S:121
   118 		leaq	(%rsi,%rsi,2),%rsi	# multiply by 3
   119 		leaq	(,%rsi,8),%rsi		# now 8, for 24 (sizeof Elf_Rela)
   120
-> 121 		call	_rtld_bind		# Transfer control to the binder
   122 		/* Now %rax contains the entry point of the function being called. */
   123
   124 		movq	%rax,0x60(%rsp)		# Store target over reloff argument
(lldb)
frame #7: 0x0000000000656813 amt5-chefd`Twio::ProcHandle::spawn(this=<unavailable>, command="/bin/date", args=0x0000000800d5f010, descriptor_mapping=<unavailable>, descriptor_end=3) at prochandle_pdfork.cpp:304:2
   301 			_exit(127);
   302 		}
   303
-> 304 		closefrom(descriptor_end);
   305
   306 		signal(SIGPIPE, SIG_DFL);
   307 		signal(SIGALRM, SIG_DFL);
(lldb)
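The child path described in the thread (dup2(), closefrom(), signal(), then execv(), with _exit(127) on failure as at line 301) can be sketched in C roughly as follows. This is an illustration under stated assumptions, not the actual Twio::ProcHandle::spawn() code: spawn_child(), close_from() and the fd_map/fd_end parameters are hypothetical, plain fork() stands in for pdfork() so the sketch also builds outside FreeBSD, and where closefrom() is unavailable it is emulated with a close() loop bounded by a limit computed before forking. The call order alone does not prevent the hang in the trace: frames 6 and 5 suggest the child blocked in the lazy-binding path (_rtld_bind), waiting for an rtld lock while resolving the closefrom() call at line 304, before closefrom() itself ran.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: close every descriptor >= lowfd.
 * FreeBSD provides closefrom(); elsewhere fall back to a close() loop
 * bounded by maxfd, which the parent computes before forking. */
static void close_from(int lowfd, long maxfd)
{
#ifdef __FreeBSD__
    (void)maxfd;
    closefrom(lowfd);
#else
    for (long fd = lowfd; fd < maxfd; fd++)
        (void)close((int)fd);
#endif
}

/* Hypothetical spawn_child(): fd_map[i] is the descriptor that should
 * appear as descriptor i in the child, for 0 <= i < fd_end.
 * Assumes each fd_map[i] is either i itself or >= fd_end, so the dup2()
 * calls cannot clobber one another.
 * Returns the child's pid in the parent, or -1 if fork() fails. */
static pid_t spawn_child(const char *command, char *const args[],
                         const int *fd_map, int fd_end)
{
    long maxfd = sysconf(_SC_OPEN_MAX);   /* sysconf() is called before fork */
    if (maxfd < 0)
        maxfd = 1024;

    pid_t pid = fork();                   /* the original code uses pdfork() */
    if (pid != 0)
        return pid;                       /* parent, or -1 on error */

    /* Child: move descriptors into place, close the rest, reset signals. */
    for (int i = 0; i < fd_end; i++) {
        if (fd_map[i] != i && dup2(fd_map[i], i) == -1)
            _exit(127);
    }
    close_from(fd_end, maxfd);
    signal(SIGPIPE, SIG_DFL);
    signal(SIGALRM, SIG_DFL);
    execv(command, args);
    _exit(127);                           /* only reached if execv() failed */
}
```

For example, with a pipe p, passing fd_map = {0, p[1], 2} and fd_end = 3 runs the command with its standard output redirected into the pipe, while every descriptor from 3 upward (including both pipe ends) is closed in the child. Calling sysconf() in the parent keeps the child restricted to the short list of calls discussed in the thread.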


>> The commit you mentioned applies cleanly to 12.2; I'm running a build now. Are there other issues with pdfork in 12.2?
>
> pdfork() in threaded processes requires 21f749da82e755aafab1276 and the
> followup cbdec8db18b533f6d7be.  I do not believe any of this is in 12.3,
> and definitely not in 12.2.

Thanks, will check and apply.

Regards,

Jan M.



