Date:      Wed, 24 Jul 2024 13:07:39 +0000
From:      John F Carr <jfc@mit.edu>
To:        Konstantin Belousov <kib@freebsd.org>
Cc:        "mmel@freebsd.org" <mmel@freebsd.org>, Mark Millard <marklmi@yahoo.com>, FreeBSD Current <freebsd-current@freebsd.org>, "freebsd-arm@freebsd.org" <freebsd-arm@freebsd.org>
Subject:   Re: armv7-on-aarch64 stuck at urdlck
Message-ID:  <A7348370-0BEE-4EA4-8521-03C07F025F40@mit.edu>
In-Reply-To: <ZqDcamh6r3B-oEB-@kib.kiev.ua>
References:  <a1b978fe-ff54-4112-860c-b09500d89d0b@freebsd.org> <C0B42CBB-8F12-4597-A04B-26F2107E176E@yahoo.com> <33251aa3-681f-4d17-afe9-953490afeaf0@gmail.com> <0DD19771-3AAB-469E-981B-1203F1C28233@yahoo.com> <be023545-2b25-49ec-b6f1-9e05cd402646@gmail.com> <Zp95qtxK0CeDdp-d@kib.kiev.ua> <6a969609-fa0e-419d-83d5-e4fcf0f6ec35@freebsd.org> <FABF7440-70D2-4BAB-8B0B-4CA950CFFA60@mit.edu> <ZqDWSU_h5J1fYCrz@kib.kiev.ua> <f39b16b5-bbfb-4011-92fb-834330841533@freebsd.org> <ZqDcamh6r3B-oEB-@kib.kiev.ua>


> On Jul 24, 2024, at 06:50, Konstantin Belousov <kib@freebsd.org> wrote:
>
> On Wed, Jul 24, 2024 at 12:34:57PM +0200, mmel@freebsd.org wrote:
>>
>>
>> On 24.07.2024 12:24, Konstantin Belousov wrote:
>>> On Tue, Jul 23, 2024 at 08:11:13PM +0000, John F Carr wrote:
>>>> On Jul 23, 2024, at 13:46, Michal Meloun <meloun.michal@gmail.com> wrote:
>>>>>
>>>>> On 23.07.2024 11:36, Konstantin Belousov wrote:
>>>>>> On Tue, Jul 23, 2024 at 09:53:41AM +0200, Michal Meloun wrote:
>>>>>>> The good news is that I'm finally able to generate a working/locking
>>>>>>> test case.  The culprit (at least for me) is if "-mcpu" is used when
>>>>>>> compiling libthr (e.g. indirectly injected via CPUTYPE in /etc/make.conf).
>>>>>>> If it is not used, libthr is broken (regardless of -O level or debug/normal
>>>>>>> build), but -mcpu=cortex-a15 will always produce a working libthr.
>>>>>> I think this is very significant progress.
>>>>>> Do you plan to drill down more to see what is going on?
>>>>>
>>>>> So the problem is now clear, and I fear it may apply to other architectures as well.
>>>>> dlopen_object() (from rtld_elf),
>>>>> https://cgit.freebsd.org/src/tree/libexec/rtld-elf/rtld.c#n3766,
>>>>> holds the rtld_bind_lock write lock for almost the entire time a new library is loaded.
>>>>> If the code used to load the library references a not yet resolved symbol, the _rtld_bind() function attempts to take a read lock on rtld_bind_lock and a deadlock occurs.
>>>>>
>>>>> In this case, it is round_up() in _thr_stack_fix_protection,
>>>>> https://cgit.freebsd.org/src/tree/lib/libthr/thread/thr_stack.c#n136.
>>>>> The symbol is __aeabi_uidiv (emitted since not all armv7 processors support HW divide).
>>>>>
>>>>> Unfortunately, I'm not sure how to fix it.  The compiler can emit __aeabi_<> in any place, and I'm not sure if it can resolve all the symbols used by rtld_elf and libthr beforehand.
>>>>>
>>>>>
>>>>> Michal
>>>>>
>>>>
>>>> In this case (but not for all __aeabi_ functions) we can avoid division
>>>> as long as page size is a power of 2.
>>>>
>>>> The function is
>>>>
>>>>   static inline size_t
>>>>   round_up(size_t size)
>>>>   {
>>>>           if (size % _thr_page_size != 0)
>>>>                   size = ((size / _thr_page_size) + 1) *
>>>>                       _thr_page_size;
>>>>           return size;
>>>>   }
>>>>
>>>> The body can be condensed to
>>>>
>>>>   return (size + _thr_page_size - 1) & ~(_thr_page_size - 1);
>>>>
>>>> This is shorter in both lines of code and instruction bytes.
>>>
>>> Let's not allow this to be lost.  Could anybody confirm that the patch
>>> below fixes the issue?
>>>
>>> commit d560f4f6690a48476565278fd07ca131bf4eeb3c
>>> Author: Konstantin Belousov <kib@FreeBSD.org>
>>> Date:   Wed Jul 24 13:17:55 2024 +0300
>>>
>>>     rtld: avoid division in __thr_map_stacks_exec()
>>>     The function is called by rtld with the rtld bind lock write-locked,
>>>     when fixing the stack permission during dso load.  Not every ARMv7 CPU
>>>     supports the div, which causes the recursive entry into rtld to resolve
>>>     the  __aeabi_uidiv symbol, causing self-lock.
>>>     Workaround the problem by using roundup2() instead of open-coding less
>>>     efficient formula.
>>>     Diagnosed by:   mmel
>>>     Based on submission by: John F Carr <jfc@mit.edu>
>>>     Sponsored by:   The FreeBSD Foundation
>>>     MFC after:      1 week
>>>
> Just realized that it is wrong.  Stack size is user-controlled and it does
> not need to be power of two.

Your change is correct.  Only the page size has to be a power of 2,
not the stack size.  _thr_page_size is set to getpagesize(), which is
a power of 2.  The call to roundup2 takes a user-provided size and
rounds it up to a multiple of the system page size.

I tested the change and it works.  My change also works and
should compile to identical code.  I forgot there was a standard
function to do the rounding.
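
For anyone who wants to check the equivalence, here is a minimal
sketch.  It is a standalone test, not the libthr source.  ROUNDUP2 is
a local stand-in that I assume matches the roundup2() macro in
<sys/param.h>, and round_up_div, round_up_mask and check_formulas are
names made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Local stand-in for roundup2(); assumed to match <sys/param.h>.
 * y must be a nonzero power of 2.
 */
#define ROUNDUP2(x, y)	(((x) + ((y) - 1)) & ~((y) - 1))

/*
 * Division-based formula, as in the original round_up().  With a
 * runtime divisor this needs a divide, which is a library call on
 * ARMv7 cores without hardware divide.
 */
static size_t
round_up_div(size_t size, size_t page_size)
{
	if (size % page_size != 0)
		size = ((size / page_size) + 1) * page_size;
	return (size);
}

/* Mask-based formula: add and AND only, no divide. */
static size_t
round_up_mask(size_t size, size_t page_size)
{
	return (ROUNDUP2(size, page_size));
}

/* Assert that both formulas agree for every size in [0, limit]. */
static void
check_formulas(size_t page_size, size_t limit)
{
	size_t size;

	/* Only the page size must be a power of 2; size is arbitrary. */
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	for (size = 0; size <= limit; size++)
		assert(round_up_div(size, page_size) ==
		    round_up_mask(size, page_size));
}
```

Calling check_formulas(4096, 10 * 4096) from a small main() exercises
sizes that are not powers of 2, for example 5000, which both formulas
round up to 8192.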

>> For final resolving of deadlocks, after a full day of digging, I'm very much
>> inclined to add -znow to the linker flags for libthr.so (and maybe also
>> for ld-elf.so). The runtime cost of resolving all symbols at startup is very
>> low. Direct pre-resolving in _thr_rtld_init() is problematic for the __aeabi_*
>> symbols, since they don't have official C prototypes, and some are not
>> compatible with C calling conventions.
> I do not like it. `-z now' changes (breaks) the ABI and makes some symbols
> not preemptible.
>
> In the worst case, we would need a call to the asm routine which causes the
> resolution of the __aeabi_* symbols on arm.
>

It would also be possible to link libthr with libgcc.a and use a linker map
to hide the __aeabi_ symbols, so those calls would be resolved at link time
instead of through rtld.




