Date: Wed, 24 Jul 2024 18:47:16 +0300 From: Konstantin Belousov <kib@freebsd.org> To: John F Carr <jfc@mit.edu> Cc: "mmel@freebsd.org" <mmel@freebsd.org>, Mark Millard <marklmi@yahoo.com>, FreeBSD Current <freebsd-current@freebsd.org>, "freebsd-arm@freebsd.org" <freebsd-arm@freebsd.org> Subject: Re: armv7-on-aarch64 stuck at urdlck Message-ID: <ZqEiBJF7UxcUWCQ2@kib.kiev.ua> In-Reply-To: <A7348370-0BEE-4EA4-8521-03C07F025F40@mit.edu> References: <33251aa3-681f-4d17-afe9-953490afeaf0@gmail.com> <0DD19771-3AAB-469E-981B-1203F1C28233@yahoo.com> <be023545-2b25-49ec-b6f1-9e05cd402646@gmail.com> <Zp95qtxK0CeDdp-d@kib.kiev.ua> <6a969609-fa0e-419d-83d5-e4fcf0f6ec35@freebsd.org> <FABF7440-70D2-4BAB-8B0B-4CA950CFFA60@mit.edu> <ZqDWSU_h5J1fYCrz@kib.kiev.ua> <f39b16b5-bbfb-4011-92fb-834330841533@freebsd.org> <ZqDcamh6r3B-oEB-@kib.kiev.ua> <A7348370-0BEE-4EA4-8521-03C07F025F40@mit.edu>
next in thread | previous in thread | raw e-mail | index | archive | help
On Wed, Jul 24, 2024 at 01:07:39PM +0000, John F Carr wrote: > > > > On Jul 24, 2024, at 06:50, Konstantin Belousov <kib@freebsd.org> wrote: > > > > On Wed, Jul 24, 2024 at 12:34:57PM +0200, mmel@freebsd.org wrote: > >> > >> > >> On 24.07.2024 12:24, Konstantin Belousov wrote: > >>> On Tue, Jul 23, 2024 at 08:11:13PM +0000, John F Carr wrote: > >>>> On Jul 23, 2024, at 13:46, Michal Meloun <meloun.michal@gmail.com> wrote: > >>>>> > >>>>> On 23.07.2024 11:36, Konstantin Belousov wrote: > >>>>>> On Tue, Jul 23, 2024 at 09:53:41AM +0200, Michal Meloun wrote: > >>>>>>> The good news is that I'm finally able to generate a working/locking > >>>>>>> test case. The culprit (at least for me) is if "-mcpu" is used when > >>>>>>> compiling libthr (e.g. indirectly injected via CPUTYPE in /etc/make.conf). > >>>>>>> If it is not used, libthr is broken (regardless of -O level or debug/normal > >>>>>>> build), but -mcpu=cortex-a15 will always produce a working libthr. > >>>>>> I think this is very significant progress. > >>>>>> Do you plan to drill down more to see what is going on? > >>>>> > >>>>> So the problem is now clear, and I fear it may apply to other architectures as well. > >>>>> dlopen_object() (from rtld_elf), > >>>>> https://cgit.freebsd.org/src/tree/libexec/rtld-elf/rtld.c#n3766, > >>>>> holds the rtld_bind_lock write lock for almost the entire time a new library is loaded. > >>>>> If the code uses a yet unresolved symbol to load the library, the rtl_bind() function attempts to get read lock of rtld_bind_lock and a deadlock occurs. > >>>>> > >>>>> In this case, it round_up() in _thr_stack_fix_protection, > >>>>> https://cgit.freebsd.org/src/tree/lib/libthr/thread/thr_stack.c#n136. > >>>>> Issued by __aeabi_uidiv (since not all armv7 processors support HW divide). > >>>>> > >>>>> Unfortunately, I'm not sure how to fix it. The compiler can emit __aeabi_<> in any place, and I'm not sure if it can resolve all the symbols used by rtld_eld and libthr beforehand. > >>>>> > >>>>> > >>>>> Michal > >>>>> > >>>> > >>>> In this case (but not for all _aeabi_ functions) we can avoid division > >>>> as long as page size is a power of 2. > >>>> > >>>> The function is > >>>> > >>>> static inline size_t > >>>> round_up(size_t size) > >>>> { > >>>> if (size % _thr_page_size != 0) > >>>> size = ((size / _thr_page_size) + 1) * > >>>> _thr_page_size; > >>>> return size; > >>>> } > >>>> > >>>> The body can be condensed to > >>>> > >>>> return (size + _thr_page_size - 1) & ~(_thr_page_size - 1); > >>>> > >>>> This is shorter in both lines of code and instruction bytes. > >>> > >>> Lets not allow this to be lost. Could anybody confirm that the patch > >>> below fixes the issue? > >>> > >>> commit d560f4f6690a48476565278fd07ca131bf4eeb3c > >>> Author: Konstantin Belousov <kib@FreeBSD.org> > >>> Date: Wed Jul 24 13:17:55 2024 +0300 > >>> > >>> rtld: avoid division in __thr_map_stacks_exec() > >>> The function is called by rtld with the rtld bind lock write-locked, > >>> when fixing the stack permission during dso load. Not every ARMv7 CPU > >>> supports the div, which causes the recursive entry into rtld to resolve > >>> the __aeabi_uidiv symbol, causing self-lock. > >>> Workaround the problem by using roundup2() instead of open-coding less > >>> efficient formula. > >>> Diagnosed by: mmel > >>> Based on submission by: John F Carr <jfc@mit.edu> > >>> Sponsored by: The FreeBSD Foundation > >>> MFC after: 1 week > >>> > > Just realized that it is wrong. Stack size is user-controlled and it does > > not need to be power of two. > > Your change is correct. _thr_page_size is set to getpagesize(), > which is a power of 2. The call to roundup2 takes a user-provided > size and rounds it up to a multiple of the system page size. > > I tested the change and it works. My change also works and > should compile to identical code. I forgot there was a standard > function to do the rounding. Right, my bad, thank you for correcting my thinko. > > > For final resolving of deadlocks, after a full day of digging, I'm very much > >> incline of adding -znow to the linker flags for libthr.so (and maybe also > >> for ld-elf.so). The runtime cost of resolving all symbols at startup is very > >> low. Direct pre-solving in _thr_rtld_init() is problematic for the _aeabi_* > >> symbols, since they don't have an official C prototypes, and some are not > >> compatible with C calling conventions. > > I do not like it. `-z now' changes (breaks) the ABI and makes some symbols > > not preemtible. > > > > In the worst case, we would need a call to the asm routine which causes the > > resolution of the _eabi_* symbols on arm. > > > > It would also be possible to link libthr with libgcc.a and use a linker map > to hide the _eabi_ symbols. In principle yes, but if the ARM ABI states that _eabi symbols must be used, and exported from libc, then this is also some form of ABI breakage.
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?ZqEiBJF7UxcUWCQ2>