Date: Wed, 24 Jul 2024 14:20:23 -0600 From: Warner Losh <imp@bsdimp.com> To: mmel@freebsd.org Cc: Konstantin Belousov <kib@freebsd.org>, John F Carr <jfc@mit.edu>, Mark Millard <marklmi@yahoo.com>, FreeBSD Current <freebsd-current@freebsd.org>, "freebsd-arm@freebsd.org" <freebsd-arm@freebsd.org> Subject: Re: armv7-on-aarch64 stuck at urdlck Message-ID: <CANCZdfqQxUtjN6u=t08UovSMK1y_OPvEYA2SFjT08zgs9Vpabw@mail.gmail.com> In-Reply-To: <28484869-05fd-4391-9501-10b93280f7a4@freebsd.org> References: <33251aa3-681f-4d17-afe9-953490afeaf0@gmail.com> <0DD19771-3AAB-469E-981B-1203F1C28233@yahoo.com> <be023545-2b25-49ec-b6f1-9e05cd402646@gmail.com> <Zp95qtxK0CeDdp-d@kib.kiev.ua> <6a969609-fa0e-419d-83d5-e4fcf0f6ec35@freebsd.org> <FABF7440-70D2-4BAB-8B0B-4CA950CFFA60@mit.edu> <ZqDWSU_h5J1fYCrz@kib.kiev.ua> <f39b16b5-bbfb-4011-92fb-834330841533@freebsd.org> <ZqDcamh6r3B-oEB-@kib.kiev.ua> <A7348370-0BEE-4EA4-8521-03C07F025F40@mit.edu> <ZqEiBJF7UxcUWCQ2@kib.kiev.ua> <28484869-05fd-4391-9501-10b93280f7a4@freebsd.org>
index | next in thread | previous in thread | raw e-mail
[-- Attachment #1 --] On Wed, Jul 24, 2024 at 11:34 AM mmel@freebsd.org <meloun.michal@gmail.com> wrote: > > > On 24.07.2024 17:47, Konstantin Belousov wrote: > > On Wed, Jul 24, 2024 at 01:07:39PM +0000, John F Carr wrote: > >> > >> > >>> On Jul 24, 2024, at 06:50, Konstantin Belousov <kib@freebsd.org> > wrote: > >>> > >>> On Wed, Jul 24, 2024 at 12:34:57PM +0200, mmel@freebsd.org wrote: > >>>> > >>>> > >>>> On 24.07.2024 12:24, Konstantin Belousov wrote: > >>>>> On Tue, Jul 23, 2024 at 08:11:13PM +0000, John F Carr wrote: > >>>>>> On Jul 23, 2024, at 13:46, Michal Meloun <meloun.michal@gmail.com> > wrote: > >>>>>>> > >>>>>>> On 23.07.2024 11:36, Konstantin Belousov wrote: > >>>>>>>> On Tue, Jul 23, 2024 at 09:53:41AM +0200, Michal Meloun wrote: > >>>>>>>>> The good news is that I'm finally able to generate a > working/locking > >>>>>>>>> test case. The culprit (at least for me) is if "-mcpu" is used > when > >>>>>>>>> compiling libthr (e.g. indirectly injected via CPUTYPE in > /etc/make.conf). > >>>>>>>>> If it is not used, libthr is broken (regardless of -O level or > debug/normal > >>>>>>>>> build), but -mcpu=cortex-a15 will always produce a working > libthr. > >>>>>>>> I think this is very significant progress. > >>>>>>>> Do you plan to drill down more to see what is going on? > >>>>>>> > >>>>>>> So the problem is now clear, and I fear it may apply to other > architectures as well. > >>>>>>> dlopen_object() (from rtld_elf), > >>>>>>> https://cgit.freebsd.org/src/tree/libexec/rtld-elf/rtld.c#n3766, > >>>>>>> holds the rtld_bind_lock write lock for almost the entire time a > new library is loaded. > >>>>>>> If the code uses a yet unresolved symbol to load the library, the > rtl_bind() function attempts to get read lock of rtld_bind_lock and a > deadlock occurs. > >>>>>>> > >>>>>>> In this case, it round_up() in _thr_stack_fix_protection, > >>>>>>> > https://cgit.freebsd.org/src/tree/lib/libthr/thread/thr_stack.c#n136. > >>>>>>> Issued by __aeabi_uidiv (since not all armv7 processors support HW > divide). > >>>>>>> > >>>>>>> Unfortunately, I'm not sure how to fix it. The compiler can emit > __aeabi_<> in any place, and I'm not sure if it can resolve all the symbols > used by rtld_eld and libthr beforehand. > >>>>>>> > >>>>>>> > >>>>>>> Michal > >>>>>>> > >>>>>> > >>>>>> In this case (but not for all _aeabi_ functions) we can avoid > division > >>>>>> as long as page size is a power of 2. > >>>>>> > >>>>>> The function is > >>>>>> > >>>>>> static inline size_t > >>>>>> round_up(size_t size) > >>>>>> { > >>>>>> if (size % _thr_page_size != 0) > >>>>>> size = ((size / _thr_page_size) + 1) * > >>>>>> _thr_page_size; > >>>>>> return size; > >>>>>> } > >>>>>> > >>>>>> The body can be condensed to > >>>>>> > >>>>>> return (size + _thr_page_size - 1) & ~(_thr_page_size - 1); > >>>>>> > >>>>>> This is shorter in both lines of code and instruction bytes. > >>>>> > >>>>> Lets not allow this to be lost. Could anybody confirm that the patch > >>>>> below fixes the issue? > >>>>> > >>>>> commit d560f4f6690a48476565278fd07ca131bf4eeb3c > >>>>> Author: Konstantin Belousov <kib@FreeBSD.org> > >>>>> Date: Wed Jul 24 13:17:55 2024 +0300 > >>>>> > >>>>> rtld: avoid division in __thr_map_stacks_exec() > >>>>> The function is called by rtld with the rtld bind lock > write-locked, > >>>>> when fixing the stack permission during dso load. Not every > ARMv7 CPU > >>>>> supports the div, which causes the recursive entry into rtld to > resolve > >>>>> the __aeabi_uidiv symbol, causing self-lock. > >>>>> Workaround the problem by using roundup2() instead of > open-coding less > >>>>> efficient formula. > >>>>> Diagnosed by: mmel > >>>>> Based on submission by: John F Carr <jfc@mit.edu> > >>>>> Sponsored by: The FreeBSD Foundation > >>>>> MFC after: 1 week > >>>>> > >>> Just realized that it is wrong. Stack size is user-controlled and it > does > >>> not need to be power of two. > >> > >> Your change is correct. _thr_page_size is set to getpagesize(), > >> which is a power of 2. The call to roundup2 takes a user-provided > >> size and rounds it up to a multiple of the system page size. > >> > >> I tested the change and it works. My change also works and > >> should compile to identical code. I forgot there was a standard > >> function to do the rounding. > > Right, my bad, thank you for correcting my thinko. > > > >> > >>> For final resolving of deadlocks, after a full day of digging, I'm > very much > >>>> incline of adding -znow to the linker flags for libthr.so (and maybe > also > >>>> for ld-elf.so). The runtime cost of resolving all symbols at startup > is very > >>>> low. Direct pre-solving in _thr_rtld_init() is problematic for the > _aeabi_* > >>>> symbols, since they don't have an official C prototypes, and some are > not > >>>> compatible with C calling conventions. > >>> I do not like it. `-z now' changes (breaks) the ABI and makes some > symbols > >>> not preemtible. > >>> > >>> In the worst case, we would need a call to the asm routine which > causes the > >>> resolution of the _eabi_* symbols on arm. > >>> > >> > >> It would also be possible to link libthr with libgcc.a and use a linker > map > >> to hide the _eabi_ symbols. > > In principle yes, but if the ARM ABI states that _eabi symbols must be > used, > > and exported from libc, then this is also some form of ABI breakage. > > I hope that https://reviews.freebsd.org/D46104 is acceptable :) > Can't speak for kib, but it looks good to my eye (though I agree with his naming quibble). And helps avoid -znow, though I could have gone either way on that. It's also simple enough not to be a burden. Warner [-- Attachment #2 --] <div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Jul 24, 2024 at 11:34 AM <a href="mailto:mmel@freebsd.org">mmel@freebsd.org</a> <<a href="mailto:meloun.michal@gmail.com">meloun.michal@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br> <br> On 24.07.2024 17:47, Konstantin Belousov wrote:<br> > On Wed, Jul 24, 2024 at 01:07:39PM +0000, John F Carr wrote:<br> >><br> >><br> >>> On Jul 24, 2024, at 06:50, Konstantin Belousov <<a href="mailto:kib@freebsd.org" target="_blank">kib@freebsd.org</a>> wrote:<br> >>><br> >>> On Wed, Jul 24, 2024 at 12:34:57PM +0200, <a href="mailto:mmel@freebsd.org" target="_blank">mmel@freebsd.org</a> wrote:<br> >>>><br> >>>><br> >>>> On 24.07.2024 12:24, Konstantin Belousov wrote:<br> >>>>> On Tue, Jul 23, 2024 at 08:11:13PM +0000, John F Carr wrote:<br> >>>>>> On Jul 23, 2024, at 13:46, Michal Meloun <<a href="mailto:meloun.michal@gmail.com" target="_blank">meloun.michal@gmail.com</a>> wrote:<br> >>>>>>><br> >>>>>>> On 23.07.2024 11:36, Konstantin Belousov wrote:<br> >>>>>>>> On Tue, Jul 23, 2024 at 09:53:41AM +0200, Michal Meloun wrote:<br> >>>>>>>>> The good news is that I'm finally able to generate a working/locking<br> >>>>>>>>> test case. The culprit (at least for me) is if "-mcpu" is used when<br> >>>>>>>>> compiling libthr (e.g. indirectly injected via CPUTYPE in /etc/make.conf).<br> >>>>>>>>> If it is not used, libthr is broken (regardless of -O level or debug/normal<br> >>>>>>>>> build), but -mcpu=cortex-a15 will always produce a working libthr.<br> >>>>>>>> I think this is very significant progress.<br> >>>>>>>> Do you plan to drill down more to see what is going on?<br> >>>>>>><br> >>>>>>> So the problem is now clear, and I fear it may apply to other architectures as well.<br> >>>>>>> dlopen_object() (from rtld_elf),<br> >>>>>>> <a href="https://cgit.freebsd.org/src/tree/libexec/rtld-elf/rtld.c#n3766" rel="noreferrer" target="_blank">https://cgit.freebsd.org/src/tree/libexec/rtld-elf/rtld.c#n3766</a>,<br> >>>>>>> holds the rtld_bind_lock write lock for almost the entire time a new library is loaded.<br> >>>>>>> If the code uses a yet unresolved symbol to load the library, the rtl_bind() function attempts to get read lock of rtld_bind_lock and a deadlock occurs.<br> >>>>>>><br> >>>>>>> In this case, it round_up() in _thr_stack_fix_protection,<br> >>>>>>> <a href="https://cgit.freebsd.org/src/tree/lib/libthr/thread/thr_stack.c#n136" rel="noreferrer" target="_blank">https://cgit.freebsd.org/src/tree/lib/libthr/thread/thr_stack.c#n136</a>.<br> >>>>>>> Issued by __aeabi_uidiv (since not all armv7 processors support HW divide).<br> >>>>>>><br> >>>>>>> Unfortunately, I'm not sure how to fix it. The compiler can emit __aeabi_<> in any place, and I'm not sure if it can resolve all the symbols used by rtld_eld and libthr beforehand.<br> >>>>>>><br> >>>>>>><br> >>>>>>> Michal<br> >>>>>>><br> >>>>>><br> >>>>>> In this case (but not for all _aeabi_ functions) we can avoid division<br> >>>>>> as long as page size is a power of 2.<br> >>>>>><br> >>>>>> The function is<br> >>>>>><br> >>>>>> static inline size_t<br> >>>>>> round_up(size_t size)<br> >>>>>> {<br> >>>>>> if (size % _thr_page_size != 0)<br> >>>>>> size = ((size / _thr_page_size) + 1) *<br> >>>>>> _thr_page_size;<br> >>>>>> return size;<br> >>>>>> }<br> >>>>>><br> >>>>>> The body can be condensed to<br> >>>>>><br> >>>>>> return (size + _thr_page_size - 1) & ~(_thr_page_size - 1);<br> >>>>>><br> >>>>>> This is shorter in both lines of code and instruction bytes.<br> >>>>><br> >>>>> Lets not allow this to be lost. Could anybody confirm that the patch<br> >>>>> below fixes the issue?<br> >>>>><br> >>>>> commit d560f4f6690a48476565278fd07ca131bf4eeb3c<br> >>>>> Author: Konstantin Belousov <kib@FreeBSD.org><br> >>>>> Date: Wed Jul 24 13:17:55 2024 +0300<br> >>>>><br> >>>>> rtld: avoid division in __thr_map_stacks_exec()<br> >>>>> The function is called by rtld with the rtld bind lock write-locked,<br> >>>>> when fixing the stack permission during dso load. Not every ARMv7 CPU<br> >>>>> supports the div, which causes the recursive entry into rtld to resolve<br> >>>>> the __aeabi_uidiv symbol, causing self-lock.<br> >>>>> Workaround the problem by using roundup2() instead of open-coding less<br> >>>>> efficient formula.<br> >>>>> Diagnosed by: mmel<br> >>>>> Based on submission by: John F Carr <<a href="mailto:jfc@mit.edu" target="_blank">jfc@mit.edu</a>><br> >>>>> Sponsored by: The FreeBSD Foundation<br> >>>>> MFC after: 1 week<br> >>>>><br> >>> Just realized that it is wrong. Stack size is user-controlled and it does<br> >>> not need to be power of two.<br> >><br> >> Your change is correct. _thr_page_size is set to getpagesize(),<br> >> which is a power of 2. The call to roundup2 takes a user-provided<br> >> size and rounds it up to a multiple of the system page size.<br> >><br> >> I tested the change and it works. My change also works and<br> >> should compile to identical code. I forgot there was a standard<br> >> function to do the rounding.<br> > Right, my bad, thank you for correcting my thinko.<br> > <br> >><br> >>> For final resolving of deadlocks, after a full day of digging, I'm very much<br> >>>> incline of adding -znow to the linker flags for libthr.so (and maybe also<br> >>>> for ld-elf.so). The runtime cost of resolving all symbols at startup is very<br> >>>> low. Direct pre-solving in _thr_rtld_init() is problematic for the _aeabi_*<br> >>>> symbols, since they don't have an official C prototypes, and some are not<br> >>>> compatible with C calling conventions.<br> >>> I do not like it. `-z now' changes (breaks) the ABI and makes some symbols<br> >>> not preemtible.<br> >>><br> >>> In the worst case, we would need a call to the asm routine which causes the<br> >>> resolution of the _eabi_* symbols on arm.<br> >>><br> >><br> >> It would also be possible to link libthr with libgcc.a and use a linker map<br> >> to hide the _eabi_ symbols.<br> > In principle yes, but if the ARM ABI states that _eabi symbols must be used,<br> > and exported from libc, then this is also some form of ABI breakage.<br> <br> I hope that <a href="https://reviews.freebsd.org/D46104" rel="noreferrer" target="_blank">https://reviews.freebsd.org/D46104</a> is acceptable :)<br></blockquote><div><br></div><div>Can't speak for kib, but it looks good to my eye (though I agree with his naming</div><div>quibble). And helps avoid -znow, though I could have gone either way on that.</div><div>It's also simple enough not to be a burden.</div><div><br></div><div>Warner </div></div></div>help
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CANCZdfqQxUtjN6u=t08UovSMK1y_OPvEYA2SFjT08zgs9Vpabw>
