Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 24 Jul 2024 14:20:23 -0600
From:      Warner Losh <imp@bsdimp.com>
To:        mmel@freebsd.org
Cc:        Konstantin Belousov <kib@freebsd.org>, John F Carr <jfc@mit.edu>, Mark Millard <marklmi@yahoo.com>,  FreeBSD Current <freebsd-current@freebsd.org>,  "freebsd-arm@freebsd.org" <freebsd-arm@freebsd.org>
Subject:   Re: armv7-on-aarch64 stuck at urdlck
Message-ID:  <CANCZdfqQxUtjN6u=t08UovSMK1y_OPvEYA2SFjT08zgs9Vpabw@mail.gmail.com>
In-Reply-To: <28484869-05fd-4391-9501-10b93280f7a4@freebsd.org>
References:  <33251aa3-681f-4d17-afe9-953490afeaf0@gmail.com> <0DD19771-3AAB-469E-981B-1203F1C28233@yahoo.com> <be023545-2b25-49ec-b6f1-9e05cd402646@gmail.com> <Zp95qtxK0CeDdp-d@kib.kiev.ua> <6a969609-fa0e-419d-83d5-e4fcf0f6ec35@freebsd.org> <FABF7440-70D2-4BAB-8B0B-4CA950CFFA60@mit.edu> <ZqDWSU_h5J1fYCrz@kib.kiev.ua> <f39b16b5-bbfb-4011-92fb-834330841533@freebsd.org> <ZqDcamh6r3B-oEB-@kib.kiev.ua> <A7348370-0BEE-4EA4-8521-03C07F025F40@mit.edu> <ZqEiBJF7UxcUWCQ2@kib.kiev.ua> <28484869-05fd-4391-9501-10b93280f7a4@freebsd.org>

index | next in thread | previous in thread | raw e-mail

[-- Attachment #1 --]
On Wed, Jul 24, 2024 at 11:34 AM mmel@freebsd.org <meloun.michal@gmail.com>
wrote:

>
>
> On 24.07.2024 17:47, Konstantin Belousov wrote:
> > On Wed, Jul 24, 2024 at 01:07:39PM +0000, John F Carr wrote:
> >>
> >>
> >>> On Jul 24, 2024, at 06:50, Konstantin Belousov <kib@freebsd.org>
> wrote:
> >>>
> >>> On Wed, Jul 24, 2024 at 12:34:57PM +0200, mmel@freebsd.org wrote:
> >>>>
> >>>>
> >>>> On 24.07.2024 12:24, Konstantin Belousov wrote:
> >>>>> On Tue, Jul 23, 2024 at 08:11:13PM +0000, John F Carr wrote:
> >>>>>> On Jul 23, 2024, at 13:46, Michal Meloun <meloun.michal@gmail.com>
> wrote:
> >>>>>>>
> >>>>>>> On 23.07.2024 11:36, Konstantin Belousov wrote:
> >>>>>>>> On Tue, Jul 23, 2024 at 09:53:41AM +0200, Michal Meloun wrote:
> >>>>>>>>> The good news is that I'm finally able to generate a
> working/locking
> >>>>>>>>> test case.  The culprit (at least for me) is if "-mcpu" is used
> when
> >>>>>>>>> compiling libthr (e.g. indirectly injected via CPUTYPE in
> /etc/make.conf).
> >>>>>>>>> If it is not used, libthr is broken (regardless of -O level or
> debug/normal
> >>>>>>>>> build), but -mcpu=cortex-a15 will always produce a working
> libthr.
> >>>>>>>> I think this is very significant progress.
> >>>>>>>> Do you plan to drill down more to see what is going on?
> >>>>>>>
> >>>>>>> So the problem is now clear, and I fear it may apply to other
> architectures as well.
> >>>>>>> dlopen_object() (from rtld_elf),
> >>>>>>> https://cgit.freebsd.org/src/tree/libexec/rtld-elf/rtld.c#n3766,
> >>>>>>> holds the rtld_bind_lock write lock for almost the entire time a
> new library is loaded.
> >>>>>>> If the code uses a yet unresolved symbol to load the library, the
> rtl_bind() function attempts to get read lock of  rtld_bind_lock and a
> deadlock occurs.
> >>>>>>>
> >>>>>>> In this case, it round_up() in _thr_stack_fix_protection,
> >>>>>>>
> https://cgit.freebsd.org/src/tree/lib/libthr/thread/thr_stack.c#n136.
> >>>>>>> Issued by __aeabi_uidiv (since not all armv7 processors support HW
> divide).
> >>>>>>>
> >>>>>>> Unfortunately, I'm not sure how to fix it.  The compiler can emit
> __aeabi_<> in any place, and I'm not sure if it can resolve all the symbols
> used by rtld_eld and libthr beforehand.
> >>>>>>>
> >>>>>>>
> >>>>>>> Michal
> >>>>>>>
> >>>>>>
> >>>>>> In this case (but not for all _aeabi_ functions) we can avoid
> division
> >>>>>> as long as page size is a power of 2.
> >>>>>>
> >>>>>> The function is
> >>>>>>
> >>>>>>    static inline size_t
> >>>>>>    round_up(size_t size)
> >>>>>>    {
> >>>>>>     if (size % _thr_page_size != 0)
> >>>>>>     size = ((size / _thr_page_size) + 1) *
> >>>>>>         _thr_page_size;
> >>>>>>     return size;
> >>>>>>    }
> >>>>>>
> >>>>>> The body can be condensed to
> >>>>>>
> >>>>>>    return (size + _thr_page_size - 1) & ~(_thr_page_size - 1);
> >>>>>>
> >>>>>> This is shorter in both lines of code and instruction bytes.
> >>>>>
> >>>>> Lets not allow this to be lost.  Could anybody confirm that the patch
> >>>>> below fixes the issue?
> >>>>>
> >>>>> commit d560f4f6690a48476565278fd07ca131bf4eeb3c
> >>>>> Author: Konstantin Belousov <kib@FreeBSD.org>
> >>>>> Date:   Wed Jul 24 13:17:55 2024 +0300
> >>>>>
> >>>>>      rtld: avoid division in __thr_map_stacks_exec()
> >>>>>      The function is called by rtld with the rtld bind lock
> write-locked,
> >>>>>      when fixing the stack permission during dso load.  Not every
> ARMv7 CPU
> >>>>>      supports the div, which causes the recursive entry into rtld to
> resolve
> >>>>>      the  __aeabi_uidiv symbol, causing self-lock.
> >>>>>      Workaround the problem by using roundup2() instead of
> open-coding less
> >>>>>      efficient formula.
> >>>>>      Diagnosed by:   mmel
> >>>>>      Based on submission by: John F Carr <jfc@mit.edu>
> >>>>>      Sponsored by:   The FreeBSD Foundation
> >>>>>      MFC after:      1 week
> >>>>>
> >>> Just realized that it is wrong.  Stack size is user-controlled and it
> does
> >>> not need to be power of two.
> >>
> >> Your change is correct.  _thr_page_size is set to getpagesize(),
> >> which is a power of 2.   The call to roundup2 takes a user-provided
> >> size and rounds it up to a multiple of the system page size.
> >>
> >> I tested the change and it works.  My change also works and
> >> should compile to identical code.  I forgot there was a standard
> >> function to do the rounding.
> > Right, my bad, thank you for correcting my thinko.
> >
> >>
> >>> For final resolving of deadlocks, after a full day of digging, I'm
> very much
> >>>> incline  of adding -znow to the linker flags for libthr.so (and maybe
> also
> >>>> for ld-elf.so). The runtime cost of resolving all symbols at startup
> is very
> >>>> low. Direct pre-solving in _thr_rtld_init() is problematic for the
> _aeabi_*
> >>>> symbols, since they don't have an official C prototypes, and some are
> not
> >>>> compatible with C calling conventions.
> >>> I do not like it. `-z now' changes (breaks) the ABI and makes some
> symbols
> >>> not preemtible.
> >>>
> >>> In the worst case, we would need a call to the asm routine which
> causes the
> >>> resolution of the _eabi_* symbols on arm.
> >>>
> >>
> >> It would also be possible to link libthr with libgcc.a and use a linker
> map
> >> to hide the _eabi_ symbols.
> > In principle yes, but if the ARM ABI states that _eabi symbols must be
> used,
> > and exported from libc, then this is also some form of ABI breakage.
>
> I hope that https://reviews.freebsd.org/D46104 is acceptable :)
>

Can't speak for kib, but it looks good to my eye (though I agree with his
naming
quibble). And helps avoid -znow, though I could have gone either way on
that.
It's also simple enough not to be a burden.

Warner

[-- Attachment #2 --]
<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Jul 24, 2024 at 11:34 AM <a href="mailto:mmel@freebsd.org">mmel@freebsd.org</a> &lt;<a href="mailto:meloun.michal@gmail.com">meloun.michal@gmail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br>
<br>
On 24.07.2024 17:47, Konstantin Belousov wrote:<br>
&gt; On Wed, Jul 24, 2024 at 01:07:39PM +0000, John F Carr wrote:<br>
&gt;&gt;<br>
&gt;&gt;<br>
&gt;&gt;&gt; On Jul 24, 2024, at 06:50, Konstantin Belousov &lt;<a href="mailto:kib@freebsd.org" target="_blank">kib@freebsd.org</a>&gt; wrote:<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt; On Wed, Jul 24, 2024 at 12:34:57PM +0200, <a href="mailto:mmel@freebsd.org" target="_blank">mmel@freebsd.org</a> wrote:<br>
&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt; On 24.07.2024 12:24, Konstantin Belousov wrote:<br>
&gt;&gt;&gt;&gt;&gt; On Tue, Jul 23, 2024 at 08:11:13PM +0000, John F Carr wrote:<br>
&gt;&gt;&gt;&gt;&gt;&gt; On Jul 23, 2024, at 13:46, Michal Meloun &lt;<a href="mailto:meloun.michal@gmail.com" target="_blank">meloun.michal@gmail.com</a>&gt; wrote:<br>
&gt;&gt;&gt;&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt;&gt;&gt;&gt; On 23.07.2024 11:36, Konstantin Belousov wrote:<br>
&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; On Tue, Jul 23, 2024 at 09:53:41AM +0200, Michal Meloun wrote:<br>
&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; The good news is that I&#39;m finally able to generate a working/locking<br>
&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; test case.  The culprit (at least for me) is if &quot;-mcpu&quot; is used when<br>
&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; compiling libthr (e.g. indirectly injected via CPUTYPE in /etc/make.conf).<br>
&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; If it is not used, libthr is broken (regardless of -O level or debug/normal<br>
&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; build), but -mcpu=cortex-a15 will always produce a working libthr.<br>
&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; I think this is very significant progress.<br>
&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; Do you plan to drill down more to see what is going on?<br>
&gt;&gt;&gt;&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt;&gt;&gt;&gt; So the problem is now clear, and I fear it may apply to other architectures as well.<br>
&gt;&gt;&gt;&gt;&gt;&gt;&gt; dlopen_object() (from rtld_elf),<br>
&gt;&gt;&gt;&gt;&gt;&gt;&gt; <a href="https://cgit.freebsd.org/src/tree/libexec/rtld-elf/rtld.c#n3766" rel="noreferrer" target="_blank">https://cgit.freebsd.org/src/tree/libexec/rtld-elf/rtld.c#n3766</a>,<br>;
&gt;&gt;&gt;&gt;&gt;&gt;&gt; holds the rtld_bind_lock write lock for almost the entire time a new library is loaded.<br>
&gt;&gt;&gt;&gt;&gt;&gt;&gt; If the code uses a yet unresolved symbol to load the library, the rtl_bind() function attempts to get read lock of  rtld_bind_lock and a deadlock occurs.<br>
&gt;&gt;&gt;&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt;&gt;&gt;&gt; In this case, it round_up() in _thr_stack_fix_protection,<br>
&gt;&gt;&gt;&gt;&gt;&gt;&gt; <a href="https://cgit.freebsd.org/src/tree/lib/libthr/thread/thr_stack.c#n136" rel="noreferrer" target="_blank">https://cgit.freebsd.org/src/tree/lib/libthr/thread/thr_stack.c#n136</a>.<br>;
&gt;&gt;&gt;&gt;&gt;&gt;&gt; Issued by __aeabi_uidiv (since not all armv7 processors support HW divide).<br>
&gt;&gt;&gt;&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt;&gt;&gt;&gt; Unfortunately, I&#39;m not sure how to fix it.  The compiler can emit __aeabi_&lt;&gt; in any place, and I&#39;m not sure if it can resolve all the symbols used by rtld_eld and libthr beforehand.<br>
&gt;&gt;&gt;&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt;&gt;&gt;&gt; Michal<br>
&gt;&gt;&gt;&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt;&gt;&gt; In this case (but not for all _aeabi_ functions) we can avoid division<br>
&gt;&gt;&gt;&gt;&gt;&gt; as long as page size is a power of 2.<br>
&gt;&gt;&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt;&gt;&gt; The function is<br>
&gt;&gt;&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt;&gt;&gt;    static inline size_t<br>
&gt;&gt;&gt;&gt;&gt;&gt;    round_up(size_t size)<br>
&gt;&gt;&gt;&gt;&gt;&gt;    {<br>
&gt;&gt;&gt;&gt;&gt;&gt;     if (size % _thr_page_size != 0)<br>
&gt;&gt;&gt;&gt;&gt;&gt;     size = ((size / _thr_page_size) + 1) *<br>
&gt;&gt;&gt;&gt;&gt;&gt;         _thr_page_size;<br>
&gt;&gt;&gt;&gt;&gt;&gt;     return size;<br>
&gt;&gt;&gt;&gt;&gt;&gt;    }<br>
&gt;&gt;&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt;&gt;&gt; The body can be condensed to<br>
&gt;&gt;&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt;&gt;&gt;    return (size + _thr_page_size - 1) &amp; ~(_thr_page_size - 1);<br>
&gt;&gt;&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt;&gt;&gt; This is shorter in both lines of code and instruction bytes.<br>
&gt;&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt;&gt; Lets not allow this to be lost.  Could anybody confirm that the patch<br>
&gt;&gt;&gt;&gt;&gt; below fixes the issue?<br>
&gt;&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt;&gt; commit d560f4f6690a48476565278fd07ca131bf4eeb3c<br>
&gt;&gt;&gt;&gt;&gt; Author: Konstantin Belousov &lt;kib@FreeBSD.org&gt;<br>
&gt;&gt;&gt;&gt;&gt; Date:   Wed Jul 24 13:17:55 2024 +0300<br>
&gt;&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt;&gt;      rtld: avoid division in __thr_map_stacks_exec()<br>
&gt;&gt;&gt;&gt;&gt;      The function is called by rtld with the rtld bind lock write-locked,<br>
&gt;&gt;&gt;&gt;&gt;      when fixing the stack permission during dso load.  Not every ARMv7 CPU<br>
&gt;&gt;&gt;&gt;&gt;      supports the div, which causes the recursive entry into rtld to resolve<br>
&gt;&gt;&gt;&gt;&gt;      the  __aeabi_uidiv symbol, causing self-lock.<br>
&gt;&gt;&gt;&gt;&gt;      Workaround the problem by using roundup2() instead of open-coding less<br>
&gt;&gt;&gt;&gt;&gt;      efficient formula.<br>
&gt;&gt;&gt;&gt;&gt;      Diagnosed by:   mmel<br>
&gt;&gt;&gt;&gt;&gt;      Based on submission by: John F Carr &lt;<a href="mailto:jfc@mit.edu" target="_blank">jfc@mit.edu</a>&gt;<br>
&gt;&gt;&gt;&gt;&gt;      Sponsored by:   The FreeBSD Foundation<br>
&gt;&gt;&gt;&gt;&gt;      MFC after:      1 week<br>
&gt;&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt; Just realized that it is wrong.  Stack size is user-controlled and it does<br>
&gt;&gt;&gt; not need to be power of two.<br>
&gt;&gt;<br>
&gt;&gt; Your change is correct.  _thr_page_size is set to getpagesize(),<br>
&gt;&gt; which is a power of 2.   The call to roundup2 takes a user-provided<br>
&gt;&gt; size and rounds it up to a multiple of the system page size.<br>
&gt;&gt;<br>
&gt;&gt; I tested the change and it works.  My change also works and<br>
&gt;&gt; should compile to identical code.  I forgot there was a standard<br>
&gt;&gt; function to do the rounding.<br>
&gt; Right, my bad, thank you for correcting my thinko.<br>
&gt; <br>
&gt;&gt;<br>
&gt;&gt;&gt; For final resolving of deadlocks, after a full day of digging, I&#39;m very much<br>
&gt;&gt;&gt;&gt; incline  of adding -znow to the linker flags for libthr.so (and maybe also<br>
&gt;&gt;&gt;&gt; for ld-elf.so). The runtime cost of resolving all symbols at startup is very<br>
&gt;&gt;&gt;&gt; low. Direct pre-solving in _thr_rtld_init() is problematic for the _aeabi_*<br>
&gt;&gt;&gt;&gt; symbols, since they don&#39;t have an official C prototypes, and some are not<br>
&gt;&gt;&gt;&gt; compatible with C calling conventions.<br>
&gt;&gt;&gt; I do not like it. `-z now&#39; changes (breaks) the ABI and makes some symbols<br>
&gt;&gt;&gt; not preemtible.<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt; In the worst case, we would need a call to the asm routine which causes the<br>
&gt;&gt;&gt; resolution of the _eabi_* symbols on arm.<br>
&gt;&gt;&gt;<br>
&gt;&gt;<br>
&gt;&gt; It would also be possible to link libthr with libgcc.a and use a linker map<br>
&gt;&gt; to hide the _eabi_ symbols.<br>
&gt; In principle yes, but if the ARM ABI states that _eabi symbols must be used,<br>
&gt; and exported from libc, then this is also some form of ABI breakage.<br>
<br>
I hope that <a href="https://reviews.freebsd.org/D46104" rel="noreferrer" target="_blank">https://reviews.freebsd.org/D46104</a>; is acceptable :)<br></blockquote><div><br></div><div>Can&#39;t speak for kib, but it looks good to my eye (though I agree with his naming</div><div>quibble). And helps avoid -znow, though I could have gone either way on that.</div><div>It&#39;s also simple enough not to be a burden.</div><div><br></div><div>Warner </div></div></div>
help

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CANCZdfqQxUtjN6u=t08UovSMK1y_OPvEYA2SFjT08zgs9Vpabw>