Date:      Tue, 23 Jul 2024 14:54:46 -0600
From:      Warner Losh <imp@bsdimp.com>
To:        John F Carr <jfc@mit.edu>
Cc:        "mmel@freebsd.org" <mmel@freebsd.org>, Konstantin Belousov <kib@freebsd.org>, Mark Millard <marklmi@yahoo.com>,  FreeBSD Current <freebsd-current@freebsd.org>,  "freebsd-arm@freebsd.org" <freebsd-arm@freebsd.org>
Subject:   Re: armv7-on-aarch64 stuck at urdlck
Message-ID:  <CANCZdfrc0O6YEx2pHtC=h=1K5_O=riUP05ktpjSimXj88ixaCA@mail.gmail.com>
In-Reply-To: <FABF7440-70D2-4BAB-8B0B-4CA950CFFA60@mit.edu>
References:  <724db42b-5550-4381-8277-2971e6b3e8f1@freebsd.org> <B5E2275D-21F0-43C8-AF06-A45DB7448D66@yahoo.com> <86185657-e521-466b-89e2-f291aaac10a6@freebsd.org> <0EF18174-8735-46A4-BD71-FFA3472B319F@yahoo.com> <a1b978fe-ff54-4112-860c-b09500d89d0b@freebsd.org> <C0B42CBB-8F12-4597-A04B-26F2107E176E@yahoo.com> <33251aa3-681f-4d17-afe9-953490afeaf0@gmail.com> <0DD19771-3AAB-469E-981B-1203F1C28233@yahoo.com> <be023545-2b25-49ec-b6f1-9e05cd402646@gmail.com> <Zp95qtxK0CeDdp-d@kib.kiev.ua> <6a969609-fa0e-419d-83d5-e4fcf0f6ec35@freebsd.org> <FABF7440-70D2-4BAB-8B0B-4CA950CFFA60@mit.edu>


On Tue, Jul 23, 2024 at 2:11 PM John F Carr <jfc@mit.edu> wrote:

> On Jul 23, 2024, at 13:46, Michal Meloun <meloun.michal@gmail.com> wrote:
> >
> > On 23.07.2024 11:36, Konstantin Belousov wrote:
> >> On Tue, Jul 23, 2024 at 09:53:41AM +0200, Michal Meloun wrote:
> >>> The good news is that I'm finally able to generate a working/locking
> >>> test case.  The culprit (at least for me) is if "-mcpu" is used when
> >>> compiling libthr (e.g. indirectly injected via CPUTYPE in
> >>> /etc/make.conf).
> >>> If it is not used, libthr is broken (regardless of -O level or
> >>> debug/normal build), but -mcpu=cortex-a15 will always produce a
> >>> working libthr.
> >> I think this is very significant progress.
> >> Do you plan to drill down more to see what is going on?
> >
> > So the problem is now clear, and I fear it may apply to other
> > architectures as well.
> > dlopen_object() (from rtld-elf),
> > https://cgit.freebsd.org/src/tree/libexec/rtld-elf/rtld.c#n3766,
> > holds the rtld_bind_lock write lock for almost the entire time a new
> > library is loaded.
> > If code that runs while the library is being loaded uses a not yet
> > resolved symbol, the _rtld_bind() function attempts to take a read
> > lock on rtld_bind_lock and a deadlock occurs.
> >
> > In this case, it is round_up() in _thr_stack_fix_protection,
> > https://cgit.freebsd.org/src/tree/lib/libthr/thread/thr_stack.c#n136.
> > The call is to __aeabi_uidiv, emitted for the division (since not all
> > armv7 processors support HW divide).
> >
> > Unfortunately, I'm not sure how to fix it.  The compiler can emit
> > __aeabi_<> calls in any place, and I'm not sure if all the symbols
> > used by rtld-elf and libthr can be resolved beforehand.
> >
> >
> > Michal
> >
>
> In this case (but not for all __aeabi_ functions) we can avoid division
> as long as page size is a power of 2.
>
> The function is
>
>   static inline size_t
>   round_up(size_t size)
>   {
>         if (size % _thr_page_size != 0)
>                 size = ((size / _thr_page_size) + 1) *
>                     _thr_page_size;
>         return size;
>   }
>
> The body can be condensed to
>
>   return (size + _thr_page_size - 1) & ~(_thr_page_size - 1);
>
> This is shorter in both lines of code and instruction bytes.
>

I like this change...

But we do need to fix the deadlocks... They seem to be more likely
when building in bsd-user emulation...

Warner
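
The round_up() rewrite quoted above can be checked with a small standalone sketch. This is not the libthr source: page_size is a hypothetical stand-in for _thr_page_size, the value 4096 is an assumption, and round_up_forms_agree() is a helper invented here for illustration. The mask form is only valid when the page size is a power of 2, so the helper checks that precondition first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for libthr's _thr_page_size; 4096 is an assumption. */
static size_t page_size = 4096;

/*
 * Division form, as quoted in the thread.  With a run-time page size the
 * % and / operators may be compiled into __aeabi_uidiv-style helper calls
 * on armv7 targets built without hardware divide.
 */
static size_t
round_up_div(size_t size)
{
	if (size % page_size != 0)
		size = ((size / page_size) + 1) * page_size;
	return (size);
}

/* Mask form: no division, but correct only for a power-of-2 page size. */
static size_t
round_up_mask(size_t size)
{
	return ((size + page_size - 1) & ~(page_size - 1));
}

/* Returns 1 if both forms agree for every size in [0, limit], else 0. */
static int
round_up_forms_agree(size_t limit)
{
	size_t size;

	/* Precondition for the mask form: page_size is a power of 2. */
	if (page_size == 0 || (page_size & (page_size - 1)) != 0)
		return (0);
	for (size = 0; size <= limit; size++)
		if (round_up_div(size) != round_up_mask(size))
			return (0);
	return (1);
}
```

Calling round_up_forms_agree(4 * page_size) from a test program compares the two forms over a few pages' worth of sizes. Both forms wrap around for sizes within one page of SIZE_MAX, so the sketch does not probe that range.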
