Date: Tue, 23 Jul 2024 14:54:46 -0600
From: Warner Losh <imp@bsdimp.com>
To: John F Carr <jfc@mit.edu>
Cc: "mmel@freebsd.org" <mmel@freebsd.org>, Konstantin Belousov <kib@freebsd.org>, Mark Millard <marklmi@yahoo.com>, FreeBSD Current <freebsd-current@freebsd.org>, "freebsd-arm@freebsd.org" <freebsd-arm@freebsd.org>
Subject: Re: armv7-on-aarch64 stuck at urdlck
Message-ID: <CANCZdfrc0O6YEx2pHtC=h=1K5_O=riUP05ktpjSimXj88ixaCA@mail.gmail.com>
In-Reply-To: <FABF7440-70D2-4BAB-8B0B-4CA950CFFA60@mit.edu>
References: <724db42b-5550-4381-8277-2971e6b3e8f1@freebsd.org> <B5E2275D-21F0-43C8-AF06-A45DB7448D66@yahoo.com> <86185657-e521-466b-89e2-f291aaac10a6@freebsd.org> <0EF18174-8735-46A4-BD71-FFA3472B319F@yahoo.com> <a1b978fe-ff54-4112-860c-b09500d89d0b@freebsd.org> <C0B42CBB-8F12-4597-A04B-26F2107E176E@yahoo.com> <33251aa3-681f-4d17-afe9-953490afeaf0@gmail.com> <0DD19771-3AAB-469E-981B-1203F1C28233@yahoo.com> <be023545-2b25-49ec-b6f1-9e05cd402646@gmail.com> <Zp95qtxK0CeDdp-d@kib.kiev.ua> <6a969609-fa0e-419d-83d5-e4fcf0f6ec35@freebsd.org> <FABF7440-70D2-4BAB-8B0B-4CA950CFFA60@mit.edu>

On Tue, Jul 23, 2024 at 2:11 PM John F Carr <jfc@mit.edu> wrote:

> On Jul 23, 2024, at 13:46, Michal Meloun <meloun.michal@gmail.com> wrote:
> >
> > On 23.07.2024 11:36, Konstantin Belousov wrote:
> >> On Tue, Jul 23, 2024 at 09:53:41AM +0200, Michal Meloun wrote:
> >>> The good news is that I'm finally able to generate a working/locking
> >>> test case.  The culprit (at least for me) is if "-mcpu" is used when
> >>> compiling libthr (e.g. indirectly injected via CPUTYPE in /etc/make.conf).
> >>> If it is not used, libthr is broken (regardless of -O level or debug/normal
> >>> build), but -mcpu=cortex-a15 will always produce a working libthr.
> >> I think this is very significant progress.
> >> Do you plan to drill down more to see what is going on?
> >
> > So the problem is now clear, and I fear it may apply to other
> > architectures as well.
> > dlopen_object() (from rtld_elf),
> > https://cgit.freebsd.org/src/tree/libexec/rtld-elf/rtld.c#n3766,
> > holds the rtld_bind_lock write lock for almost the entire time a new
> > library is loaded.
> > If the code uses a yet unresolved symbol to load the library, the
> > _rtld_bind() function attempts to get a read lock on rtld_bind_lock and
> > a deadlock occurs.
> >
> > In this case, it is round_up() in _thr_stack_fix_protection,
> > https://cgit.freebsd.org/src/tree/lib/libthr/thread/thr_stack.c#n136.
> > Issued by __aeabi_uidiv (since not all armv7 processors support HW divide).
> >
> > Unfortunately, I'm not sure how to fix it.  The compiler can emit
> > __aeabi_<> in any place, and I'm not sure if it can resolve all the
> > symbols used by rtld_elf and libthr beforehand.
> >
> >
> > Michal
> >
>
> In this case (but not for all __aeabi_ functions) we can avoid division
> as long as page size is a power of 2.
>
> The function is
>
>   static inline size_t
>   round_up(size_t size)
>   {
>         if (size % _thr_page_size != 0)
>                 size = ((size / _thr_page_size) + 1) *
>                     _thr_page_size;
>         return size;
>   }
>
> The body can be condensed to
>
>   return (size + _thr_page_size - 1) & ~(_thr_page_size - 1);
>
> This is shorter in both lines of code and instruction bytes.
>

I like this change...

But we do need to fix the deadlocks... They seem to be more likely
when building in bsd-user emulation...

Warner
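
The equivalence claimed in the quoted text (the division form and the mask form of round_up() agree whenever the page size is a power of 2) can be checked with a small standalone sketch. This is not the libthr source: page_size is a hypothetical stand-in for _thr_page_size, and round_up_div, round_up_mask and count_mismatches are names invented here for illustration.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for libthr's _thr_page_size.
 * Assumption: the value is a power of 2 (4096 here).
 */
static size_t page_size = 4096;

/*
 * Division form, as quoted in the thread.  On armv7 targets without
 * hardware divide, % and / may be compiled into calls to __aeabi_uidiv
 * or a related helper, which is what triggers lazy symbol binding.
 */
static size_t
round_up_div(size_t size)
{
	if (size % page_size != 0)
		size = ((size / page_size) + 1) * page_size;
	return (size);
}

/*
 * Mask form, as proposed in the thread.  It needs only add, subtract
 * and bitwise operations, but is correct only when page_size is a
 * power of 2.
 */
static size_t
round_up_mask(size_t size)
{
	return ((size + page_size - 1) & ~(page_size - 1));
}

/* Count the inputs in [0, limit] on which the two forms disagree. */
static size_t
count_mismatches(size_t limit)
{
	size_t n = 0;

	for (size_t s = 0; s <= limit; s++)
		if (round_up_div(s) != round_up_mask(s))
			n++;
	return (n);
}
```

Calling count_mismatches() with page_size left at 4096 should find no disagreement, while setting page_size to a value that is not a power of 2 (3000, say) makes the two forms diverge. That is why the condensed body depends on the power-of-2 assumption stated in the quoted text. It removes this one division only and does not address the underlying rtld_bind_lock deadlock.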