Date:      Mon, 29 Jan 2024 08:53:54 -0700
From:      Warner Losh <imp@bsdimp.com>
To:        Guido Falsi <mad@madpilot.net>
Cc:        Nathan Reilly-list <lists@nreilly.com>, emulation@freebsd.org,  "freebsd-arm@freebsd.org" <freebsd-arm@freebsd.org>, freebsd-pkg@freebsd.org
Subject:   Re: qemu-user-static aarch64 lockup/race? (was Re: Python failure in poudriere on arm64 (via qemu-user-static cross compiling))
Message-ID:  <CANCZdfpUFYHeqkw7RdeyO=394PVfENfr8RbsG-SrrhAr8_2=Zw@mail.gmail.com>
In-Reply-To: <990427ae-0491-463e-92c7-c74700deb6fa@madpilot.net>
References:  <6a33726b-eb6f-418e-9fbd-6d0b9b4bfaa8@madpilot.net> <0fc7f929-6e5b-4a33-97d2-8a9c0c07d524@madpilot.net> <79a5eb0f-d04e-4c1a-9d8a-185e1fb4e4a2@madpilot.net> <CANCZdfr-0w2EMa=_hFT3p4gFSDO-P1Yf8Vb-1eLiwRVomo1Jfg@mail.gmail.com> <a1845758-3535-4aa0-9274-d3b13dd3801b@madpilot.net> <5ef2ab66-25ef-45f1-aa5a-4b614eab2f40@madpilot.net> <D2DD631F-8AED-48B7-8FB3-86F93BA707F2@nreilly.com> <CANCZdfqELPcaCj-d%2BLj_qocR6gMiHp1RL1Y92myq=TnR-W6Y1w@mail.gmail.com> <e434f5a7-5730-498e-b54d-b01310f95f7a@madpilot.net> <990427ae-0491-463e-92c7-c74700deb6fa@madpilot.net>


On Mon, Jan 29, 2024, 8:48 AM Guido Falsi <mad@madpilot.net> wrote:

> On 29/01/24 09:26, Guido Falsi wrote:
> > On 29/01/24 02:10, Warner Losh wrote:
> >>
> >>
> >> On Sun, Jan 28, 2024 at 4:45 PM Nathan Reilly-list <lists@nreilly.com> wrote:
> >>
> >>
> >>
> >>>     On 29 Jan 2024, at 8:43 am, Guido Falsi <mad@madpilot.net> wrote:
> >>>     On 28/01/24 22:34, Guido Falsi wrote:
> >>>>     On 28/01/24 22:23, Warner Losh wrote:
> >>>>>     On Sun, Jan 28, 2024, 12:38 PM Guido Falsi <mad@madpilot.net> wrote:
> >>>>>
> >>>>>         On 28/01/24 15:15, Guido Falsi wrote:
> >>>>>         [snip]
> >>>>>          > Creating repository in /tmp/packages:   0%
> >>>>>          >
> >>>>>
> >>>>>         BTW, I forgot to mention: the last time this worked without
> >>>>>         issue was around 20th December.
> >>>>>
> >>>>>
> >>>>>     I think this is a bsd-user issue. There is a race somewhere in
> >>>>>     that code that causes the hangs. I'd love a reproducible test
> >>>>>     case that is somewhat smaller than python... there are bigger
> >>>>>     races with the newer stuff and I've not had the time to chase it
> >>>>>     there either. 😞
> >>>>     First of all, thanks for your feedback. It is encouraging to have
> >>>>     someone with better knowledge of this confirm that a race
> >>>>     condition is actually a possible cause!
> >>>>     Strange that this had not been happening up to mid December.
> >>>>     My main and fully reproducible use case is actually mostly with
> >>>>     pkg. At the end of the run poudriere runs `pkg repo` to create the
> >>>>     meta files and sign the repo. It forks itself (ncpus + 2 I guess;
> >>>>     even forcing it to 1 worker I see three processes), and then
> >>>>     locks up, with all the processes no longer using CPU (ps output is
> >>>>     in my message).
> >>>>     I guess this can be reproduced with any poudriere repo with
> >>>>     more than ncpus packages in it. It can also be reproduced
> >>>>     using `poudriere pkgclean -u <etc>`.
> >>>>     If that does not work I'm not sure how to reproduce it in other
> >>>>     ways, but I can try writing some code mocking what pkg seems to
> >>>>     be doing, though I'm not an expert at such things.
> >>>
> >>>     In case it helps narrow things down further, it looks like the
> >>>     lockup is happening somewhere around here:
> >>>
> >>>
> >>>     https://github.com/freebsd/pkg/blob/56fa3f87d9d9644348b89680dfd8af47a860ee82/libpkg/pkg_repo_create.c#L778
> >>>
> >>>     and/or in the pkg_create_repo_worker() function here:
> >>>
> >>>
> >>>     https://github.com/freebsd/pkg/blob/56fa3f87d9d9644348b89680dfd8af47a860ee82/libpkg/pkg_repo_create.c#L341
> >>>
> >>>
> >>>     (I'm trying to spare you the time needed to find the actual code
> >>>     being executed, I guess you would have identified this in a few
> >>>     minutes yourself, but I'm trying to make myself useful)
> >>
> >>
> >>     There appears to be a GitHub issue for poudriere about this, but
> >>     it seems to be looking in another direction.
> >>
> >>     https://github.com/freebsd/poudriere/issues/1009
> >>
> >
> > This one looks quite similar.
> >
> > In my case the ports/pkg are aligned between host and jail, in fact I
> > have built them from the exact same git checkout.
> >
> > I noticed pkg head has been converted to using pthreads instead of fork,
> > maybe that could help. I will make time to perform some testing.
>
> Thanks for pointing me here, it looks like this was "it", in that with
> this issue fixed the native pkg-static is used, which sidesteps the lockup.
>
>
> Unfortunately there ARE qemu races and lockups that prevent the arm64
> pkg-static binary from being correctly emulated by qemu-user-static. Such
> conditions also cause sporadic failures in some ports being built.
>
> I filed a PR with a fix for that issue:
>
> https://github.com/freebsd/poudriere/pull/1115


Ok. This dodges the problem. But it papers over things.

Any chance you could give me the state of pkg before + the package added as
a test case for qemu?

Warner


>
> --
> Guido Falsi <mad@madpilot.net>
>
>
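
For illustration only, the fork-then-wait pattern described in the quoted
text (`pkg repo` forking several workers and then all of them going idle)
could be mocked roughly as follows. This is a sketch under assumptions, not
what pkg actually does: the function name `run_workers`, the use of pipes to
report results, and the per-item "work" are all hypothetical and are not
taken from pkg_repo_create.c. Since Python itself was reported failing under
emulation in this thread, a real test case for qemu would probably need the
same pattern ported to C.

```python
import os
import select


def run_workers(items, n_workers):
    """Fork n_workers children; each reports one line per item over a pipe.

    Hypothetical mock of a fork-based worker pool; not pkg's real logic.
    """
    children = {}  # read end of pipe -> child pid
    for w in range(n_workers):
        rfd, wfd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: drop inherited read ends, handle every n_workers-th item.
            os.close(rfd)
            for fd in children:
                os.close(fd)
            for item in items[w::n_workers]:
                os.write(wfd, f"{item}:{len(item)}\n".encode())
            os.close(wfd)
            os._exit(0)  # exit without running parent cleanup handlers
        os.close(wfd)
        children[rfd] = pid

    # Parent: read from all workers until every pipe reports EOF.
    buffers = {fd: b"" for fd in children}
    open_fds = set(children)
    while open_fds:
        ready, _, _ = select.select(list(open_fds), [], [])
        for fd in ready:
            chunk = os.read(fd, 4096)
            if chunk:
                buffers[fd] += chunk
            else:
                os.close(fd)
                open_fds.remove(fd)

    # Reap every child; a hang here or in select() is what a lockup looks like.
    for pid in children.values():
        _, status = os.waitpid(pid, 0)
        if not os.WIFEXITED(status) or os.WEXITSTATUS(status) != 0:
            raise RuntimeError(f"worker {pid} failed")

    lines = []
    for buf in buffers.values():
        lines.extend(buf.decode().splitlines())
    return sorted(lines)


if __name__ == "__main__":
    names = [f"pkg-{i}" for i in range(20)]
    print(len(run_workers(names, 4)), "results collected")
```

Running this in a loop on the host and then under emulation would show
whether the parent ever blocks forever in `select()` or `waitpid()`; it does
not by itself show that the race reported here is triggered.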



