Date: Mon, 29 Jan 2024 08:53:54 -0700
From: Warner Losh <imp@bsdimp.com>
To: Guido Falsi <mad@madpilot.net>
Cc: Nathan Reilly-list <lists@nreilly.com>, emulation@freebsd.org, "freebsd-arm@freebsd.org" <freebsd-arm@freebsd.org>, freebsd-pkg@freebsd.org
Subject: Re: qemu-user-static aarch64 lockup/race? (was Re: Python failure in poudriere on arm64 (via qemu-user-static cross compiling))
Message-ID: <CANCZdfpUFYHeqkw7RdeyO=394PVfENfr8RbsG-SrrhAr8_2=Zw@mail.gmail.com>
In-Reply-To: <990427ae-0491-463e-92c7-c74700deb6fa@madpilot.net>
References: <6a33726b-eb6f-418e-9fbd-6d0b9b4bfaa8@madpilot.net>
 <0fc7f929-6e5b-4a33-97d2-8a9c0c07d524@madpilot.net>
 <79a5eb0f-d04e-4c1a-9d8a-185e1fb4e4a2@madpilot.net>
 <CANCZdfr-0w2EMa=_hFT3p4gFSDO-P1Yf8Vb-1eLiwRVomo1Jfg@mail.gmail.com>
 <a1845758-3535-4aa0-9274-d3b13dd3801b@madpilot.net>
 <5ef2ab66-25ef-45f1-aa5a-4b614eab2f40@madpilot.net>
 <D2DD631F-8AED-48B7-8FB3-86F93BA707F2@nreilly.com>
 <CANCZdfqELPcaCj-d%2BLj_qocR6gMiHp1RL1Y92myq=TnR-W6Y1w@mail.gmail.com>
 <e434f5a7-5730-498e-b54d-b01310f95f7a@madpilot.net>
 <990427ae-0491-463e-92c7-c74700deb6fa@madpilot.net>

On Mon, Jan 29, 2024, 8:48 AM Guido Falsi <mad@madpilot.net> wrote:

> On 29/01/24 09:26, Guido Falsi wrote:
> > On 29/01/24 02:10, Warner Losh wrote:
> >>
> >> On Sun, Jan 28, 2024 at 4:45 PM Nathan Reilly-list <lists@nreilly.com>
> >> wrote:
> >>
> >>> On 29 Jan 2024, at 8:43 am, Guido Falsi <mad@madpilot.net> wrote:
> >>> On 28/01/24 22:34, Guido Falsi wrote:
> >>>> On 28/01/24 22:23, Warner Losh wrote:
> >>>>> On Sun, Jan 28, 2024, 12:38 PM Guido Falsi <mad@madpilot.net>
> >>>>> wrote:
> >>>>>
> >>>>>     On 28/01/24 15:15, Guido Falsi wrote:
> >>>>>     [snip]
> >>>>>     > Creating repository in /tmp/packages:   0%
> >>>>>     >
> >>>>>
> >>>>>     BTW, I forgot to mention that the last time this worked
> >>>>>     without issue was around 20th December.
> >>>>>
> >>>>> I think this is a bsd-user issue. There is a race somewhere in
> >>>>> that code that causes the hangs. I'd love a reproducible test
> >>>>> case that is somewhat smaller than python... there are bigger
> >>>>> races with the newer stuff and I've not had the time to chase it
> >>>>> there either. 😞
> >>>>
> >>>> First of all, thanks for your feedback. It is encouraging to have
> >>>> someone with better knowledge of this confirm that a race
> >>>> condition is actually a possible cause!
> >>>>
> >>>> It is strange that this did not happen until mid December.
> >>>>
> >>>> My main, fully reproducible use case is actually mostly with pkg.
> >>>> At the end of the run poudriere runs `pkg repo` to create the
> >>>> meta files and sign the repo. It forks itself (ncpus + 2, I guess;
> >>>> even when forcing it to 1 worker I see three processes), and then
> >>>> locks up, with all the processes no longer using any CPU (ps
> >>>> output is in my message).
> >>>>
> >>>> I guess this can be reproduced with any poudriere repo with more
> >>>> than ncpus packages in it. It can also be reproduced using
> >>>> `poudriere pkgclean -u <etc>`.
> >>>>
> >>>> If that does not work I'm not sure how to reproduce it in other
> >>>> ways, but I can try writing some code mocking what pkg seems to
> >>>> be doing. I'm not an expert at such things, though.
> >>>
> >>> In case it helps narrow things down further, it looks like the
> >>> lockup is happening somewhere around here:
> >>>
> >>> https://github.com/freebsd/pkg/blob/56fa3f87d9d9644348b89680dfd8af47a860ee82/libpkg/pkg_repo_create.c#L778
> >>>
> >>> and/or in the pkg_create_repo_worker() function here:
> >>>
> >>> https://github.com/freebsd/pkg/blob/56fa3f87d9d9644348b89680dfd8af47a860ee82/libpkg/pkg_repo_create.c#L341
> >>>
> >>> (I'm trying to spare you the time needed to find the actual code
> >>> being executed. I guess you would have identified this in a few
> >>> minutes yourself, but I'm trying to make myself useful.)
> >>
> >> There appears to be a GitHub issue for poudriere about this, but it
> >> seems to be looking in another direction.
> >>
> >> https://github.com/freebsd/poudriere/issues/1009
> >>
> >
> > This one looks quite similar.
> >
> > In my case the ports/pkg are aligned between host and jail; in fact I
> > have built them from the exact same git checkout.
> >
> > I noticed pkg head has been converted to using pthreads instead of
> > fork; maybe that could help. I will make time to perform some testing.
>
> Thanks for pointing me here. It looks like this was "it", in that
> fixing this issue makes poudriere use the native pkg-static, which
> sidesteps the problem.
>
> Unluckily there ARE qemu races and lockups that prevent the arm64
> pkg-static binary from being correctly emulated by qemu-user-static.
> Such conditions also cause sporadic failures in some ports being built.
>
> I filed a PR with a fix for that issue:
>
> https://github.com/freebsd/poudriere/pull/1115

Ok. This dodges the problem. But it papers over things.

Any chance you could give me the state of pkg from before, plus the
package being added, as a test case for qemu?

Warner

>
> --
> Guido Falsi <mad@madpilot.net>
>
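
For readers following the thread, the fork-and-collect pattern that Guido describes for `pkg repo` (a parent forks several workers, hands each a share of the packages, then gathers their results) can be sketched roughly as below. This is only an illustration under stated assumptions, not pkg's code: the names `worker` and `run_workers`, the use of one pipe per worker, and the SHA-256 "work" are all hypothetical, and nothing here has been shown to trigger the hang under qemu-user-static. Python is used only to show the process structure; the test case Warner asks for would need the same pattern in something much smaller, such as a static C binary.

```python
import hashlib
import os


def worker(items, write_fd):
    """Child side (hypothetical): hash each item, report one line per item."""
    with os.fdopen(write_fd, "w") as out:
        for item in items:
            digest = hashlib.sha256(item.encode()).hexdigest()
            out.write(f"{item} {digest}\n")


def run_workers(items, nworkers):
    """Fork nworkers children, split items among them, collect their reports."""
    children = []  # (pid, read end of that child's pipe)
    for i in range(nworkers):
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: close every read end it inherited, do its share, exit.
            os.close(read_fd)
            for _, other_fd in children:
                os.close(other_fd)
            status = 1
            try:
                worker(items[i::nworkers], write_fd)
                status = 0
            finally:
                os._exit(status)  # skip the parent's cleanup handlers
        # Parent: keep only the read end so EOF arrives when the child exits.
        os.close(write_fd)
        children.append((pid, read_fd))

    results = {}
    for pid, read_fd in children:
        with os.fdopen(read_fd) as inp:
            for line in inp:
                name, digest = line.rstrip("\n").rsplit(" ", 1)
                results[name] = digest
        _, status = os.waitpid(pid, 0)
        if status != 0:
            raise RuntimeError(f"worker {pid} exited with status {status}")
    return results


if __name__ == "__main__":
    # Hypothetical package names; a real run would list files in a repo.
    packages = [f"pkg-{n}" for n in range(16)]
    print(len(run_workers(packages, nworkers=3)), "packages processed")
```

A hang of the kind reported in the thread would show up here as the parent blocking forever in the read loop or in `os.waitpid`, with the children idle. Whether this particular shape is enough to hit the bsd-user race is an open question.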