Date:      Sun, 28 Jan 2024 18:10:45 -0700
From:      Warner Losh <imp@bsdimp.com>
To:        Nathan Reilly-list <lists@nreilly.com>
Cc:        Guido Falsi <mad@madpilot.net>, emulation@freebsd.org,  "freebsd-arm@freebsd.org" <freebsd-arm@freebsd.org>, freebsd-pkg@freebsd.org
Subject:   Re: qemu-user-static aarch64 lockup/race? (was Re: Python failure in poudriere on arm64 (via qemu-user-static cross compiling))
Message-ID:  <CANCZdfqELPcaCj-d+Lj_qocR6gMiHp1RL1Y92myq=TnR-W6Y1w@mail.gmail.com>
In-Reply-To: <D2DD631F-8AED-48B7-8FB3-86F93BA707F2@nreilly.com>
References:  <6a33726b-eb6f-418e-9fbd-6d0b9b4bfaa8@madpilot.net> <0fc7f929-6e5b-4a33-97d2-8a9c0c07d524@madpilot.net> <79a5eb0f-d04e-4c1a-9d8a-185e1fb4e4a2@madpilot.net> <CANCZdfr-0w2EMa=_hFT3p4gFSDO-P1Yf8Vb-1eLiwRVomo1Jfg@mail.gmail.com> <a1845758-3535-4aa0-9274-d3b13dd3801b@madpilot.net> <5ef2ab66-25ef-45f1-aa5a-4b614eab2f40@madpilot.net> <D2DD631F-8AED-48B7-8FB3-86F93BA707F2@nreilly.com>


On Sun, Jan 28, 2024 at 4:45 PM Nathan Reilly-list <lists@nreilly.com> wrote:

>
>
> On 29 Jan 2024, at 8:43 am, Guido Falsi <mad@madpilot.net> wrote:
> On 28/01/24 22:34, Guido Falsi wrote:
>
> On 28/01/24 22:23, Warner Losh wrote:
>
> On Sun, Jan 28, 2024, 12:38 PM Guido Falsi <mad@madpilot.net> wrote:
>
>     On 28/01/24 15:15, Guido Falsi wrote:
>     [snip]
>      > Creating repository in /tmp/packages:   0%
>      >
>
>     BTW, I forgot to mention that the last time this worked without issue
>     was around 20th December.
>
>
> I think this is a bsd-user issue. There is a race somewhere in that code
> that causes the hangs. I'd love a reproducible test case that is somewhat
> smaller than Python... there are bigger races with the newer stuff and I've
> not had the time to chase it there either. 😞
>
> First of all, thanks for your feedback. It is encouraging to have someone
> with better knowledge of this code confirm that a race condition is
> actually a possible cause!
> Strangely, this did not happen before mid-December.
> My main, fully reproducible use case is actually with pkg.
> At the end of the run, poudriere runs `pkg repo` to create the meta files
> and sign the repo. It forks itself (ncpus + 2, I guess; even when forcing
> it to 1 worker I see three processes) and then locks up, with all the
> processes no longer using any CPU (the ps output is in my message).
> I guess this can be reproduced with any poudriere repo containing more
> than ncpus packages. It can also be reproduced using
> `poudriere pkgclean -u <etc>`.
> If that does not work, I'm not sure how to reproduce it in other ways, but
> I can try writing some code mocking what pkg seems to be doing, though I'm
> not an expert at such things.
>
>
> In case it helps narrow things down further, it looks like the lockup is
> happening somewhere around here:
>
>
> https://github.com/freebsd/pkg/blob/56fa3f87d9d9644348b89680dfd8af47a860ee82/libpkg/pkg_repo_create.c#L778
>
> and/or in the pkg_create_repo_worker() function here:
>
>
> https://github.com/freebsd/pkg/blob/56fa3f87d9d9644348b89680dfd8af47a860ee82/libpkg/pkg_repo_create.c#L341
>
>
> (I'm trying to spare you the time needed to find the actual code being
> executed. I guess you would have identified it in a few minutes yourself,
> but I'm trying to make myself useful.)
>
>
>
> There appears to be a GitHub issue for poudriere about this, but it seems
> to be looking in another direction.
>
> https://github.com/freebsd/poudriere/issues/1009
>

There's a FreeBSD bug report saying this happens without qemu in the loop:
https://bugs.freebsd.org/276690 (at least, I think it's similar).

Warner



