Date:      Mon, 29 Jan 2024 10:45:19 +1100
From:      Nathan Reilly-list <lists@nreilly.com>
To:        Guido Falsi <mad@madpilot.net>
Cc:        emulation@freebsd.org, "freebsd-arm@freebsd.org" <freebsd-arm@freebsd.org>, freebsd-pkg@freebsd.org
Subject:   Re: qemu-user-static aarch64 lockup/race? (was Re: Python failure in poudriere on arm64 (via qemu-user-static cross compiling))
Message-ID:  <D2DD631F-8AED-48B7-8FB3-86F93BA707F2@nreilly.com>
In-Reply-To: <5ef2ab66-25ef-45f1-aa5a-4b614eab2f40@madpilot.net>
References:  <6a33726b-eb6f-418e-9fbd-6d0b9b4bfaa8@madpilot.net> <0fc7f929-6e5b-4a33-97d2-8a9c0c07d524@madpilot.net> <79a5eb0f-d04e-4c1a-9d8a-185e1fb4e4a2@madpilot.net> <CANCZdfr-0w2EMa=_hFT3p4gFSDO-P1Yf8Vb-1eLiwRVomo1Jfg@mail.gmail.com> <a1845758-3535-4aa0-9274-d3b13dd3801b@madpilot.net> <5ef2ab66-25ef-45f1-aa5a-4b614eab2f40@madpilot.net>





> On 29 Jan 2024, at 8:43 am, Guido Falsi <mad@madpilot.net> wrote:
> On 28/01/24 22:34, Guido Falsi wrote:
>> On 28/01/24 22:23, Warner Losh wrote:
>>>
>>> On Sun, Jan 28, 2024, 12:38 PM Guido Falsi <mad@madpilot.net> wrote:
>>>
>>>     On 28/01/24 15:15, Guido Falsi wrote:
>>>     [snip]
>>>      > Creating repository in /tmp/packages:   0%
>>>      >
>>>
>>>     BTW, forgot to mention last time this worked without issue was around
>>>     20th December.
>>>
>>>
>>> I think this is a bsd-user issue. There is a race somewhere in that code that causes the hangs. I'd love a reproducible test case that is somewhat smaller than python... there are bigger races with the newer stuff and I've not had the time to chase it there either. 😞
>> First of all thanks for your feedback. It encourages me having someone else with better knowledge about this confirm that a race condition is actually a possible cause!
>> Strange this has not been happening up to mid December.
>> My main and fully reproducible use case is actually mostly with pkg.
>> At the end of the run poudriere runs `pkg repo` to create the meta files and sign the repo. It forks itself (ncpus + 2 I guess, even forcing it to 1 worker I see three processes), and then locks up, with all the processes stopping using CPU (ps output is in my message)
>> I guess this can be reproduced with any poudriere repo with more than ncpus packages in it. It can also be reproduced using `poudriere pkgclean -u <etc>`
>> If that does not work I'm not sure how to reproduce it in other ways, but I can try writing some code mocking what pkg seems to be doing, not an expert at such things, though.
>
> In case it helps further narrow down things, it looks like the lockup is happening somewhere around here:
>
> https://github.com/freebsd/pkg/blob/56fa3f87d9d9644348b89680dfd8af47a860ee82/libpkg/pkg_repo_create.c#L778
>
> and/or in the pkg_create_repo_worker() function here:
>
> https://github.com/freebsd/pkg/blob/56fa3f87d9d9644348b89680dfd8af47a860ee82/libpkg/pkg_repo_create.c#L341
>
>
> (I'm trying to spare you the time needed to find the actual code being executed, I guess you would have identified this in a few minutes yourself, but I'm trying to make myself useful)


There appears to be a GitHub issue for poudriere about this, but it seems to be looking in another direction.

https://github.com/freebsd/poudriere/issues/1009

Regards,
Nathan

