Date:      Mon, 29 Jan 2024 16:47:57 +0100
From:      Guido Falsi <mad@madpilot.net>
To:        Warner Losh <imp@bsdimp.com>, Nathan Reilly-list <lists@nreilly.com>
Cc:        emulation@freebsd.org, "freebsd-arm@freebsd.org" <freebsd-arm@freebsd.org>, freebsd-pkg@freebsd.org
Subject:   Re: qemu-user-static aarch64 lockup/race? (was Re: Python failure in poudriere on arm64 (via qemu-user-static cross compiling))
Message-ID:  <990427ae-0491-463e-92c7-c74700deb6fa@madpilot.net>
In-Reply-To: <e434f5a7-5730-498e-b54d-b01310f95f7a@madpilot.net>
References:  <6a33726b-eb6f-418e-9fbd-6d0b9b4bfaa8@madpilot.net> <0fc7f929-6e5b-4a33-97d2-8a9c0c07d524@madpilot.net> <79a5eb0f-d04e-4c1a-9d8a-185e1fb4e4a2@madpilot.net> <CANCZdfr-0w2EMa=_hFT3p4gFSDO-P1Yf8Vb-1eLiwRVomo1Jfg@mail.gmail.com> <a1845758-3535-4aa0-9274-d3b13dd3801b@madpilot.net> <5ef2ab66-25ef-45f1-aa5a-4b614eab2f40@madpilot.net> <D2DD631F-8AED-48B7-8FB3-86F93BA707F2@nreilly.com> <CANCZdfqELPcaCj-d%2BLj_qocR6gMiHp1RL1Y92myq=TnR-W6Y1w@mail.gmail.com> <e434f5a7-5730-498e-b54d-b01310f95f7a@madpilot.net>

On 29/01/24 09:26, Guido Falsi wrote:
> On 29/01/24 02:10, Warner Losh wrote:
>>
>>
>> On Sun, Jan 28, 2024 at 4:45 PM Nathan Reilly-list <lists@nreilly.com> wrote:
>>
>>
>>
>>>     On 29 Jan 2024, at 8:43 am, Guido Falsi <mad@madpilot.net> wrote:
>>>     On 28/01/24 22:34, Guido Falsi wrote:
>>>>     On 28/01/24 22:23, Warner Losh wrote:
>>>>>     On Sun, Jan 28, 2024, 12:38 PM Guido Falsi <mad@madpilot.net> wrote:
>>>>>
>>>>>         On 28/01/24 15:15, Guido Falsi wrote:
>>>>>         [snip]
>>>>>          > Creating repository in /tmp/packages:   0%
>>>>>          >
>>>>>
>>>>>         BTW, I forgot to mention: the last time this worked without
>>>>>         issue was around 20th December.
>>>>>
>>>>>
>>>>>     I think this is a bsd-user issue. There is a race somewhere in
>>>>>     that code that causes the hangs. I'd love a reproducible test
>>>>>     case that is somewhat smaller than python... there are bigger
>>>>>     races with the newer stuff and I've not had the time to chase it
>>>>>     there either. 😞
>>>>     First of all, thanks for your feedback. It is encouraging to have
>>>>     someone with better knowledge of this confirm that a race
>>>>     condition is actually a possible cause!
>>>>     Strange that this was not happening until mid-December.
>>>>     My main and fully reproducible use case is actually mostly with pkg.
>>>>     At the end of the run, poudriere runs `pkg repo` to create the
>>>>     meta files and sign the repo. It forks itself (ncpus + 2 workers, I
>>>>     guess; even when forcing it to 1 worker I see three processes), and
>>>>     then locks up, with all the processes no longer using any CPU (ps
>>>>     output is in my message).
>>>>     I guess this can be reproduced with any poudriere repo with
>>>>     more than ncpus packages in it. It can also be reproduced
>>>>     using `poudriere pkgclean -u <etc>`.
>>>>     If that does not work I'm not sure how else to reproduce it,
>>>>     but I can try writing some code mocking what pkg seems to
>>>>     be doing; I'm not an expert at such things, though.
>>>
>>>     In case it helps narrow things down further, it looks like the
>>>     lockup is happening somewhere around here:
>>>
>>>     https://github.com/freebsd/pkg/blob/56fa3f87d9d9644348b89680dfd8af47a860ee82/libpkg/pkg_repo_create.c#L778
>>>
>>>     and/or in the pkg_create_repo_worker() function here:
>>>
>>>     https://github.com/freebsd/pkg/blob/56fa3f87d9d9644348b89680dfd8af47a860ee82/libpkg/pkg_repo_create.c#L341
>>>
>>>
>>>     (I'm trying to spare you the time needed to find the actual code
>>>     being executed. I guess you would have identified this in a few
>>>     minutes yourself, but I'm trying to make myself useful.)
>>
>>
>>     There appears to be a GitHub issue for poudriere about this, but
>>     it seems to be looking in another direction.
>>
>>     https://github.com/freebsd/poudriere/issues/1009
>>
> 
> This one looks quite similar.
> 
> In my case the ports/pkg are aligned between host and jail, in fact I 
> have built them from the exact same git checkout.
> 
> I noticed pkg head has been converted to use pthreads instead of fork;
> maybe that could help. I will make time to do some testing.

Thanks for pointing me here. It looks like this was "it": with this
issue fixed, poudriere uses the native pkg-static and sidesteps the
problem.


Unfortunately, there ARE qemu races and lockups that prevent the arm64
pkg-static binary from being correctly emulated by qemu-user-static. Such
conditions also cause sporadic failures in some ports being built.

I filed a PR with a fix for that issue:

https://github.com/freebsd/poudriere/pull/1115
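
For anyone who still wants to chase the emulation race itself, here is a
rough sketch of the kind of mock mentioned earlier in the thread: a parent
that forks a few workers and collects their results. It only illustrates
the process structure. The socketpair-per-worker and poll() design, and the
names run_workers, nworkers and msgs_per_worker, are assumptions of mine
and are not taken from the pkg source. Whether this pattern triggers the
hang under qemu-user-static is untested. It is in Python for brevity; a
test case smaller than python itself would need the same logic ported to C.

```python
import os
import select
import socket


def run_workers(nworkers, msgs_per_worker):
    """Fork nworkers children; each sends msgs_per_worker one-byte
    messages over its own socketpair. Return the number of bytes
    (one per message) that the parent received."""
    socks = {}  # fd -> parent-side socket (hypothetical bookkeeping)
    pids = []
    for _ in range(nworkers):
        parent_end, child_end = socket.socketpair()
        pid = os.fork()
        if pid == 0:
            # Child: drop inherited parent-side sockets, send, then exit
            # without running any of the parent's cleanup code.
            try:
                parent_end.close()
                for s in socks.values():
                    s.close()
                for _ in range(msgs_per_worker):
                    child_end.sendall(b"x")
                child_end.close()
            finally:
                os._exit(0)
        # Parent: keep only its own end of the pair.
        child_end.close()
        socks[parent_end.fileno()] = parent_end
        pids.append(pid)

    poller = select.poll()
    for fd in socks:
        poller.register(fd, select.POLLIN)

    received = 0
    while socks:
        for fd, _event in poller.poll():
            data = socks[fd].recv(4096)
            if data:
                received += len(data)
            else:
                # Empty read means the worker closed its end.
                poller.unregister(fd)
                socks.pop(fd).close()

    # Reap every worker so no zombies are left behind.
    for pid in pids:
        os.waitpid(pid, 0)
    return received


if __name__ == "__main__":
    # ncpus + 2 workers mirrors the guess made earlier in the thread.
    print(run_workers((os.cpu_count() or 1) + 2, 100))
```

If the parent ever blocks in poll() while all workers sit idle, that would
match the symptom described above, with every process no longer using CPU.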


-- 
Guido Falsi <mad@madpilot.net>



