Date:      Sun, 28 Jan 2024 22:43:27 +0100
From:      Guido Falsi <mad@madpilot.net>
To:        Warner Losh <imp@bsdimp.com>
Cc:        emulation@freebsd.org, "freebsd-arm@freebsd.org" <freebsd-arm@freebsd.org>
Subject:   Re: qemu-user-static aarch64 lockup/race? (was Re: Python failure in poudriere on arm64 (via qemu-user-static cross compiling))
Message-ID:  <5ef2ab66-25ef-45f1-aa5a-4b614eab2f40@madpilot.net>
In-Reply-To: <a1845758-3535-4aa0-9274-d3b13dd3801b@madpilot.net>
References:  <6a33726b-eb6f-418e-9fbd-6d0b9b4bfaa8@madpilot.net> <0fc7f929-6e5b-4a33-97d2-8a9c0c07d524@madpilot.net> <79a5eb0f-d04e-4c1a-9d8a-185e1fb4e4a2@madpilot.net> <CANCZdfr-0w2EMa=_hFT3p4gFSDO-P1Yf8Vb-1eLiwRVomo1Jfg@mail.gmail.com> <a1845758-3535-4aa0-9274-d3b13dd3801b@madpilot.net>

On 28/01/24 22:34, Guido Falsi wrote:
> On 28/01/24 22:23, Warner Losh wrote:
>>
>>
>> On Sun, Jan 28, 2024, 12:38 PM Guido Falsi <mad@madpilot.net> wrote:
>>
>>     On 28/01/24 15:15, Guido Falsi wrote:
>>      > Hi all, again,
>>      >
>>      > I have some more findings about this, I'm top posting because the old
>>      > message is not really that much relevant anymore.
>>      >
>>      > I'm now running a machine with head (commit
>>      > b32d49cfbaa0437d08e65e7cd7c82c5951b1a852 Jan 25th), poudriere installed
>>      > in it, machine is amd64, with an arm64 jail, 14.0-RELEASE, installed
>>      > from official distribution binaries (https download method), with cross
>>      > tools.
>>      >
>>      > To make sure everything is aligned I rebuild everything: updated head,
>>      > rebuild cross tools in the jail, recompiled all ports for the host
>>      > architecture and force reinstalled them, especially qemu-user-static,
>>      > cleaned up all packages for the arm64 jail.
>>      >
>>      > If I missed something important please point it out.
>>      >
>>      > I have made some more tests and I'm getting python failures in poudriere
>>      > like the one described below from time to time (don't have hard stats
>>      > but feels like 50% chance). If I get past that it usually is able to
>>      > build all the not many packages, but locks up at:
>>      >
>>      > Creating repository in /tmp/packages:   0%
>>      >
>>
>>     BTW, forgot to mention last time this worked without issue was around
>>     20th December.
>>
>>
>> I think this is a bsd-user issue. There is a race somewhere in that 
>> code that causes the hangs. I'd love a reproducible test case that is 
>> somewhat smaller than python... there are bigger races with the newer 
>> stuff and I've not had the time to chase it there either. 😞
> 
> First of all thanks for your feedback. It encourages me having someone 
> else with better knowledge about this confirm that a race condition is 
> actually a possible cause!
> 
> Strange this has not been happening up to mid December.
> 
> My main and fully reproducible use case is actually mostly with pkg.
> 
> at the end of the run poudriere runs `pkg repo` to create the meta files 
> and sign the repo. It forks itself (ncpus + 2 I guess, even forcing it 
> to 1 worker I see three processes), and then locks up, with all the 
> processes stopping using CPU (ps output is in my message)
> 
> I guess this can be reproduced with any poudriere repo with at least 
> more than ncpus packages in it. can also be reproduced using `poudriere 
> pkgclean -u <etc>`
> 
> If that does not work I'm not sure how to reproduce it in other ways, 
> but I can try  writing some code mocking what pkg seems to be doing, not 
> an expert at such things, though.
> 

In case it helps narrow things down further, it looks like the lockup is 
happening somewhere around here:

https://github.com/freebsd/pkg/blob/56fa3f87d9d9644348b89680dfd8af47a860ee82/libpkg/pkg_repo_create.c#L778

and/or in the pkg_create_repo_worker() function here:

https://github.com/freebsd/pkg/blob/56fa3f87d9d9644348b89680dfd8af47a860ee82/libpkg/pkg_repo_create.c#L341


(I'm trying to spare you the time needed to find the actual code being 
executed. I guess you would have identified this in a few minutes 
yourself, but I'm trying to make myself useful.)

-- 
Guido Falsi <mad@madpilot.net>



