Date: Sun, 28 Jan 2024 22:43:27 +0100
From: Guido Falsi <mad@madpilot.net>
To: Warner Losh <imp@bsdimp.com>
Cc: emulation@freebsd.org, "freebsd-arm@freebsd.org" <freebsd-arm@freebsd.org>
Subject: Re: qemu-user-static aarch64 lockup/race? (was Re: Python failure in poudriere on arm64 (via qemu-user-static cross compiling))
Message-ID: <5ef2ab66-25ef-45f1-aa5a-4b614eab2f40@madpilot.net>
In-Reply-To: <a1845758-3535-4aa0-9274-d3b13dd3801b@madpilot.net>
References: <6a33726b-eb6f-418e-9fbd-6d0b9b4bfaa8@madpilot.net>
 <0fc7f929-6e5b-4a33-97d2-8a9c0c07d524@madpilot.net>
 <79a5eb0f-d04e-4c1a-9d8a-185e1fb4e4a2@madpilot.net>
 <CANCZdfr-0w2EMa=_hFT3p4gFSDO-P1Yf8Vb-1eLiwRVomo1Jfg@mail.gmail.com>
 <a1845758-3535-4aa0-9274-d3b13dd3801b@madpilot.net>
On 28/01/24 22:34, Guido Falsi wrote:
> On 28/01/24 22:23, Warner Losh wrote:
>>
>> On Sun, Jan 28, 2024, 12:38 PM Guido Falsi <mad@madpilot.net
>> <mailto:mad@madpilot.net>> wrote:
>>
>> On 28/01/24 15:15, Guido Falsi wrote:
>> > Hi all, again,
>> >
>> > I have some more findings about this, I'm top posting because the old
>> > message is not really that much relevant anymore.
>> >
>> > I'm now running a machine with head (commit
>> > b32d49cfbaa0437d08e65e7cd7c82c5951b1a852 Jan 25th), poudriere installed
>> > in it, machine is amd64, with an arm64 jail, 14.0-RELEASE, installed
>> > from official distribution binaries (https download method), with cross
>> > tools.
>> >
>> > To make sure everything is aligned I rebuild everything: updated head,
>> > rebuild cross tools in the jail, recompiled all ports for the host
>> > architecture and force reinstalled them, especially qemu-user-static,
>> > cleaned up all packages for the arm64 jail.
>> >
>> > If I missed something important please point it out.
>> >
>> > I have made some more tests and I'm getting python failures in poudriere
>> > like the one described below from time to time (don't have hard stats
>> > but feels like 50% chance). If I get past that it usually is able to
>> > build all the not many packages, but locks up at:
>> >
>> > Creating repository in /tmp/packages: 0%
>> >
>>
>> BTW, forgot to mention last time this worked without issue was around
>> 20th December.
>>
>>
>> I think this is a bsd-user issue. There is a race somewhere in that
>> code that causes the hangs. I'd love a reproducible test case that is
>> somewhat smaller than python... there are bigger races with the newer
>> stuff and I've not had the time to chase it there either. 😞
>
> First of all thanks for your feedback. It encourages me having someone
> else with better knowledge about this confirm that a race condition is
> actually a possible cause!
>
> Strange this has not been happening up to mid December.
>
> My main and fully reproducible use case is actually mostly with pkg.
>
> at the end of the run poudriere runs `pkg repo` to create the meta files
> and sign the repo. It forks itself (ncpus + 2 I guess, even forcing it
> to 1 worker I see three processes), and then locks up, with all the
> processes stopping using CPU (ps output is in my message)
>
> I guess this can be reproduced with any poudriere repo with at least
> more than ncpus packages in it. can also be reproduced using `poudriere
> pkgclean -u <etc>`
>
> If that does not work I'm not sure how to reproduce it in other ways,
> but I can try writing some code mocking what pkg seems to be doing, not
> an expert at such things, though.
>

In case it helps narrow things down further, it looks like the lockup is
happening somewhere around here:

https://github.com/freebsd/pkg/blob/56fa3f87d9d9644348b89680dfd8af47a860ee82/libpkg/pkg_repo_create.c#L778

and/or in the pkg_create_repo_worker() function here:

https://github.com/freebsd/pkg/blob/56fa3f87d9d9644348b89680dfd8af47a860ee82/libpkg/pkg_repo_create.c#L341

(I'm trying to spare you the time needed to find the actual code being
executed. I guess you would have identified this in a few minutes
yourself, but I'm trying to make myself useful.)

-- 
Guido Falsi <mad@madpilot.net>
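
For a test case smaller than python, here is a rough sketch of the kind of
mock-up mentioned above. It assumes the relevant pattern is simply "fork
several workers, each reports back over a socketpair, the parent drains them
with poll(2) and then reaps the children". That is my guess at the general
shape, not a copy of what pkg_create_repo_worker() actually does, and the
names `worker` and `run_workers` are made up for illustration. It may well
not trigger the hang at all.

```c
#include <errno.h>
#include <poll.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical worker: sends `nitems` one-byte messages, then exits. */
static void
worker(int fd, int nitems)
{
	char c = 'x';

	for (int i = 0; i < nitems; i++) {
		if (write(fd, &c, 1) != 1)
			_exit(1);
	}
	close(fd);
	_exit(0);
}

/*
 * Fork `nworkers` children, drain their sockets with poll(2), reap them.
 * Returns the total number of bytes received, or -1 on any failure.
 * Note: waitpid(-1, ...) assumes the caller has no other children.
 */
int
run_workers(int nworkers, int nitems)
{
	struct pollfd *pfds;
	int total = 0, started = 0, open_fds, failed = 0;

	if (nworkers <= 0)
		return (-1);
	pfds = calloc((size_t)nworkers, sizeof(*pfds));
	if (pfds == NULL)
		return (-1);

	for (int i = 0; i < nworkers; i++) {
		int sv[2];
		pid_t pid;

		if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1) {
			failed = 1;
			break;
		}
		pid = fork();
		if (pid == -1) {
			close(sv[0]);
			close(sv[1]);
			failed = 1;
			break;
		}
		if (pid == 0) {
			/* Child: drop the parent's ends, including inherited ones. */
			close(sv[0]);
			for (int j = 0; j < i; j++)
				close(pfds[j].fd);
			worker(sv[1], nitems);
		}
		close(sv[1]);
		pfds[i].fd = sv[0];
		pfds[i].events = POLLIN;
		started++;
	}

	open_fds = started;
	while (open_fds > 0) {
		if (poll(pfds, (nfds_t)started, -1) == -1) {
			if (errno == EINTR)
				continue;
			failed = 1;
			break;
		}
		for (int i = 0; i < started; i++) {
			char buf[256];
			ssize_t n;

			if (pfds[i].fd < 0 || pfds[i].revents == 0)
				continue;
			n = read(pfds[i].fd, buf, sizeof(buf));
			if (n > 0) {
				total += (int)n;
				continue;
			}
			if (n == -1 && errno == EINTR)
				continue;
			/* EOF or error: stop polling this worker. */
			close(pfds[i].fd);
			pfds[i].fd = -1;
			open_fds--;
		}
	}

	/* Close anything still open so no child stays blocked in write(). */
	for (int i = 0; i < started; i++) {
		if (pfds[i].fd >= 0)
			close(pfds[i].fd);
	}
	for (int i = 0; i < started; i++) {
		int status;

		if (waitpid(-1, &status, 0) == -1 || !WIFEXITED(status) ||
		    WEXITSTATUS(status) != 0)
			failed = 1;
	}
	free(pfds);
	return (failed ? -1 : total);
}
```

The idea would be to call something like `run_workers(ncpus + 2, 1000)` in a
loop from a small `main()`, build it for aarch64, and run it inside the jail
under qemu-user-static to see whether all processes end up idle the same way.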